[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-07 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3544:
--
Status: Patch Available  (was: In Progress)

> Reading from Metadata table fails w/ NPE
> ----------------------------------------
>
> Key: HUDI-3544
> URL: https://issues.apache.org/jira/browse/HUDI-3544
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> In one of our prod tables, we ran into a NullPointerException when reading from 
> the metadata table (MDT). We are using one of the latest master commit hashes. 
>  
> {code:java}
> 22/03/01 15:23:33 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool
> 22/03/01 15:23:33 INFO TaskSchedulerImpl: Cancelling stage 20
> 22/03/01 15:23:33 INFO TaskSchedulerImpl: Killing all running tasks in stage 20: Stage cancelled
> 22/03/01 15:23:33 INFO DAGScheduler: ResultStage 20 (collectAsMap at UpsertPartitioner.java:253) failed in 10.901 s due to Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 460) (10.0.30.133 executor 1): org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition s3a://kwabhudi-76437a13-5c90-471b-b6fb-1d362c409e5b/kwabhudi_default/threads from metadata
>   at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:134)
>   at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:304)
>   at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestFileSlicesBeforeOrOn(AbstractTableFileSystemView.java:638)
>   at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:119)
>   at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlicesBeforeOrOn(PriorityBasedFileSystemView.java:182)
>   at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFileCandidates(SparkUpsertDeltaCommitPartitioner.java:107)
>   at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFiles(SparkUpsertDeltaCommitPartitioner.java:66)
>   at org.apache.hudi.table.action.commit.UpsertPartitioner.lambda$getSmallFilesForPartitions$f1d92f9e$1(UpsertPartitioner.java:253)
>   at org.apache.spark.api.java.JavaPairRDD$.$anonfun$pairFunToScalaFun$1(JavaPairRDD.scala:1073)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>   at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>   at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>   at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>   at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>   at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>   at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>   at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   ...
> {code}
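
For reference, the stack shows the NPE surfacing while the upsert partitioner lists small files through the metadata-table-backed file system view, i.e. a MERGE_ON_READ upsert with the metadata table enabled. Below is a minimal sketch (not taken from the report) of a write that exercises this path; the table name, schema, record key, and base path are hypothetical placeholders, and the commented-out option is a possible mitigation (fall back to direct file listing) until the fix lands.

{code:scala}
// Minimal sketch of a MOR upsert whose small-file lookup goes through the
// metadata table. All names and paths below are illustrative placeholders.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-3544-sketch").getOrCreate()
import spark.implicits._

val basePath = "s3a://<bucket>/kwabhudi_default/threads" // placeholder path
val df = Seq((1, "t1", "2022-03-01")).toDF("id", "name", "ds")

df.write.format("hudi").
  option("hoodie.table.name", "threads").
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.partitionpath.field", "ds").
  option("hoodie.metadata.enable", "true").  // listing path that fails above
  // option("hoodie.metadata.enable", "false") // possible mitigation: direct listing
  mode(SaveMode.Append).
  save(basePath)
{code}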

[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-07 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3544:
--
Status: In Progress  (was: Open)


[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-05 Thread Raymond Xu (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3544:
-
Sprint: Hudi-Sprint-Mar-01


[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-02 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3544:
-
Labels: pull-request-available  (was: )


[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-02 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3544:
--
Fix Version/s: 0.11.0


[jira] [Updated] (HUDI-3544) Reading from Metadata table fails w/ NPE

2022-03-02 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3544:
--
Priority: Blocker  (was: Major)
