[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Wason updated HUDI-3517:
---------------------------------
    Fix Version/s: 0.14.1
                   (was: 0.14.0)

> Unicode in partition path causes it to be resolved wrongly
> ----------------------------------------------------------
>
>                 Key: HUDI-3517
>                 URL: https://issues.apache.org/jira/browse/HUDI-3517
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql, writer-core
>    Affects Versions: 0.10.1
>            Reporter: Ji Qi
>            Assignee: Lokesh Jain
>            Priority: Blocker
>              Labels: hudi-on-call
>             Fix For: 0.14.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When there is Unicode in the partition path, the upsert fails.
>
> h3. To reproduce
> # Create this dataframe in spark-shell (note the dotted I):
> {code:none}
> scala> res0.show(truncate=false)
> +---+---+
> |_c0|_c1|
> +---+---+
> |1  |İ  |
> +---+---+
> {code}
> # Write it to Hudi (this write will create the Hudi table and succeed):
> {code:none}
> res0.write.format("hudi")
>   .option("hoodie.table.name", "unicode_test")
>   .option("hoodie.datasource.write.precombine.field", "_c0")
>   .option("hoodie.datasource.write.recordkey.field", "_c0")
>   .option("hoodie.datasource.write.partitionpath.field", "_c1")
>   .mode("append")
>   .save("file:///Users/ji.qi/Desktop/unicode_test")
> {code}
> # Try to write {{res0}} again (this upsert will fail at the index lookup stage).
>
> h3. Environment
> * Hudi version: 0.10.1
> * Spark version: 3.1.2
>
> h3. Stacktrace
> {code:none}
> 22/02/25 18:23:14 INFO RemoteHoodieTableFileSystemView: Sending request : (http://192.168.1.148:54043/v1/hoodie/view/datafile/latest/partition?partition=%C4%B0=file%3A%2FUsers%2Fji.qi%2FDesktop%2Funicode_test=31517a5e-af56-4fbc-9aa6-1ef1729bb89d-0=20220225182311228=65c5a6a5c6836dc4f7805550e81ca034b30ad85c38794f9f8ce68a9e914aab83)
> 22/02/25 18:23:14 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 403)
> org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet file:/Users/ji.qi/Desktop/unicode_test/Ä°/31517a5e-af56-4fbc-9aa6-1ef1729bb89d-0_0-30-2006_20220225181656520.parquet
> 	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:185)
> 	at org.apache.hudi.common.util.ParquetUtils.readFooter(ParquetUtils.java:201)
> 	at org.apache.hudi.common.util.BaseFileUtils.readMinMaxRecordKeys(BaseFileUtils.java:109)
> 	at org.apache.hudi.io.storage.HoodieParquetReader.readMinMaxRecordKeys(HoodieParquetReader.java:49)
> 	at org.apache.hudi.io.HoodieRangeInfoHandle.getMinMaxKeys(HoodieRangeInfoHandle.java:39)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadInvolvedFiles$4cbadf07$1(HoodieBloomIndex.java:149)
> 	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> 	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> 	at scala.collection.Iterator.foreach(Iterator.scala:941)
> 	at scala.collection.Iterator.foreach$(Iterator.scala:941)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> 	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> 	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> 	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
> 	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
> 	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
> 	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
> 	at scala.collection.AbstractIterator.to(Iterator.scala:1429)
> 	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
> 	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
> 	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
> 	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
> 	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
> 	at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
> 	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
> 	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at
> {code}
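Note the mismatch visible in the log above: the timeline-server request percent-encodes the partition as {{%C4%B0}}, the UTF-8 bytes of İ (U+0130), while the path in the {{HoodieIOException}} shows {{Ä°}}, which is exactly those two bytes re-decoded as ISO-8859-1. A minimal JVM sketch of the effect (plain Java, not Hudi code; the strings come from the log, the class name is made up for illustration):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PartitionEncodingDemo {
    public static void main(String[] args) throws Exception {
        String partition = "İ"; // U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Percent-encoding the partition with UTF-8 gives the escape sequence
        // seen in the filesystem-view request URL.
        String encoded = URLEncoder.encode(partition, StandardCharsets.UTF_8.name());
        System.out.println(encoded); // %C4%B0

        // Decoding the same UTF-8 bytes as ISO-8859-1 reproduces the mangled
        // directory name seen in the exception's parquet path.
        String mangled = new String(partition.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(mangled); // Ä°
    }
}
```

If some component along the lookup path decodes the UTF-8 bytes with the wrong charset like this, the resolved partition directory no longer matches the one the data was written to, which would explain the failed footer read.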
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-3517:
----------------------------
    Fix Version/s: 0.14.0
                   (was: 0.13.1)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3517:
--------------------------------------
    Fix Version/s: (was: 0.12.3)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3517:
-----------------------------
    Issue Type: Bug
                (was: Improvement)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3517:
-----------------------------
    Fix Version/s: 0.12.3
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3517:
--------------------------------------
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2
            (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3517:
-----------------------------
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3
            (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Epic Link: HUDI-5425
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Priority: Blocker (was: Critical)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3517: -- Sprint: 2022/12/26
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaojing Yu updated HUDI-3517: -- Fix Version/s: 0.13.0 (was: 0.12.1)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3517: Sprint: (was: 2022/09/19)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3517: -- Sprint: 2022/09/19 (was: 2022/09/05)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: Sprint: 2022/09/05 (was: 2022/08/22)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: Priority: Critical (was: Blocker)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: Story Points: 3
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: Priority: Blocker (was: Critical)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3517: Priority: Critical (was: Major)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3517: Fix Version/s: 0.12.1 (was: 0.12.0)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: Component/s: spark-sql
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Issue Type: Improvement (was: Bug)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Fix Version/s: 0.12.0 (was: 0.11.0)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3517: - Remaining Estimate: 2h, Original Estimate: 2h
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3517: - Sprint: Cont' improve - 2022/03/7
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Fix Version/s: 0.11.0
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Labels: hudi-on-call (was: )
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Component/s: writer-core
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Priority: Major (was: Minor). (The notification body repeats the issue description and stack trace above.)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Qi updated HUDI-3517: - Description edited. (The notification body repeats the issue description and stack trace above; its trace ends with the root cause, a {{java.io.FileNotFoundException}} for the mangled partition directory {{Ä°}} raised from {{org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus}}.)
[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Qi updated HUDI-3517: - Affects Version/s: 0.10.1. (The notification body repeats the issue description and stack trace above.)