[ https://issues.apache.org/jira/browse/HUDI-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-2857:
-----------------------------
    Component/s:     (was: Spark Integration)

> HoodieTableMetaClient.TEMPFOLDER_NAME causes IllegalArgumentException in Windows environment
> --------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2857
>                 URL: https://issues.apache.org/jira/browse/HUDI-2857
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>         Environment: Windows 10, Spark 2.4.4, Hudi 0.9.0
>            Reporter: 王范明
>            Priority: Major
>              Labels: easyfix
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> {code:java}
> // spark-shell
> val tableName = "cow_prices"
> val basePath = "hdfs://xxxxx:9000//tmp//cow_prices//"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> df.write.format("hudi").
>   options(getQuickstartWriteConfigs).
>   option(PRECOMBINE_FIELD.key(), "ts").
>   option(RECORDKEY_FIELD.key(), "uuid").
>   option(PARTITIONPATH_FIELD.key(), "partitionpath").
>   option(TBL_NAME.key(), tableName).
>   mode(Overwrite).
>   save(basePath)
> {code}
> The above is the sample code from Hudi's official quickstart guide. I ran the Spark program directly on Windows 10, writing the data to a remote HDFS cluster, and the following exception occurred:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Not in marker dir. Marker Path=hdfs://10.38.23.2:9000/tmp/cow_prices/.hoodie\.temp/20211125163531/asia/india/chennai/c9218a3b-f248-436b-b41f-4a0b968dfff2-0_2-27-29_20211125163531.parquet.marker.CREATE, Expected Marker Root=/tmp/cow_prices/.hoodie/.temp/20211125163531
> 	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
> 	at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:87)
> 	at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:75)
> 	at org.apache.hudi.table.marker.DirectWriteMarkers.translateMarkerToDataPath(DirectWriteMarkers.java:153)
> 	at org.apache.hudi.table.marker.DirectWriteMarkers.lambda$createdAndMergedDataPaths$69cdea3b$1(DirectWriteMarkers.java:142)
> 	at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:78)
> 	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
> 	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> 	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> 	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> 	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> 	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
> 	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> 	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
> 	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> 	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
> 	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
> 	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
> 	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> 	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> Investigation showed that the root cause is that
> {code:java}
> HoodieTableMetaClient.TEMPFOLDER_NAME
> {code}
> is constructed incorrectly at initialization:
> {code:java}
> public static final String TEMPFOLDER_NAME = METAFOLDER_NAME + File.separator + ".temp";
> {code}
> File.separator is {color:#ff0000}"\"{color} on Windows, so the marker path is built with a backslash while the expected marker root uses "/", and the prefix check in MarkerUtils.stripMarkerFolderPrefix fails.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
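To illustrate the root cause described above, here is a minimal self-contained sketch (the class and field names below mirror the report but are simplified, not Hudi's actual source): HDFS and other Hadoop FileSystem paths always use "/" as the separator regardless of the local OS, so building the constant with File.separator produces a mismatched ".hoodie\.temp" prefix on Windows. Hard-coding "/" is one plausible fix.

```java
import java.io.File;

// Simplified sketch of the constant from the report (hypothetical class name).
public class TempFolderFix {
    public static final String METAFOLDER_NAME = ".hoodie";

    // Buggy form from the report: File.separator is "\" on Windows,
    // yielding ".hoodie\.temp" there and ".hoodie/.temp" elsewhere.
    public static final String TEMPFOLDER_NAME_BUGGY =
            METAFOLDER_NAME + File.separator + ".temp";

    // Proposed form: hard-code the "/" separator that HDFS paths use,
    // so the constant is identical on every platform.
    public static final String TEMPFOLDER_NAME =
            METAFOLDER_NAME + "/" + ".temp";

    public static void main(String[] args) {
        // Prints ".hoodie/.temp" on every OS; the buggy constant only
        // matches it on platforms where File.separator happens to be "/".
        System.out.println(TEMPFOLDER_NAME);
    }
}
```

On Linux and macOS the two constants coincide, which is why the bug only surfaces when the Spark driver runs on Windows.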