lamber-ken edited a comment on issue #1105: [WIP] [HUDI-405] Fix sync no hive partition at first time
URL: https://github.com/apache/incubator-hudi/pull/1105#issuecomment-569981982

> > So, modify assumeDatePartitioning to ! assumeDatePartitioning is the best way to fix this issue.
>
> This is not going to fix this issue.. and we should not be changing this code.. It will cause side effects, like I mentioned before.
>
> Can we reproduce this issue in a unit test or sample code first?

Hi, the following steps reproduce this issue.

1. Define the PartitionValueExtractor

```
package org.apache.hudi.hive;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import java.util.Collections;
import java.util.List;

public class DayPartitionValueExtractor implements PartitionValueExtractor {

  // Transient, so it is rebuilt lazily in getDtfOut() after deserialization.
  private transient DateTimeFormatter dtfOut;

  public DayPartitionValueExtractor() {
    this.dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
  }

  private DateTimeFormatter getDtfOut() {
    if (dtfOut == null) {
      dtfOut = DateTimeFormat.forPattern("yyyy-MM-dd");
    }
    return dtfOut;
  }

  @Override
  public List<String> extractPartitionValuesInPath(String partitionPath) {
    // Expects a single-level partition path such as 2019-11-12.
    String[] splits = partitionPath.split("-");
    if (splits.length != 3) {
      throw new IllegalArgumentException(
          "Partition path " + partitionPath + " is not in the form yyyy-mm-dd ");
    }
    int year = Integer.parseInt(splits[0]);
    int mm = Integer.parseInt(splits[1]);
    int dd = Integer.parseInt(splits[2]);
    DateTime dateTime = new DateTime(year, mm, dd, 0, 0);
    return Collections.singletonList(getDtfOut().print(dateTime));
  }
}
```

2. Write data with spark-shell

```
export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
${SPARK_HOME}/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'

import org.apache.spark.sql.SaveMode

val basePath = "/tmp/hoodie_test"

var datas = List("""{
    "key": "uuid", "event_time": 1574297893836, "part_date": "2019-11-12"}""")
val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))

df.write.format("org.apache.hudi").
    option("hoodie.insert.shuffle.parallelism", "10").
    option("hoodie.upsert.shuffle.parallelism", "10").
    option("hoodie.delete.shuffle.parallelism", "10").
    option("hoodie.bulkinsert.shuffle.parallelism", "10").
    option("hoodie.datasource.hive_sync.enable", true).
    option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://0.0.0.0:12326").
    option("hoodie.datasource.hive_sync.username", "dcadmin").
    option("hoodie.datasource.hive_sync.password", "dcadmin").
    option("hoodie.datasource.hive_sync.database", "default").
    option("hoodie.datasource.hive_sync.table", "hoodie_test").
    option("hoodie.datasource.hive_sync.partition_fields", "part_date").
    option("hoodie.datasource.hive_sync.assume_date_partitioning", true).
    option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.DayPartitionValueExtractor").
    option("hoodie.datasource.write.precombine.field", "event_time").
    option("hoodie.datasource.write.recordkey.field", "key").
    option("hoodie.datasource.write.partitionpath.field", "part_date").
    option("hoodie.table.name", "hoodie_test").
    mode(SaveMode.Overwrite).
    save(basePath);
```

3. Query data from Hive

```
no data
```
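To check the extractor's parsing logic from step 1 in isolation, without Hudi or Hive on the classpath, a minimal standalone sketch might look like the following. It is an illustration only: the class name `DayPartitionSketch` and the `extract` method are hypothetical, `java.time` is swapped in for Joda-Time, and the class does not implement Hudi's `PartitionValueExtractor` interface.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for DayPartitionValueExtractor (not part of Hudi).
// Assumption: java.time replaces Joda-Time so the sketch has no dependencies.
class DayPartitionSketch {

    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Same splitting and validation as extractPartitionValuesInPath in step 1.
    static List<String> extract(String partitionPath) {
        String[] splits = partitionPath.split("-");
        if (splits.length != 3) {
            throw new IllegalArgumentException(
                "Partition path " + partitionPath + " is not in the form yyyy-mm-dd");
        }
        LocalDate date = LocalDate.of(
            Integer.parseInt(splits[0]),
            Integer.parseInt(splits[1]),
            Integer.parseInt(splits[2]));
        return Collections.singletonList(OUT.format(date));
    }

    public static void main(String[] args) {
        // The partition path written in step 2 is accepted.
        System.out.println(extract("2019-11-12"));

        // A slash-separated path does not split into three parts and is rejected.
        try {
            extract("2019/11/12");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions the `part_date` value from step 2 parses without error, so the sketch only rules out the parsing step. It says nothing about how the sync discovers partition folders.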