vingov commented on code in PR #5179:
URL: https://github.com/apache/hudi/pull/5179#discussion_r841331153


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##########
@@ -117,30 +133,119 @@ public void trySave(int taskPartitionId) {
     }
   }
 
+  private String getMetafileExtension() {
+    // For backwards compatibility, the properties-file based partition metafile has no extension
+    return format.isPresent() ? format.get().getFileExtension() : "";
+  }
+
+  /**
+   * Write the partition metadata in the correct format to the given file path.
+   *
+   * @param filePath Path of the file to write
+   * @throws IOException if the metafile cannot be written
+   */
+  private void writeMetafile(Path filePath) throws IOException {
+    if (format.isPresent()) {
+      Schema schema = HoodieAvroUtils.getRecordKeySchema();
+
+      switch (format.get()) {
+        case PARQUET:
+          // Since we are only interested in saving metadata to the footer, the schema, blocksizes and other
+          // parameters are not important.
+          MessageType type = Types.buildMessage().optional(PrimitiveTypeName.INT64).named("dummyint").named("dummy");
+          HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(type, schema, Option.empty());

Review Comment:
   Hi @prashantwason, I passed the required config and tested it in the [docker demo env](https://hudi.apache.org/docs/docker_demo) with the delta streamer, but the `.hoodie_partition_metadata` file was still created as plain text; it was not written in parquet format as expected.
   
   I used the following command to start my delta streamer job:
   `spark-submit --packages com.google.cloud:google-cloud-bigquery:2.10.5 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts --target-base-path gs://hudi-demo/stock_ticks --target-table stock_ticks --props hdfs://namenode:8020/var/demo/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --partition-metafile-use-data-format true --enable-sync --sync-tool-classes org.apache.hudi.gcp.sync.BigQuerySyncTool`
   
   Can you please take a look? thanks!
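   As a side note, one quick way to confirm which format actually landed on storage (independent of Hudi itself) is to look for Parquet's 4-byte `PAR1` magic header at the start of the metafile. The sketch below is not part of the PR; the class name and the sample file contents are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Standalone sketch: detect whether a partition metafile is Parquet by
// checking for the "PAR1" magic bytes that every Parquet file starts with.
public class MetafileFormatCheck {

  static boolean isParquet(Path p) throws IOException {
    byte[] head = new byte[4];
    try (InputStream in = Files.newInputStream(p)) {
      if (in.read(head) != 4) {
        return false; // too short to be a Parquet file
      }
    }
    return Arrays.equals(head, "PAR1".getBytes());
  }

  public static void main(String[] args) throws IOException {
    // Simulate the plain-text properties metafile observed in the demo run
    // (contents are illustrative, not copied from a real table).
    Path f = Files.createTempFile("hoodie_partition_metadata", "");
    Files.writeString(f, "#partition metadata\ncommitTime=20220401000000\n");
    System.out.println(isParquet(f) ? "parquet" : "plain text"); // prints "plain text"
  }
}
```

   Running this against the `.hoodie_partition_metadata` produced by the delta streamer job would show whether the file is genuinely plain text or merely being rendered as text by the tool used to inspect it.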
   
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
