Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]

2024-06-07 Thread via GitHub


ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2155139922

   @SuneethaYamani 
https://hudi.apache.org/docs/configurations/#hoodiemetadataenable
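
   For reference, a minimal sketch of how the setting linked above could be
   passed through a Python `arguments` list like the one posted in this
   thread. `hoodie.metadata.enable` is the key from the linked docs; the
   `with_hoodie_conf` helper and the shortened sample list are hypothetical.
   Note that the record-level index is stored in the metadata table, so
   `hoodie.metadata.record.index.enable=true` and
   `hoodie.index.type=RECORD_INDEX` would likely need to be revisited if the
   metadata table is turned off.

```python
def with_hoodie_conf(arguments, key, value):
    """Return a copy of `arguments` with `--hoodie-conf key=value` set.

    Any existing `--hoodie-conf` entry for the same key is replaced.
    """
    result = []
    i = 0
    while i < len(arguments):
        is_same_key = (
            arguments[i] == "--hoodie-conf"
            and i + 1 < len(arguments)
            and arguments[i + 1].split("=", 1)[0] == key
        )
        if is_same_key:
            i += 2  # skip the old flag/value pair
            continue
        result.append(arguments[i])
        i += 1
    result += ["--hoodie-conf", "{}={}".format(key, value)]
    return result


# Hypothetical, shortened arguments list for illustration only.
arguments = [
    "--table-type", "COPY_ON_WRITE",
    "--enable-sync",
    "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=365",
]
arguments = with_hoodie_conf(arguments, "hoodie.metadata.enable", "false")
```

   The helper replaces an existing entry rather than appending a second one,
   so the list never carries two values for the same key.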


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]

2024-06-03 Thread via GitHub


SuneethaYamani commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2144490005

   @ad1happy2go Can you please share the config to disable this?
   As a temporary workaround, I set hoodie.metadata.compact.max.delta.commits=365 to avoid this blocker.
   
   I am using the config below:
   arguments = [
   "--table-type", table_type,
   "--op", op,
   "--enable-sync",
   "--source-ordering-field", source_ordering_field,
   "--source-class", "org.apache.hudi.utilities.sources.JsonDFSSource",
   "--target-table", table_name,
   "--target-base-path", hudi_target_path,
   "--payload-class", "org.apache.hudi.common.model.HoodieAvroPayload",
   "--transformer-class", "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer",
   "--props", props,
   "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
   "--hoodie-conf", "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator",
   "--hoodie-conf", "hoodie.datasource.write.recordkey.field={}".format(record_key),
   "--hoodie-conf", "hoodie.datasource.write.partitionpath.field={}".format(partition_field),
   "--hoodie-conf", "hoodie.streamer.source.dfs.root={}".format(delta_streamer_source),
   "--hoodie-conf", "hoodie.datasource.write.precombine.field={}".format(precombine),
   "--hoodie-conf", "hoodie.database.name={}".format(glue_db),
   "--hoodie-conf", "hoodie.datasource.hive_sync.enable=true",
   "--hoodie-conf", "hoodie.metadata.record.index.enable=true",
   "--hoodie-conf", "hoodie.datasource.insert.dup.policy=true",
   "--hoodie-conf", "hoodie.table.cdc.enabled=true",
   "--hoodie-conf", "hoodie.index.type=RECORD_INDEX",
   "--hoodie-conf", "hoodie.datasource.hive_sync.table={}".format(table_name),
   "--hoodie-conf", "hoodie.datasource.hive_sync.partition_fields={}".format(partition_field),
   "--hoodie-conf", "hoodie.datasource.schema.avro.path={}".format(schema_path),
   "--hoodie-conf", "hoodie.datasource.schema.strategy=UNION",
   "--hoodie-conf", "hoodie.streamer.transformer.sql={}".format(sql),
   "--hoodie-conf", "hoodie.streamer.schemaprovider.source.schema.file={}".format(schema_path),
   "--hoodie-conf", "hoodie.streamer.schemaprovider.target.schema.file={}".format(schema_path),
   "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=365"
   ]
   
   





Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]

2024-05-31 Thread via GitHub


ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2142552863

   @SuneethaYamani The metadata table helps reduce file-listing API calls.
   You can disable it if it is the only thing causing the bottleneck.
   
   That said, we would like to understand why it is taking so long. Can you
   share your writer configs?





Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]

2024-05-28 Thread via GitHub


SuneethaYamani commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134603908

   @ad1happy2go Yes, it is for the metadata table.





Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]

2024-05-27 Thread via GitHub


ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134341804

   @SuneethaYamani That's not possible. Can you share the configs? One
   possibility is that the compaction you are seeing is not for your main
   table but for the metadata table, which is MOR by design.
   Can you confirm whether it is the metadata table? You can also try
   disabling the metadata table.
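
   One way to check which table the compaction belongs to is to compare the
   instant files in the two timelines. The following is a minimal sketch,
   assuming the metadata table's timeline sits under
   `<base_path>/.hoodie/metadata/.hoodie` and that instant files are named
   `<instant_time>.<action>[.<state>]` (the pre-1.0 layout). The
   `summarize_timeline` helper and the sample file names are hypothetical.

```python
from collections import Counter


def summarize_timeline(filenames):
    """Count timeline files by the part that follows the instant time.

    Files whose first dot-separated component is not numeric (for example
    hoodie.properties) are ignored.
    """
    counts = Counter()
    for name in filenames:
        instant, sep, action = name.partition(".")
        if sep and instant.isdigit():
            counts[action] += 1
    return counts


# Hypothetical listing of a timeline directory, for illustration only.
sample = [
    "20240527101500000.deltacommit",
    "20240527101500000.deltacommit.requested",
    "20240527102000000.compaction.requested",
    "20240527102000000.compaction.inflight",
    "hoodie.properties",
]
summary = summarize_timeline(sample)
print(summary["compaction.requested"])
```

   Running this over a listing of each timeline directory would show whether
   the compaction instants appear under the main table or under the metadata
   table.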

