Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2155139922 @SuneethaYamani https://hudi.apache.org/docs/configurations/#hoodiemetadataenable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
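The linked setting, `hoodie.metadata.enable`, can be passed the same way as the other options in this thread, as a `--hoodie-conf key=value` pair. Below is a minimal sketch, assuming the job builds its arguments as a Python list. The helper name `with_hoodie_conf` and the placeholder arguments are hypothetical and not part of Hudi.

```python
def with_hoodie_conf(arguments, key, value):
    """Return a copy of `arguments` with one extra --hoodie-conf pair appended."""
    return list(arguments) + ["--hoodie-conf", "{}={}".format(key, value)]


# Placeholder arguments for illustration only; a real job passes many more.
base_arguments = ["--table-type", "COPY_ON_WRITE", "--op", "UPSERT"]

# Ask the writer to turn the metadata table off.
arguments = with_hoodie_conf(base_arguments, "hoodie.metadata.enable", "false")
print(arguments[-2:])
```

The record-level index is stored in the metadata table, so a job that also sets `hoodie.index.type=RECORD_INDEX` would probably need a different index type before the metadata table is disabled. Treat that as something to verify against the documentation for your Hudi version.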
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
SuneethaYamani commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2144490005

@ad1happy2go Can you please share the config to disable this? Temporarily, I changed hoodie.metadata.compact.max.delta.commits to 365 to avoid this blocker. I am using the config below:

arguments = [
    "--table-type", table_type,
    "--op", op,
    "--enable-sync",
    "--source-ordering-field", source_ordering_field,
    "--source-class", "org.apache.hudi.utilities.sources.JsonDFSSource",
    "--target-table", table_name,
    "--target-base-path", hudi_target_path,
    "--payload-class", "org.apache.hudi.common.model.HoodieAvroPayload",
    "--transformer-class", "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer",
    "--props", props,
    "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
    "--hoodie-conf", "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator",
    "--hoodie-conf", "hoodie.datasource.write.recordkey.field={}".format(record_key),
    "--hoodie-conf", "hoodie.datasource.write.partitionpath.field={}".format(partition_field),
    "--hoodie-conf", "hoodie.streamer.source.dfs.root={}".format(delta_streamer_source),
    "--hoodie-conf", "hoodie.datasource.write.precombine.field={}".format(precombine),
    "--hoodie-conf", "hoodie.database.name={}".format(glue_db),
    "--hoodie-conf", "hoodie.datasource.hive_sync.enable=true",
    "--hoodie-conf", "hoodie.metadata.record.index.enable=true",
    "--hoodie-conf", "hoodie.datasource.insert.dup.policy=true",
    "--hoodie-conf", "hoodie.table.cdc.enabled=true",
    "--hoodie-conf", "hoodie.index.type=RECORD_INDEX",
    "--hoodie-conf", "hoodie.datasource.hive_sync.table={}".format(table_name),
    "--hoodie-conf", "hoodie.datasource.hive_sync.partition_fields={}".format(partition_field),
    "--hoodie-conf", "hoodie.datasource.schema.avro.path={}".format(schema_path),
    "--hoodie-conf", "hoodie.datasource.schema.strategy=UNION",
    "--hoodie-conf", "hoodie.streamer.transformer.sql={}".format(sql),
    "--hoodie-conf", "hoodie.streamer.schemaprovider.source.schema.file={}".format(schema_path),
    "--hoodie-conf", "hoodie.streamer.schemaprovider.target.schema.file={}".format(schema_path),
    "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=365",
]
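With this many options, it can help to print the effective `--hoodie-conf` settings before submitting the job, for example to spot a misspelled or duplicated key. The following is a small sketch of that check, assuming the arguments are held in a Python list as above. `parse_hoodie_confs` is a hypothetical helper, and the sample list uses placeholder values rather than the real job's variables.

```python
def parse_hoodie_confs(arguments):
    """Collect every `--hoodie-conf key=value` pair into a dict (last one wins)."""
    confs = {}
    # Walk adjacent pairs so that value-less flags such as --enable-sync
    # do not shift the alignment between a flag and its value.
    for flag, value in zip(arguments, arguments[1:]):
        if flag == "--hoodie-conf":
            # Split on the first "=" only; the value itself may contain "=".
            key, _, val = value.partition("=")
            confs[key] = val
    return confs


# Shortened sample with placeholder values, for illustration only.
sample = [
    "--table-type", "COPY_ON_WRITE",
    "--enable-sync",
    "--hoodie-conf", "hoodie.index.type=RECORD_INDEX",
    "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=365",
]

confs = parse_hoodie_confs(sample)
for key in sorted(confs):
    print(key, "=", confs[key])
```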
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2142552863 @SuneethaYamani The metadata table helps reduce file-listing API calls. You can disable it if it is the only bottleneck, although we would like to understand why it is taking so long. Can you share your writer configs?
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
SuneethaYamani commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134603908 @ad1happy2go Yes, it is for the metadata table.
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134341804 @SuneethaYamani That's not possible. Can you share the configs? One possibility is that the compaction you are seeing is not for your main table; it may be for the metadata table, which is MOR by design. Can you confirm whether it is the metadata table? You can also try disabling the metadata table.
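One way to confirm which table a compaction belongs to is to look at the timeline files of the metadata table rather than those of the main table. The sketch below rests on two assumptions that should be checked against your Hudi version: that the metadata table lives under `.hoodie/metadata` inside the table base path, and that pending compactions appear on its timeline as files containing `.compaction.` in their names. Both helper names and the sample file names are made up for illustration.

```python
import posixpath


def metadata_timeline_path(base_path):
    """Build the assumed timeline directory of the metadata table."""
    return posixpath.join(base_path.rstrip("/"), ".hoodie", "metadata", ".hoodie")


def compaction_instants(timeline_files):
    """Keep only file names that look like compaction instants (assumed naming)."""
    return sorted(name for name in timeline_files if ".compaction." in name)


# Made-up file names; in practice, list the directory with your storage client.
listing = [
    "20240530101500000.deltacommit",
    "20240530101700000.compaction.requested",
    "20240530101700000.compaction.inflight",
]

print(metadata_timeline_path("s3://bucket/table/"))
print(compaction_instants(listing))
```

If compaction instants show up in that directory but not in the main table's `.hoodie` directory, the compaction most likely belongs to the metadata table.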
[I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
SuneethaYamani opened a new issue, #11273: URL: https://github.com/apache/hudi/issues/11273 Hi, I am creating a COW table. I want to run compaction separately instead of along with my write operation, so I set hoodie.datasource.write.streaming.disable.compaction=true. Compaction is still getting triggered. A data write usually takes about 2 minutes, but whenever compaction is triggered the jobs stay stuck in the running state. Thanks, Suneetha