Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2155139922

@SuneethaYamani https://hudi.apache.org/docs/configurations/#hoodiemetadataenable

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
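For readers following the link: the setting it points to is `hoodie.metadata.enable`. A minimal sketch of passing it in the `--hoodie-conf` style used in this thread is shown here. The helper name `with_hoodie_conf` and the abbreviated argument list are hypothetical, not taken from the reporter's job. One caveat to verify against the docs for your Hudi version: the record-level index (`hoodie.index.type=RECORD_INDEX`) is kept inside the metadata table, so turning the metadata table off most likely means switching to a different index type as well.

```python
def with_hoodie_conf(arguments, key, value):
    """Return a copy of the argument list with one extra --hoodie-conf pair."""
    return list(arguments) + ["--hoodie-conf", "{}={}".format(key, value)]


# Abbreviated, hypothetical argument list; the real job passes many more options.
arguments = ["--table-type", "COPY_ON_WRITE", "--op", "UPSERT"]

# Assumption: disabling the metadata table is acceptable for this workload.
arguments = with_hoodie_conf(arguments, "hoodie.metadata.enable", "false")

print(arguments[-2:])
```

Returning a new list rather than mutating the input keeps the base arguments reusable across jobs that need different overrides.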
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
SuneethaYamani commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2144490005

@ad1happy2go Can you please share the config to disable this? As a temporary workaround, I set hoodie.metadata.compact.max.delta.commits=365 to avoid this blocker.

I am using the config below:

    arguments = [
        "--table-type", table_type,
        "--op", op,
        "--enable-sync",
        "--source-ordering-field", source_ordering_field,
        "--source-class", "org.apache.hudi.utilities.sources.JsonDFSSource",
        "--target-table", table_name,
        "--target-base-path", hudi_target_path,
        "--payload-class", "org.apache.hudi.common.model.HoodieAvroPayload",
        "--transformer-class", "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer",
        "--props", props,
        "--schemaprovider-class", "org.apache.hudi.utilities.schema.FilebasedSchemaProvider",
        "--hoodie-conf", "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator",
        "--hoodie-conf", "hoodie.datasource.write.recordkey.field={}".format(record_key),
        "--hoodie-conf", "hoodie.datasource.write.partitionpath.field={}".format(partition_field),
        "--hoodie-conf", "hoodie.streamer.source.dfs.root={}".format(delta_streamer_source),
        "--hoodie-conf", "hoodie.datasource.write.precombine.field={}".format(precombine),
        "--hoodie-conf", "hoodie.database.name={}".format(glue_db),
        "--hoodie-conf", "hoodie.datasource.hive_sync.enable=true",
        "--hoodie-conf", "hoodie.metadata.record.index.enable=true",
        "--hoodie-conf", "hoodie.datasource.insert.dup.policy=true",
        "--hoodie-conf", "hoodie.table.cdc.enabled=true",
        "--hoodie-conf", "hoodie.index.type=RECORD_INDEX",
        "--hoodie-conf", "hoodie.datasource.hive_sync.table={}".format(table_name),
        "--hoodie-conf", "hoodie.datasource.hive_sync.partition_fields={}".format(partition_field),
        "--hoodie-conf", "hoodie.datasource.schema.avro.path={}".format(schema_path),
        "--hoodie-conf", "hoodie.datasource.schema.strategy=UNION",
        "--hoodie-conf", "hoodie.streamer.transformer.sql={}".format(sql),
        "--hoodie-conf", "hoodie.streamer.schemaprovider.source.schema.file={}".format(schema_path),
        "--hoodie-conf", "hoodie.streamer.schemaproider.target.schema.file={}".format(schema_path),
        "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=365"
    ]
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2142552863

@SuneethaYamani The metadata table helps reduce file-listing API calls. You can disable it if it is the only bottleneck, although we would still like to understand why it is taking so long. Can you share your writer configs?
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
SuneethaYamani commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134603908

@ad1happy2go Yes, it is for the metadata table.
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273:
URL: https://github.com/apache/hudi/issues/11273#issuecomment-2134341804

@SuneethaYamani That's not possible. Can you share the configs? One possibility is that the compaction you are seeing is not for your main table; it may be for the metadata table, which is MOR by design. Can you confirm whether it is the metadata table? You can also try disabling the metadata table.
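One way to check which table a compaction belongs to is to look at where its instant files were written. The sketch below is rough and rests on assumptions that are not stated in this thread: the Hudi 0.x on-disk layout, where the main table's timeline sits under `<base>/.hoodie/` and the metadata table's under `<base>/.hoodie/metadata/.hoodie/`, and pending compaction instants ending in `.compaction.requested` or `.compaction.inflight`. The function name and the example paths are hypothetical. It only recognises pending compactions, because a completed compaction is recorded as a `.commit` file.

```python
# Assumed 0.x layout: the metadata table is itself a Hudi table nested under .hoodie/.
METADATA_TIMELINE = "/.hoodie/metadata/.hoodie/"
PENDING_COMPACTION_SUFFIXES = (".compaction.requested", ".compaction.inflight")


def compaction_owner(path):
    """Classify a timeline file path by the table its pending compaction belongs to.

    Returns "metadata", "main", or None when the path is not a pending
    compaction instant inside a .hoodie timeline directory.
    """
    if not path.endswith(PENDING_COMPACTION_SUFFIXES):
        return None
    # Check the nested metadata timeline first, since it also contains "/.hoodie/".
    if METADATA_TIMELINE in path:
        return "metadata"
    if "/.hoodie/" in path:
        return "main"
    return None


# Hypothetical paths, for illustration only.
examples = [
    "s3://bucket/tbl/.hoodie/metadata/.hoodie/20240528101500000.compaction.requested",
    "s3://bucket/tbl/.hoodie/20240528101500000.compaction.inflight",
    "s3://bucket/tbl/.hoodie/20240528101500000.commit",
]
for p in examples:
    print(compaction_owner(p), p)
```

Feeding it the output of a recursive listing of the table's `.hoodie` directory gives a quick answer to the question asked above without opening the Spark UI.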