rnatarajan commented on issue #2083: URL: https://github.com/apache/hudi/issues/2083#issuecomment-693531236
@n3nash Environment - AWS.

- Master nodes: 1 × m5.xlarge (4 vCore, 16 GiB memory)
- Core nodes: 6 × c5.xlarge (4 vCore, 8 GiB memory)

Spark submit config:

```
--driver-memory 4G --executor-memory 5G --executor-cores 4 --num-executors 6
```

Hudi config:

```
hoodie.combine.before.upsert=false
hoodie.bulkinsert.shuffle.parallelism=10
hoodie.insert.shuffle.parallelism=10
hoodie.upsert.shuffle.parallelism=10
hoodie.delete.shuffle.parallelism=1
hoodie.datasource.write.operation=bulk_insert
hoodie.bulkinsert.sort.mode=NONE
hoodie.datasource.write.table.type=MERGE_ON_READ
hoodie.datasource.write.partitionpath.field=""
```

The events ingested are not time-based. Each event has a unique id of type long, which we use as `hoodie.datasource.write.recordkey.field`. Each event also has a date field, which we use as `hoodie.datasource.write.precombine.field`. Events have 40 columns of types long, int, date, timestamp, and string. For ingesting, we attempted both bulk insert and insert.
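For reference, here is a minimal sketch of how these options would be passed through the Spark datasource writer. The DataFrame `df`, the table name `events_table`, the column names `id` and `event_date`, and the S3 base path are all hypothetical placeholders, not taken from the report above:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch only: writes a DataFrame of events as a non-partitioned MOR table
// using the bulk_insert settings described in this comment.
def writeEvents(df: DataFrame): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "events_table")                      // hypothetical table name
    .option("hoodie.datasource.write.operation", "bulk_insert")
    .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
    .option("hoodie.datasource.write.recordkey.field", "id")          // assumed name of the long unique-id column
    .option("hoodie.datasource.write.precombine.field", "event_date") // assumed name of the date field
    .option("hoodie.datasource.write.partitionpath.field", "")        // non-partitioned, per the config above
    .option("hoodie.bulkinsert.sort.mode", "NONE")
    .option("hoodie.bulkinsert.shuffle.parallelism", "10")
    .option("hoodie.insert.shuffle.parallelism", "10")
    .option("hoodie.upsert.shuffle.parallelism", "10")
    .option("hoodie.delete.shuffle.parallelism", "1")
    .option("hoodie.combine.before.upsert", "false")
    .mode(SaveMode.Append)
    .save("s3://bucket/path/to/events_table")                         // hypothetical base path
}
```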