bwu2 edited a comment on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-586071719

Ok, thanks for this. I have run the jobs again: first insert 4m records, then upsert 3m of them, then upsert 4m, then upsert 3m again. The two jobs upserting 3m records work fine and quickly, but the one upserting 4m takes >200 times as long. My results (from a synthetic dataset) are:

```bash
hudi:json_data->commits show --limit 4
╔════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
║ CommitTime     │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
╠════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
║ 20200214013937 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20200213224532 │ 25.5 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 4000000                      │ 0            ║
╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20200213224325 │ 25.6 MB             │ 0                 │ 1                   │ 1                        │ 4000000               │ 3000000                      │ 0            ║
╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20200213224218 │ 25.5 MB             │ 1                 │ 0                   │ 1                        │ 4000000               │ 0                            │ 0            ║
╚════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
```

and the times:
```bash
grep -n -e totalCreateTime -e totalUpsertTime *.commit
20200213224218.commit:36:  "totalCreateTime" : 30012,
20200213224218.commit:37:  "totalUpsertTime" : 0,
20200213224325.commit:36:  "totalCreateTime" : 0,
20200213224325.commit:37:  "totalUpsertTime" : 46879,
20200213224532.commit:36:  "totalCreateTime" : 0,
20200213224532.commit:37:  "totalUpsertTime" : 10347280,
20200214013937.commit:36:  "totalCreateTime" : 0,
20200214013937.commit:37:  "totalUpsertTime" : 44598,
```
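
For anyone wanting to check the slowdown figure, here is a minimal Python sketch that parses the grep output above and compares the upsert times. The helper names (`parse_times`, `upsert_slowdown`) are made up for illustration and are not part of Hudi. The grep output does not state the unit of these values, but the ratio does not depend on it.

```python
import re

# Grep output copied from the message above.
GREP_OUTPUT = """\
20200213224218.commit:36: "totalCreateTime" : 30012,
20200213224218.commit:37: "totalUpsertTime" : 0,
20200213224325.commit:36: "totalCreateTime" : 0,
20200213224325.commit:37: "totalUpsertTime" : 46879,
20200213224532.commit:36: "totalCreateTime" : 0,
20200213224532.commit:37: "totalUpsertTime" : 10347280,
20200214013937.commit:36: "totalCreateTime" : 0,
20200214013937.commit:37: "totalUpsertTime" : 44598,
"""

# Matches lines of the form: <commit>.commit:<lineno>: "<metric>" : <value>,
LINE_RE = re.compile(r'^(\d+)\.commit:\d+:\s*"(\w+)"\s*:\s*(\d+),?\s*$')


def parse_times(text):
    """Return {commit_time: {metric_name: value}} from `grep -n` output."""
    times = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            commit, metric, value = match.groups()
            times.setdefault(commit, {})[metric] = int(value)
    return times


def upsert_slowdown(times):
    """Ratio of the slowest upsert to the fastest non-zero upsert."""
    upserts = [
        metrics["totalUpsertTime"]
        for metrics in times.values()
        if metrics.get("totalUpsertTime", 0) > 0
    ]
    return max(upserts) / min(upserts)


if __name__ == "__main__":
    times = parse_times(GREP_OUTPUT)
    for commit, metrics in sorted(times.items()):
        print(commit, metrics)
    print(f"slowest / fastest upsert: {upsert_slowdown(times):.1f}x")
```

The initial insert is excluded from the ratio because its `totalUpsertTime` is 0.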