hudi-bot opened a new issue, #16971:
URL: https://github.com/apache/hudi/issues/16971
The user is running the same Hudi upsert application with the same Hudi
configuration on Hudi 0.9.0 and 0.14.0 for performance benchmarking.
The results show a roughly 2x performance regression in the job
"Doing partition and writing data": about 2.5 minutes on 0.9.0 versus
about 5 minutes on 0.14.0.
Is this a known performance regression, and what is causing it?
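When repeating the comparison, it helps to time each version the same way and to report the result as a ratio. The sketch below is illustrative only. `time_job` and `slowdown` are hypothetical helpers, not part of Hudi or Spark. Wall-clock time around a whole write also includes more than the single "Doing partition and writing data" job, so per-job durations from the Spark UI remain the more precise source.

```python
import time
from typing import Callable


def time_job(job: Callable[[], None]) -> float:
    """Run `job` once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def slowdown(baseline_seconds: float, candidate_seconds: float) -> float:
    """Ratio of candidate to baseline duration; 2.0 means twice as slow."""
    if baseline_seconds <= 0:
        raise ValueError("baseline duration must be positive")
    return candidate_seconds / baseline_seconds


# Figures reported in this issue: ~2.5 min on 0.9.0 vs ~5 min on 0.14.0.
print(slowdown(2.5 * 60, 5 * 60))  # 2.0
```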
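Hudi config (values in square brackets are placeholders):
{code:python}
upsert_hudi_config = {
    "hoodie.table.name": "[table_name]",
    "hoodie.database.name": "[database_name]",
    # Non-partitioned table: all records are written under a single partition path.
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.datasource.write.operation": "upsert",
    # Used to pick one record when the incoming batch has duplicates of a key.
    "hoodie.datasource.write.precombine.field": "[precombine_key]",
    "hoodie.datasource.write.recordkey.field": "[record_key]",
    "hoodie.datasource.write.table.name": "[table_name]",
    # Bloom-filter index used to locate existing record keys during the upsert.
    "hoodie.index.type": "BLOOM",
    # Metadata table explicitly disabled.
    "hoodie.metadata.enable": False,
    # Shuffle parallelism for the upsert.
    "hoodie.upsert.shuffle.parallelism": 3,
}
{code}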
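For reference, the sketch below shows how a dict like this is typically applied from PySpark, following the `df.write.format("hudi").options(**opts).mode("append").save(path)` pattern from the Hudi quickstart. The helper names `build_upsert_config` and `upsert` and the `base_path` argument are hypothetical. Only the option keys and values come from this issue.

```python
from typing import Any, Dict


def build_upsert_config(table: str, database: str,
                        record_key: str, precombine_key: str) -> Dict[str, Any]:
    """Hypothetical helper: fills the placeholders of the config in this issue."""
    return {
        "hoodie.table.name": table,
        "hoodie.database.name": database,
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.precombine.field": precombine_key,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.table.name": table,
        "hoodie.index.type": "BLOOM",
        "hoodie.metadata.enable": False,
        "hoodie.upsert.shuffle.parallelism": 3,
    }


def upsert(df, base_path: str, config: Dict[str, Any]) -> None:
    """Hypothetical helper: write a PySpark DataFrame to a Hudi table at base_path."""
    # "append" mode combined with operation=upsert updates existing record keys
    # and inserts new ones, instead of overwriting the table.
    df.write.format("hudi").options(**config).mode("append").save(base_path)
```

Running the same script unchanged on both Hudi versions, with only the Hudi bundle swapped, keeps the comparison limited to the library itself.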
Data characteristics:
- Table size: ~5 GB of uncompressed Parquet data
- Column count: 310
- High NULL density (NULLs per row):
  - Average: 217.74
  - Min: 185
  - Max: 230
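To make the per-row NULL statistics above concrete, the sketch below treats Python `None` as NULL and computes the average, minimum and maximum NULL count per row. It also computes the average fraction of columns that are NULL. On the figures above, 217.74 / 310 means roughly 70% of each row is NULL. `null_density` is a hypothetical helper. For a 5 GB table, the same numbers would be computed in Spark, and this version only pins down the definition.

```python
from statistics import mean
from typing import Dict, Iterable, Sequence


def null_density(rows: Iterable[Sequence[object]], column_count: int) -> Dict[str, float]:
    """Per-row NULL statistics, treating Python None as NULL."""
    counts = [sum(1 for value in row if value is None) for row in rows]
    if not counts:
        raise ValueError("no rows")
    avg = mean(counts)
    return {
        "avg_nulls": avg,
        "min_nulls": min(counts),
        "max_nulls": max(counts),
        "avg_null_fraction": avg / column_count,
    }


# Average NULL fraction implied by the figures in this issue.
print(round(217.74 / 310, 3))  # 0.702
```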
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-9313
- Type: Bug