[ https://issues.apache.org/jira/browse/HUDI-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-4071:
----------------------------
    Fix Version/s: 0.13.0

> Better Spark Datasource default configs
> ---------------------------------------
>
>                 Key: HUDI-4071
>                 URL: https://issues.apache.org/jira/browse/HUDI-4071
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0, 0.13.0
>
>
> Default configs should:
> # Be optimized for insert/bulk_insert, e.g. if the default sort mode is NONE,
> a write is as good as a Parquet write with some additional work for meta
> columns. An extension of this is to keep a map of minimal optimized configs
> per operation type. This is partly related to the more performant configs
> tracked in HUDI-2151.
> # Make reasonable assumptions, e.g. for index type, the bloom filter index
> does not rely on any external system, so it can be a better default candidate
> than, say, the HBase index.
> # Have a value wherever one is needed: scout all configs with noDefaultValue
> and assign a default if necessary.
> # Keep spark-sql and spark datasource config keys the same as much as
> possible, otherwise it is operationally difficult for the user. Rename/reuse
> existing datasource keys that are meant for the same purpose. This is related
> to HUDI-4070 as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
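
The "map of minimal optimized configs per operation type" from the first item could look roughly like the sketch below. This is an illustration only, not Hudi code: `DEFAULTS_BY_OPERATION` and `resolve_write_options` are hypothetical names, and the values picked per operation are assumptions based on the description above. The config keys themselves (`hoodie.datasource.write.operation`, `hoodie.bulkinsert.sort.mode`, `hoodie.index.type`) are existing Hudi write options.

```python
from typing import Dict, Optional

# Hypothetical per-operation defaults. The keys are existing Hudi write
# configs; the values chosen for each operation are assumptions made for
# illustration, following the issue description.
DEFAULTS_BY_OPERATION: Dict[str, Dict[str, str]] = {
    # NONE sort mode: no sorting before the write.
    "bulk_insert": {"hoodie.bulkinsert.sort.mode": "NONE"},
    "insert": {},
    # Bloom index: does not depend on an external system such as HBase.
    "upsert": {"hoodie.index.type": "BLOOM"},
}


def resolve_write_options(
    operation: str, user_options: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Merge per-operation defaults with user options; user values win."""
    options = dict(DEFAULTS_BY_OPERATION.get(operation, {}))
    options.update(user_options or {})
    # Set the operation last so it always matches the requested one.
    options["hoodie.datasource.write.operation"] = operation
    return options


if __name__ == "__main__":
    opts = resolve_write_options("bulk_insert", {"hoodie.table.name": "trips"})
    for key in sorted(opts):
        print(f"{key}={opts[key]}")
    # With a Spark session and the Hudi bundle on the classpath, the result
    # could be passed to the datasource writer, e.g.:
    #   df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

User-supplied options override the per-operation defaults, so an explicit `hoodie.index.type` or sort mode still takes effect. Operations without an entry in the map simply get no extra defaults.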