[ https://issues.apache.org/jira/browse/HUDI-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Scheller updated HUDI-327:
----------------------------------
    Summary: Introduce "null" supporting KeyGenerator  (was: Introduce "null" supporting ComplexKeyGenerator)

> Introduce "null" supporting KeyGenerator
> ----------------------------------------
>
>                 Key: HUDI-327
>                 URL: https://issues.apache.org/jira/browse/HUDI-327
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: Brandon Scheller
>            Priority: Major
>
> Customers have been running into issues where they would like to use a
> record_key from columns that can contain null values. Currently, this
> causes Hudi to crash and throw a cryptic exception. (Improving the error
> messaging is a separate but related issue.)
> We would like to propose a new KeyGenerator based on ComplexKeyGenerator that
> allows for null record_keys.
> At a basic level, using the key generator without any options would
> essentially allow a null record_key to be accepted. (It can be replaced with
> an empty string, null, or some predefined "null" string representation.)
> This comes with the negative side effect that all records with a null
> record_key would then be associated together. To work around this, you would
> be able to specify a secondary record_key to be used in the case that the
> first one is null. You would specify this in the same way that you do for the
> ComplexKeyGenerator, as a comma-separated list of record_keys. In this case,
> when the first key is seen as null, the second key will be used instead.
> We could support an arbitrary number of record_keys here.
> While we are aware that there are many alternatives that avoid using a null
> record_key, we believe this will act as a usability improvement so that new
> users are not forced to clean/update their data in order to use Hudi.
> We are hoping to get some feedback on the idea.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
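
The fallback behaviour described in the issue could look roughly like the sketch below. It is only an illustration under stated assumptions, not Hudi code and not the proposed implementation: the class name NullTolerantKeySketch, the method resolveRecordKey, the "__null__" placeholder, and the field names user_id and email are all hypothetical. The record is modelled as a plain Map rather than an Avro record to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Hudi and not the proposed implementation.
final class NullTolerantKeySketch {

    // Assumed placeholder for records whose candidate key fields are all null.
    static final String NULL_PLACEHOLDER = "__null__";

    /**
     * Returns the value of the first candidate field that is non-null,
     * or NULL_PLACEHOLDER when every candidate is null or absent.
     * keyFields plays the role of the comma-separated record_key list.
     */
    static String resolveRecordKey(Map<String, Object> record, List<String> keyFields) {
        for (String field : keyFields) {
            Object value = record.get(field);
            if (value != null) {
                return value.toString();
            }
        }
        return NULL_PLACEHOLDER;
    }

    public static void main(String[] args) {
        List<String> keyFields = Arrays.asList("user_id", "email");

        Map<String, Object> record = new HashMap<>();
        record.put("user_id", null);            // primary key column is null
        record.put("email", "a@example.com");   // secondary key column is set
        System.out.println(resolveRecordKey(record, keyFields)); // prints a@example.com

        record.put("email", null);              // now every candidate is null
        System.out.println(resolveRecordKey(record, keyFields)); // prints __null__
    }
}
```

Note that in this sketch a value taken from a fallback field is returned as-is, so a secondary value could collide with a primary value from another record; whether the real key generator should prefix the field name to avoid that is left open here.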