hudi-bot opened a new issue, #14500:
URL: https://github.com/apache/hudi/issues/14500

   As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] 
we would like to understand how Avro behaves with case sensitive column names.
   
   A couple of action items:
    * Test with different field names just differing in case.
    * *AbstractRealtimeRecordReader* is one of the classes where we are 
converting Avro Schema field names to lower case, to be able to verify them 
against column names from Hive. We can consider removing the *lowercase* 
conversion there if we verify it does not break anything.
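
The first action item could be explored with a check along the lines below. This is only a minimal sketch: plain strings stand in for Avro field names so that it runs without an Avro dependency, and the class and method names are hypothetical, not part of Hudi. It shows how a lookup keyed by lower-cased names collapses fields that differ only in case, which is the situation such a test would need to cover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Hudi: plain strings stand in for Avro field names.
final class CaseCollisionSketch {

  // Groups field names by their lower-cased form, preserving input order.
  static Map<String, List<String>> groupByLowerCase(List<String> fieldNames) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String name : fieldNames) {
      groups.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(name);
    }
    return groups;
  }

  // Returns the lower-cased keys that more than one original field name maps to.
  static List<String> collidingKeys(List<String> fieldNames) {
    List<String> collisions = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : groupByLowerCase(fieldNames).entrySet()) {
      if (entry.getValue().size() > 1) {
        collisions.add(entry.getKey());
      }
    }
    return collisions;
  }

  public static void main(String[] args) {
    // "userId" and "userid" differ only in case and share one lower-cased key.
    List<String> fields = Arrays.asList("userId", "userid", "ts");
    System.out.println(collidingKeys(fields)); // prints [userid]
  }
}
```

A schema for which `collidingKeys` returns a non-empty list is one where a lower-cased name map cannot tell the fields apart, so it would be a natural input for the test described above.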
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-303
   - Type: Test
   
   
   ---
   
   
   ## Comments
   
   23/May/20 22:07;shivnarayan;[~guoyihua]: this ticket is also related to case 
sensitivity. If you plan to take the other ticket, this should be on similar 
lines. ;;;
   
   ---
   
   19/Oct/20 13:42;309637554;I do not think this should be changed, because Hive
metastore column names are case insensitive. Without the *lowercase* conversion,
the Hive metastore schema will not match the Avro schema. For example, the
columns read from hive_metastoreConstants.META_TABLE_COLUMNS are case insensitive:
   
   Map<String, Field> schemaFieldsMap = HoodieRealtimeRecordReaderUtils.getNameToFieldMap(writerSchema);
   hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);

   // Get all column names of the Hive table
   String hiveColumnString = jobConf.get(hive_metastoreConstants.META_TABLE_COLUMNS);
   LOG.info("Hive Columns : " + hiveColumnString);
   String[] hiveColumns = hiveColumnString.split(",");
   List<Field> hiveSchemaFields = new ArrayList<>();

   for (String columnName : hiveColumns) {
     Field field = schemaFieldsMap.get(columnName.toLowerCase());

     if (field != null) {
       hiveSchemaFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
     } else {
       // Hive has some extra virtual columns like BLOCK__OFFSET__INSIDE__FILE which do not exist in the table schema.
       // They will get skipped as they won't be found in the original schema.
       LOG.debug("Skipping Hive Column => " + columnName);
     }
   };;;
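
The argument in the comment above can be illustrated with a stand-alone sketch. It assumes, as the comment states, that Hive hands back column names in lower case; plain strings replace Avro `Field` objects, and the class and method names are hypothetical rather than Hudi APIs. With lower-casing on both sides the Hive columns resolve to the original field names; with exact-case lookup, mixed-case fields are missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Hudi: plain strings stand in for Avro fields.
final class HiveColumnMatchSketch {

  // Returns the original field names matched by the Hive columns, in Hive column order.
  // When lowerCase is true, both the map keys and the lookup keys are lower-cased.
  static List<String> matchColumns(String hiveColumnString, List<String> avroFieldNames, boolean lowerCase) {
    Map<String, String> nameMap = new HashMap<>();
    for (String name : avroFieldNames) {
      nameMap.put(lowerCase ? name.toLowerCase(Locale.ROOT) : name, name);
    }

    List<String> matched = new ArrayList<>();
    for (String column : hiveColumnString.split(",")) {
      String original = nameMap.get(lowerCase ? column.toLowerCase(Locale.ROOT) : column);
      if (original != null) {
        matched.add(original);
      }
      // Columns with no match (for example Hive virtual columns) are skipped.
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> avroFields = Arrays.asList("userId", "eventTs");
    // Assumption: Hive reports the column list in lower case.
    String hiveColumns = "userid,eventts,block__offset__inside__file";

    System.out.println(matchColumns(hiveColumns, avroFields, true));  // prints [userId, eventTs]
    System.out.println(matchColumns(hiveColumns, avroFields, false)); // prints []
  }
}
```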
   
   ---
   
   19/Oct/20 13:45;309637554;[~uditme], [~vinoth] what do you think about this? :D;;;
   
   ---
   
   19/Oct/20 23:58;vinoth;[~309637554] this task is about exploring all 
possibilities and making a call.  IIUC you are making the case for retaining 
the lower casing. I think what you point out is why we lower cased this. 
   
   I can't decide for myself until we paint the full picture. :) ;;;

