hudi-bot opened a new issue, #14500: URL: https://github.com/apache/hudi/issues/14500
As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] we would like to understand how Avro behaves with case-sensitive column names.

A couple of action items:

* Test with field names that differ only in case.
* *AbstractRealtimeRecordReader* is one of the classes where we convert Avro schema field names to lower case so that they can be verified against the column names from Hive. We can consider removing the *lowercase* conversion there if we verify that it does not break anything.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-303
- Type: Test

---

## Comments

23/May/20 22:07;shivnarayan;[~guoyihua]: this ticket is also related to case sensitivity. If you plan to take the other ticket, this should be along similar lines.

---

19/Oct/20 13:42;309637554;I do not think this should be changed, because Hive metastore column names are case insensitive. Without the lowercase conversion, the Hive metastore schema will not match the Avro schema. For example, the column names in hive_metastoreConstants.META_TABLE_COLUMNS are case insensitive:

    Map<String, Field> schemaFieldsMap = HoodieRealtimeRecordReaderUtils.getNameToFieldMap(writerSchema);
    hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);

    // Get all column names of hive table
    String hiveColumnString = jobConf.get(hive_metastoreConstants.META_TABLE_COLUMNS);
    LOG.info("Hive Columns : " + hiveColumnString);
    String[] hiveColumns = hiveColumnString.split(",");
    LOG.info("Hive Columns : " + hiveColumnString);
    List<Field> hiveSchemaFields = new ArrayList<>();

    for (String columnName : hiveColumns) {
      Field field = schemaFieldsMap.get(columnName.toLowerCase());

      if (field != null) {
        hiveSchemaFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
      } else {
        // Hive has some extra virtual columns like BLOCK__OFFSET__INSIDE__FILE which do not exist in table schema.
        // They will get skipped as they won't be found in the original schema.
        LOG.debug("Skipping Hive Column => " + columnName);
      }
    }

---

19/Oct/20 13:45;309637554;[~uditme], [~vinoth] what do you think about this? :D

---

19/Oct/20 23:58;vinoth;[~309637554] this task is about exploring all possibilities and making a call. IIUC you are making the case for retaining the lower casing. I think what you point out is why we lower cased this. I can't decide for myself until we paint the full picture. :)
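
To make the trade-off discussed in this thread concrete, here is a minimal, dependency-free sketch of the lookup. It is not the Hudi implementation: the class `CaseInsensitiveColumnMatch` and its methods are hypothetical, plain strings stand in for Avro `Field` objects, and it assumes the name-to-field map is keyed by lower-cased field names. It covers both action items: Hive column names resolve to Avro field names regardless of case, and two Avro fields that differ only in case are reported as a collision rather than silently overwriting each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hudi.
final class CaseInsensitiveColumnMatch {

    // Builds a map from lower-cased field name to the original Avro field name.
    // Throws if two field names become identical after lower-casing.
    static Map<String, String> lowerCaseIndex(List<String> avroFieldNames) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String name : avroFieldNames) {
            String previous = index.put(name.toLowerCase(Locale.ROOT), name);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "Fields differ only in case: " + previous + " vs " + name);
            }
        }
        return index;
    }

    // Returns the original Avro field names in Hive column order.
    // Columns with no matching field (e.g. Hive virtual columns) are skipped.
    static List<String> orderByHiveColumns(String hiveColumnString, Map<String, String> index) {
        List<String> ordered = new ArrayList<>();
        for (String column : hiveColumnString.split(",")) {
            String original = index.get(column.toLowerCase(Locale.ROOT));
            if (original != null) {
                ordered.add(original);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, String> index = lowerCaseIndex(List.of("userId", "eventTime"));
        // Hive reports lower-cased names plus a virtual column.
        System.out.println(
            orderByHiveColumns("eventtime,userid,BLOCK__OFFSET__INSIDE__FILE", index));
        // prints [eventTime, userId]
    }
}
```

In this sketch, removing the `toLowerCase` calls would make the lookup for `eventtime` miss `eventTime`, which is the mismatch described above; keeping them means a schema containing both `userId` and `userid` cannot be represented, which is the case the first action item asks to test.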
