[ https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524986 ]
ASF GitHub Bot logged work on HIVE-24539:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 16/Dec/20 11:48
Start Date: 16/Dec/20 11:48
Worklog Time Spent: 10m
Work Description:
pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##########
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf, boolean isAcidRead, int dataColumns) {
-    String columnNameProperty = null;
     String columnTypeProperty = null;
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
Hey @abstractdog, thanks for taking a look!

> this makes me think that in order to use commas in column names, you need to define another column name delimiter, otherwise those cannot be distinguished from each other, right?

We already check this corner case when creating the **TableDesc**:
https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562

The **getColumnNameDelimiter** method checks for commas in colNames and, when it finds one, uses `\0` to split them instead.
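The delimiter selection described for **getColumnNameDelimiter** can be sketched as a simple scan over the column names. This is a minimal illustration, assuming the rule is exactly "use `\0` if any name contains a comma, otherwise use a comma"; `ColumnNameDelimiterSketch` and `chooseDelimiter` are hypothetical names, and the real Hive helper has its own signature, which is not reproduced here.

```java
import java.util.List;

class ColumnNameDelimiterSketch {
    // Hypothetical stand-in for the rule described for getColumnNameDelimiter:
    // switch to '\0' only when some column name itself contains a comma,
    // because a comma delimiter would then be ambiguous.
    static char chooseDelimiter(List<String> columnNames) {
        for (String name : columnNames) {
            if (name.indexOf(',') >= 0) {
                return '\0';
            }
        }
        return ',';
    }

    public static void main(String[] args) {
        List<String> plain = List.of("id", "name");
        List<String> withComma = List.of("id", "a,b");
        System.out.println((int) chooseDelimiter(plain));     // 44, i.e. ','
        System.out.println((int) chooseDelimiter(withComma)); // 0, i.e. '\0'
    }
}
```

Whatever delimiter the writer side picks has to be recorded alongside the column list, so that readers split with the same character.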
This PR just makes use of the custom delimiter, which was forgotten in OrcInputFormat but is already respected elsewhere, e.g., in OrcOutputFormat.
In your example above, the DELIMITER will be `\0`, which avoids the colName splitting issue.

Issue Time Tracking
-------------------

Worklog Id: (was: 524986)
Time Spent: 1h 10m (was: 1h)

> OrcInputFormat schema generation should respect column delimiter
> ----------------------------------------------------------------
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
> Issue Type: Bug
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> OrcInputFormat currently generates the schema using the given configuration and
> the default delimiter, which causes inconsistencies when column names contain commas.
> We should follow a similar approach to
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ffffea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
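The fix amounts to reading the column list on the input side with the configured delimiter, falling back to a comma, as the `conf.get(..., default)` call in the diff does. The following is a minimal sketch under stated assumptions: a plain `Map<String, String>` stands in for Hadoop's `Configuration`, and the keys `"columns"` and `"column.name.delimiter"` are hypothetical placeholders for the real Hive constants. It also shows what a comma-only split does to a name that contains a comma.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class OrcColumnNameSplitSketch {
    // Hypothetical keys standing in for the real Hive property constants.
    static final String COLUMNS_KEY = "columns";
    static final String COLUMN_NAME_DELIMITER_KEY = "column.name.delimiter";

    // Splits the column list with the configured delimiter, defaulting to ','.
    static List<String> columnNames(Map<String, String> conf) {
        String delimiter = conf.getOrDefault(COLUMN_NAME_DELIMITER_KEY, ",");
        String columns = conf.getOrDefault(COLUMNS_KEY, "");
        if (columns.isEmpty()) {
            return List.of();
        }
        // Pattern.quote keeps delimiters such as "|" from being read as regex syntax;
        // limit -1 keeps trailing empty names instead of silently dropping them.
        return Arrays.asList(columns.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
                COLUMNS_KEY, "id\0a,b",
                COLUMN_NAME_DELIMITER_KEY, "\0");
        // Respecting the configured delimiter recovers both names.
        System.out.println(columnNames(conf)); // [id, a,b]
        // Splitting on ',' regardless cuts "a,b" apart and glues "id" to "a".
        String[] naive = conf.get(COLUMNS_KEY).split(",");
        System.out.println(naive.length + " " + naive[1]); // 2 b
    }
}
```

Defaulting to a comma keeps tables whose names contain no commas behaving exactly as before; only tables written with a non-default delimiter change how they are read.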