[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

ASF GitHub Bot (Jira) Wed, 16 Dec 2020 03:49:14 -0800


     [ https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524986 ]


ASF GitHub Bot logged work on HIVE-24539:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/20 11:48
            Start Date: 16/Dec/20 11:48
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544233488



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##########
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
       Hey @abstractdog, thanks for taking a look!
   > this makes me think that in order to use commas in column names, you need to define another column name delimiter, otherwise those cannot be distinguished from each other, right?
   
   We already handle this corner case when creating the **TableDesc**:
   https://github.com/apache/hive/blob/95f3d6512f35839f2fad3cfd608616534e506a4b/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L562
   
   The **getColumnNameDelimiter** method checks whether any column name contains a comma and, if so, returns `\0` as the delimiter instead of a comma:
   https://github.com/pgaref/hive/blob/0d2b39ac180d809788f662fbe3271482cd4d909d/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L1562
   
   This PR just makes OrcInputFormat use the custom column name delimiter. OrcInputFormat was overlooked, while others, e.g. OrcOutputFormat, already use it.
   
   In your example above the delimiter will be `\0`, which avoids the colName splitting issue.
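
To make the delimiter behaviour above concrete, here is a minimal standalone sketch. It does not use Hive classes: `ColumnDelimiterSketch`, `chooseDelimiter`, `join` and `split` are hypothetical names for illustration only, and the logic is only an approximation of what the comment describes (fall back to `\0` when any column name contains a comma).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ColumnDelimiterSketch {

    // Hypothetical stand-in for the delimiter choice described above:
    // use '\0' if any column name contains a comma, otherwise use ','.
    public static String chooseDelimiter(List<String> columnNames) {
        for (String name : columnNames) {
            if (name.contains(",")) {
                return "\0";
            }
        }
        return ",";
    }

    // Serialize the column names into a single property-style string.
    public static String join(List<String> columnNames, String delimiter) {
        return String.join(delimiter, columnNames);
    }

    // Split the serialized names back; the delimiter is quoted so it is
    // always treated as a literal, never as a regex.
    public static List<String> split(String joined, String delimiter) {
        return Arrays.asList(joined.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "first,last", "age");

        // Reader and writer agree on the delimiter: names round-trip intact.
        String delimiter = chooseDelimiter(names);
        List<String> roundTrip = split(join(names, delimiter), delimiter);
        System.out.println(roundTrip.size()); // 3

        // Reader ignores the chosen delimiter and assumes a comma:
        // "first,last" is split into two names.
        List<String> naive = split(join(names, ","), ",");
        System.out.println(naive.size()); // 4
    }
}
```

The second case in `main` mirrors the mismatch this PR addresses: the reader must split with the same delimiter the names were joined with, rather than assuming a comma.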
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 524986)
    Time Spent: 1h 10m  (was: 1h)

> OrcInputFormat schema generation should respect column delimiter
> ----------------------------------------------------------------
>
>                 Key: HIVE-24539
>                 URL: https://issues.apache.org/jira/browse/HIVE-24539
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ffffea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
