[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

ASF GitHub Bot (Jira) Wed, 16 Dec 2020 03:40:06 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524977
 ]


ASF GitHub Bot logged work on HIVE-24539:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/20 11:39
            Start Date: 16/Dec/20 11:39
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on a change in pull request #1783:
URL: https://github.com/apache/hive/pull/1783#discussion_r544085969



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##########
@@ -2675,12 +2676,13 @@ public static TypeDescription convertTypeInfo(TypeInfo info) {
   public static TypeDescription getDesiredRowTypeDescr(Configuration conf,
                                                        boolean isAcidRead,
                                                        int dataColumns) {
-
     String columnNameProperty = null;
     String columnTypeProperty = null;
 
     ArrayList<String> schemaEvolutionColumnNames = null;
     ArrayList<TypeDescription> schemaEvolutionTypeDescrs = null;
+    // Make sure we split colNames using the right Delimiter
+    final String columnNameDelimiter = conf.get(serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));

Review comment:
   This makes me think that, in order to use commas in column names, you need
to define another column name delimiter; otherwise the commas inside a name
cannot be distinguished from the commas separating names, right?
   Could you please include an example where the configuration is used for
this purpose? I haven't seen one in the q test. What's the valid use case
with multiple columns? What happens if you try something like:
   ```
   create table test_n4 (`x,y` int, z int);
   select `x,y`, z from test_n4 where `x,y` >= 2 and z = 0;
   ```
   other than this, the patch is simple and neat, which I like :)
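
   To make the ambiguity concrete, here is a minimal, standalone sketch of the splitting behaviour in question. It does not use the Hive classes: a plain `Map` stands in for `Configuration`, the property key `column.name.delimiter` is assumed to be the value behind `serdeConstants.COLUMN_NAME_DELIMITER`, and `ColumnNameSplitDemo`, `splitColumnNames` and the `;` delimiter are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class ColumnNameSplitDemo {
  // Assumption: stands in for serdeConstants.COLUMN_NAME_DELIMITER.
  static final String COLUMN_NAME_DELIMITER = "column.name.delimiter";
  // Default delimiter, mirroring String.valueOf(SerDeUtils.COMMA) in the patch.
  static final String DEFAULT_DELIMITER = ",";

  // Split the serialized column-name list using the configured delimiter,
  // falling back to a comma when none is set.
  static List<String> splitColumnNames(Map<String, String> conf, String columnNames) {
    String delimiter = conf.getOrDefault(COLUMN_NAME_DELIMITER, DEFAULT_DELIMITER);
    // Quote the delimiter so it is matched literally rather than as a regex.
    return Arrays.asList(columnNames.split(Pattern.quote(delimiter)));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    // Columns `x,y` and z serialized with the default comma:
    // the split yields three names instead of two.
    System.out.println(splitColumnNames(conf, "x,y,z"));

    // With a different (hypothetical) delimiter configured,
    // the comma stays inside the first column name.
    conf.put(COLUMN_NAME_DELIMITER, ";");
    System.out.println(splitColumnNames(conf, "x,y;z"));
  }
}
```

   With the default delimiter the two columns are read back as three names, which is the inconsistency the issue describes; with a non-comma delimiter configured, `x,y` survives as a single name.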
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 524977)
    Time Spent: 0.5h  (was: 20m)

> OrcInputFormat schema generation should respect column delimiter
> ----------------------------------------------------------------
>
>                 Key: HIVE-24539
>                 URL: https://issues.apache.org/jira/browse/HIVE-24539
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ffffea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
