[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477493
 ]

ASF GitHub Bot logged work on HIVE-22622:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 22:26
Start Date: 01/Sep/20 22:26
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1446:
URL: https://github.com/apache/hive/pull/1446


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 477493)
Time Spent: 50m  (was: 40m)

> Hive allows to create a struct with duplicate attribute names
> -
>
> Key: HIVE-22622
> URL: https://issues.apache.org/jira/browse/HIVE-22622
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When you create a table with a struct that has the same attribute name twice, 
> Hive allows you to create it.
> create table test_struct( duplicateColumn struct<id:int,id:int>);
> You can insert data into it:
> insert into test_struct select named_struct("id",1,"id",1);
> But you cannot read it:
> select * from test_struct;
> Result: java.io.IOException: java.io.IOException: Error reading file: 
> hdfs://.../test_struct/delta_001_001_/bucket_0 ,
> We can create the table and insert into it, but reading the struct part of 
> the table fails. We can still read all other columns (if there is more than 
> one), but not the struct.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477026
 ]

ASF GitHub Bot logged work on HIVE-22622:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 07:13
Start Date: 01/Sep/20 07:13
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on pull request #1446:
URL: https://github.com/apache/hive/pull/1446#issuecomment-684505690


   Adding a test case for various underlying formats doesn't make sense in this case, because the duplicate check is performed in the semantic analysis phase of the `create table` statement and the error message would be the same. If a duplicate is found, the code flow never reaches the point where the table is actually created.
   
   Added a test case for nested structs.
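For illustration, the nested case could be handled by a recursive walk over the column type. The sketch below uses a minimal, hypothetical type model (`Type`, `findDuplicate` and `NestedStructCheck` are not Hive classes or the code in this PR). It checks uniqueness separately within each struct level, assuming that the same name at different nesting levels (e.g. `struct<a:struct<a:int>>`) is legal.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class NestedStructCheck {

    /** Hypothetical minimal type model: a primitive, or a struct with named attributes. */
    static final class Type {
        final List<String> names; // attribute names (empty for primitives)
        final List<Type> types;   // attribute types, parallel to names

        private Type(List<String> names, List<Type> types) {
            this.names = names;
            this.types = types;
        }

        static Type primitive() {
            return new Type(List.of(), List.of());
        }

        static Type struct(List<String> names, List<Type> types) {
            if (names.size() != types.size()) {
                throw new IllegalArgumentException("names and types must have the same length");
            }
            return new Type(names, types);
        }
    }

    /** Returns the first attribute name repeated within a single struct, searching nested structs too. */
    static Optional<String> findDuplicate(Type type) {
        Set<String> seen = new HashSet<>(); // uniqueness is checked per struct level
        for (int i = 0; i < type.names.size(); i++) {
            String name = type.names.get(i);
            if (!seen.add(name)) {
                return Optional.of(name);
            }
            Optional<String> nested = findDuplicate(type.types.get(i));
            if (nested.isPresent()) {
                return nested;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Type intType = Type.primitive();
        // struct<a:int, b:struct<id:int, id:int>>: the duplicate exists only in the nested struct
        Type inner = Type.struct(List.of("id", "id"), List.of(intType, intType));
        Type outer = Type.struct(List.of("a", "b"), List.of(intType, inner));
        System.out.println(findDuplicate(outer).orElse("none")); // prints: id

        // struct<a:struct<a:int>>: the same name on different levels is not treated as a duplicate
        Type sameNameNested = Type.struct(List.of("a"), List.of(Type.struct(List.of("a"), List.of(intType))));
        System.out.println(findDuplicate(sameNameNested).orElse("none")); // prints: none
    }
}
```

Because the check only inspects the declared type, it can run before anything is created, which matches the point above that a duplicate stops the flow before the table exists.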





Issue Time Tracking
---

Worklog Id: (was: 477026)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477019
 ]

ASF GitHub Bot logged work on HIVE-22622:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 07:03
Start Date: 01/Sep/20 07:03
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1446:
URL: https://github.com/apache/hive/pull/1446#discussion_r480897671



##
File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
##
@@ -471,6 +471,7 @@
   "Not an ordered-set aggregate function: {0}. WITHIN GROUP clause is not allowed.", true),
   WITHIN_GROUP_PARAMETER_MISMATCH(10422,
   "The number of hypothetical direct arguments ({0}) must match the number of ordering columns ({1})", true),
+  AMBIGUOUS_STRUCT_FIELD(10423, "Struct field is not unique: {0}", true),

Review comment:
   Renamed `field` to `attribute`







Issue Time Tracking
---

Worklog Id: (was: 477019)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=476848=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476848
 ]

ASF GitHub Bot logged work on HIVE-22622:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 21:28
Start Date: 31/Aug/20 21:28
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1446:
URL: https://github.com/apache/hive/pull/1446#discussion_r480409116



##
File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
##
@@ -471,6 +471,7 @@
   "Not an ordered-set aggregate function: {0}. WITHIN GROUP clause is not allowed.", true),
   WITHIN_GROUP_PARAMETER_MISMATCH(10422,
   "The number of hypothetical direct arguments ({0}) must match the number of ordering columns ({1})", true),
+  AMBIGUOUS_STRUCT_FIELD(10423, "Struct field is not unique: {0}", true),

Review comment:
   nit: Usually we use "field" for row types and "attribute" for struct types. Plus, it reads more naturally if we inline the attribute name in the sentence: 
   
   `Attribute \"{0}\" specified more than once in structured type.` 
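Assuming the `{0}` placeholder is filled in with `java.text.MessageFormat` (an assumption about how the `true` flag on these `ErrorMsg` entries is used), the suggested wording would render as shown in this small sketch. `AttributeMessageDemo` is a hypothetical class, not part of Hive. It also shows why double quotes are the safer choice around the placeholder: single quotes are MessageFormat's escape character.

```java
import java.text.MessageFormat;

public class AttributeMessageDemo {

    // The wording suggested in the review; \" is a plain double quote inside the Java string.
    static final String PATTERN = "Attribute \"{0}\" specified more than once in structured type.";

    static String format(String attributeName) {
        return MessageFormat.format(PATTERN, attributeName);
    }

    public static void main(String[] args) {
        System.out.println(format("id"));
        // prints: Attribute "id" specified more than once in structured type.

        // Pitfall: text between single quotes is taken literally, so '{0}' is NOT substituted.
        System.out.println(MessageFormat.format("Attribute '{0}' specified more than once.", "id"));
        // prints: Attribute {0} specified more than once.
    }
}
```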







Issue Time Tracking
---

Worklog Id: (was: 476848)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=476568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476568
 ]

ASF GitHub Bot logged work on HIVE-22622:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 13:02
Start Date: 31/Aug/20 13:02
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1446:
URL: https://github.com/apache/hive/pull/1446


   ### What changes were proposed in this pull request?
   Add a check for duplicate struct field identifiers and throw a SemanticException with a custom error message when one is found.
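A minimal sketch of such a check for a single struct level, assuming the attribute names are available as a list. `AmbiguousAttributeException` is a hypothetical stand-in for Hive's `SemanticException`, and `checkUnique` is illustrative only, not the method added by this PR.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StructAttributeCheck {

    /** Hypothetical stand-in for Hive's SemanticException, so the sketch compiles on its own. */
    static class AmbiguousAttributeException extends Exception {
        AmbiguousAttributeException(String message) {
            super(message);
        }
    }

    /** Throws on the first attribute name that appears more than once in one struct. */
    static void checkUnique(List<String> attributeNames) throws AmbiguousAttributeException {
        Set<String> seen = new HashSet<>();
        for (String name : attributeNames) {
            if (!seen.add(name)) {
                // Message text mirrors the example output quoted in this PR description.
                throw new AmbiguousAttributeException("Struct field is not unique: " + name);
            }
        }
    }

    public static void main(String[] args) {
        try {
            checkUnique(List.of("number", "street", "number"));
        } catch (AmbiguousAttributeException e) {
            System.out.println(e.getMessage()); // prints: Struct field is not unique: number
        }
    }
}
```

A `HashSet` keeps the check linear in the number of attributes; `Set.add` returning `false` is what signals the repeat.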
   
   ### Why are the changes needed?
   Creating a table with a struct-type column that has duplicate field identifiers, and inserting records into it, is allowed; but later, when querying the table, we cannot distinguish between struct attributes that share the same identifier.
   In some cases (depending on the table's SerDe format) the query may fail. See the Jira issue for details.
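A toy illustration of the ambiguity, assuming attributes are resolved by a first-match search on the name (a simplification; `AmbiguousLookupDemo` and `indexOf` are hypothetical and not Hive's ObjectInspector code). With the repro data below, the second `number` value can never be reached by name.

```java
import java.util.List;

public class AmbiguousLookupDemo {

    /** Name-based lookup: returns the position of the first attribute with the given name, or -1. */
    static int indexOf(List<String> attributeNames, String name) {
        for (int i = 0; i < attributeNames.size(); i++) {
            if (attributeNames.get(i).equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Attribute names and values of named_struct('number', 61, 'street', 'Terrasse', 'number', 62)
        List<String> names = List.of("number", "street", "number");
        List<Object> values = List.of(61, "Terrasse", 62);

        int idx = indexOf(names, "number");
        System.out.println(values.get(idx)); // prints 61; the value 62 at position 2 is unreachable by name
    }
}
```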
   
   ### Does this PR introduce _any_ user-facing change?
   Introduces a new error code and message. Example:
   ```
   FAILED: SemanticException [Error 10423]: Struct field is not unique: id
   ```
   
   ### How was this patch tested?
   1. Created a new negative test:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestNegativeCliDriver -Dqfile=struct_field_uniqueness.q -pl itests/qtest -Pitests
   ```
   
   2. Reproduced the query failure:
   ```
   CREATE TABLE person
   (
   `id`  int,
   `address` struct<number:int,street:string,number:int>
   )
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
   
   INSERT INTO person
   VALUES (1, named_struct('number', 61, 'street', 'Terrasse', 'number', 62));
   INSERT INTO person
   VALUES (2, named_struct('number', 51, 'street', 'Terrasse', 'number', 52));
   
   SELECT address.number FROM person;
   ```





Issue Time Tracking
---

Worklog Id: (was: 476568)
Remaining Estimate: 0h
Time Spent: 10m
