[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names
[ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477493 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Sep/20 22:26
Start Date: 01/Sep/20 22:26
Worklog Time Spent: 10m
Work Description: jcamachor merged pull request #1446:
URL: https://github.com/apache/hive/pull/1446

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 477493)
Time Spent: 50m (was: 40m)

> Hive allows to create a struct with duplicate attribute names
> -------------------------------------------------------------
>
> Key: HIVE-22622
> URL: https://issues.apache.org/jira/browse/HIVE-22622
> Project: Hive
> Issue Type: Bug
> Reporter: Denys Kuzmenko
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When you create a table with a struct that uses the same attribute name twice,
> Hive allows you to create it.
> create table test_struct( duplicateColumn struct);
> You can insert data into it:
> insert into test_struct select named_struct("id",1,"id",1);
> But you cannot read it:
> select * from test_struct;
> Returns: java.io.IOException: java.io.IOException: Error reading file:
> hdfs://.../test_struct/delta_001_001_/bucket_0 ,
> We can create the table and insert into it, but reading the struct part of the table fails. We
> can still read all other columns (if there is more than one) but not the
> struct anymore.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names
[ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477026 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Sep/20 07:13
Start Date: 01/Sep/20 07:13
Worklog Time Spent: 10m
Work Description: kasakrisz commented on pull request #1446:
URL: https://github.com/apache/hive/pull/1446#issuecomment-684505690

Adding a test case for the various underlying formats doesn't make sense in this case, because the duplicate check is performed in the semantic analysis phase of the `create table` statement and the error message would be the same. If a duplicate is found, the code flow doesn't reach the point where the table is actually created.
Added a test case for nested struct.

Issue Time Tracking
-------------------

Worklog Id: (was: 477026)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names
[ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=477019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477019 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 01/Sep/20 07:03
Start Date: 01/Sep/20 07:03
Worklog Time Spent: 10m
Work Description: kasakrisz commented on a change in pull request #1446:
URL: https://github.com/apache/hive/pull/1446#discussion_r480897671

## File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ##
@@ -471,6 +471,7 @@
       "Not an ordered-set aggregate function: {0}. WITHIN GROUP clause is not allowed.", true),
   WITHIN_GROUP_PARAMETER_MISMATCH(10422,
       "The number of hypothetical direct arguments ({0}) must match the number of ordering columns ({1})", true),
+  AMBIGUOUS_STRUCT_FIELD(10423, "Struct field is not unique: {0}", true),

Review comment:
Renamed `field` to `attribute`

Issue Time Tracking
-------------------

Worklog Id: (was: 477019)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names
[ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=476848=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476848 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Aug/20 21:28
Start Date: 31/Aug/20 21:28
Worklog Time Spent: 10m
Work Description: zabetak commented on a change in pull request #1446:
URL: https://github.com/apache/hive/pull/1446#discussion_r480409116

## File path: common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ##
@@ -471,6 +471,7 @@
       "Not an ordered-set aggregate function: {0}. WITHIN GROUP clause is not allowed.", true),
   WITHIN_GROUP_PARAMETER_MISMATCH(10422,
       "The number of hypothetical direct arguments ({0}) must match the number of ordering columns ({1})", true),
+  AMBIGUOUS_STRUCT_FIELD(10423, "Struct field is not unique: {0}", true),

Review comment:
nit: Usually we use "field" for row types and "attribute" for struct types. Plus, it reads more naturally if we inline the attribute name in the sentence:
`Attribute \"{0}\" specified more than once in structured type.`

Issue Time Tracking
-------------------

Worklog Id: (was: 476848)
Time Spent: 20m (was: 10m)
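To make the wording suggestion in the review comment above concrete, here is a minimal sketch that renders both candidate messages. It assumes the `{0}` placeholder is expanded with `java.text.MessageFormat`-style substitution; the class `ErrorMessageDemo` and its `render` helper are hypothetical and are not part of Hive's `ErrorMsg`.

```java
import java.text.MessageFormat;

// Hypothetical demo class, not Hive code: shows how the two message
// templates from the review thread read once {0} is filled in.
final class ErrorMessageDemo {

    // Template as submitted in the pull request.
    static final String ORIGINAL = "Struct field is not unique: {0}";

    // Template proposed in the review comment.
    static final String SUGGESTED =
        "Attribute \"{0}\" specified more than once in structured type.";

    // Assumption: placeholders follow java.text.MessageFormat rules.
    static String render(String template, String attributeName) {
        return MessageFormat.format(template, attributeName);
    }

    public static void main(String[] args) {
        System.out.println(render(ORIGINAL, "id"));
        System.out.println(render(SUGGESTED, "id"));
    }
}
```

The suggested template places the attribute name inside the sentence, which is the readability point the reviewer raises.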
[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names
[ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=476568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476568 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 31/Aug/20 13:02
Start Date: 31/Aug/20 13:02
Worklog Time Spent: 10m
Work Description: kasakrisz opened a new pull request #1446:
URL: https://github.com/apache/hive/pull/1446

### What changes were proposed in this pull request?
Add a check for duplicated struct field identifiers and throw a SemanticException with a customized error message when one is found.

### Why are the changes needed?
Creating a table with a struct type column that has a duplicate field identifier, and inserting records into it, is allowed. Later, when querying the table, we cannot distinguish between the attributes of the struct that have the same identifier. In some cases (depending on the table serde format) the query may fail. See the Jira for details.

### Does this PR introduce _any_ user-facing change?
Introduces a new error code and message. Example:
```
FAILED: SemanticException [Error 10423]: Struct field is not unique: id
```

### How was this patch tested?
1. Create new negative test:
```
mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestNegativeCliDriver -Dqfile=struct_field_uniqueness.q -pl itests/qtest -Pitests
```
2. Reproduce query failure
```
CREATE TABLE person (
  `id` int,
  `address` struct
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

INSERT INTO person VALUES (1, named_struct('number', 61, 'street', 'Terrasse', 'number', 62));
INSERT INTO person VALUES (2, named_struct('number', 51, 'street', 'Terrasse', 'number', 52));

SELECT address.number FROM person;
```

Issue Time Tracking
-------------------

Worklog Id: (was: 476568)
Remaining Estimate: 0h
Time Spent: 10m
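The check described in the pull request above (reject a struct type whose attribute names repeat, including inside nested structs) can be sketched roughly as follows. This is not the code from the patch: `StructAttributeCheck`, `TypeNode`, `findDuplicateAttribute`, and `validate` are hypothetical names over a simplified type model, `IllegalArgumentException` stands in for Hive's `SemanticException`, and case-insensitive comparison of attribute names is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not the Hive implementation.
final class StructAttributeCheck {

    // Simplified type model: a type name plus optional named attributes.
    static final class TypeNode {
        final String typeName;
        final List<String> attrNames = new ArrayList<>();
        final List<TypeNode> attrTypes = new ArrayList<>();

        TypeNode(String typeName) {
            this.typeName = typeName;
        }

        // Appends an attribute and returns this node for chaining.
        TypeNode attr(String name, TypeNode type) {
            attrNames.add(name);
            attrTypes.add(type);
            return this;
        }
    }

    // Returns the first attribute name that repeats within one struct level,
    // descending into nested attribute types as they are visited.
    static Optional<String> findDuplicateAttribute(TypeNode type) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < type.attrNames.size(); i++) {
            String name = type.attrNames.get(i);
            // Assumption: names are compared case-insensitively.
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                return Optional.of(name);
            }
            Optional<String> nested = findDuplicateAttribute(type.attrTypes.get(i));
            if (nested.isPresent()) {
                return nested;
            }
        }
        return Optional.empty();
    }

    // Fails before any table would be created, mirroring the idea of
    // rejecting the statement during analysis.
    static void validate(TypeNode type) {
        Optional<String> duplicate = findDuplicateAttribute(type);
        if (duplicate.isPresent()) {
            throw new IllegalArgumentException(
                "Struct field is not unique: " + duplicate.get());
        }
    }

    public static void main(String[] args) {
        TypeNode address = new TypeNode("struct")
            .attr("number", new TypeNode("int"))
            .attr("street", new TypeNode("string"))
            .attr("number", new TypeNode("int"));
        System.out.println(findDuplicateAttribute(address).orElse("no duplicate"));
    }
}
```

Because the check runs on the declared type alone, it does not depend on the storage format, which matches the point made in the thread that format-specific test cases add little here.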