[jira] [Commented] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

Stamatis Zampetakis (Jira) Sat, 29 Aug 2020 02:28:53 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-22622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186922#comment-17186922
 ]


Stamatis Zampetakis commented on HIVE-22622:
--------------------------------------------

I just tried the following test on master and it fails.
{code:sql}
CREATE TABLE person
(
    `id`      int,
    `address` struct<number:int,street:string,number:int>
)
    ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
        OUTPUTFORMAT
            'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

INSERT INTO person
VALUES (1, named_struct('number', 61, 'street', 'Terrasse', 'number', 62));
INSERT INTO person
VALUES (2, named_struct('number', 51, 'street', 'Terrasse', 'number', 52));

SELECT address.number FROM person;
{code}
And it fails with the following exception when performing the SELECT statement:

{noformat}
java.io.IOException: java.io.IOException: Error reading file: 
file:/home/stamatis/Projects/Apache/hive/itests/qtest/target/localfs/warehouse/person/000000_0
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:556)
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243)
        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:279)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
        at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
        at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
        at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
        at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at 
org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
        at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
        at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
        at 
org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
        at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)
Caused by: java.io.IOException: Error reading file: 
file:/home/stamatis/Projects/Apache/hive/itests/qtest/target/localfs/warehouse/person/000000_0
        at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1329)
        at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:88)
        at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:104)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:256)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:231)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:605)
        ... 53 more
Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
stream Stream for column 5 kind DATA position: 6 length: 6 range: 0 offset: 23 
limit: 23 range 0 = 0 to 6 uncompressed: 3 to 3
        at 
org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
        at 
org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
        at 
org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
        at 
org.apache.orc.impl.TreeReaderFactory$IntTreeReader.nextVector(TreeReaderFactory.java:570)
        at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2077)
        at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2059)
        at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1322)
        ... 58 more
{noformat}
Apart from the failure the SELECT statement is also ambiguous since we cannot 
distinguish between the two attributes of the struct. Due to the above I 
believe the best solution would be to dissallow duplicate fields in structs. 

Note that if the table is declared to be in JSON format the duplicate checks 
seem to be in place and the CREATE statement fails so I think the declared 
format of the table is important for reproducing the problem.

> Hive allows to create a struct with duplicate attribute names
> -------------------------------------------------------------
>
>                 Key: HIVE-22622
>                 URL: https://issues.apache.org/jira/browse/HIVE-22622
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Denys Kuzmenko
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> When you create at table with a struct with twice the same attribute name, 
> hive allow you to create it.
> create table test_struct( duplicateColumn struct<id:int, id:int>);
> You can insert data into it :
> insert into test_struct select named_struct("id",1,"id",1);
> But you can not read it :
> select * from test_struct;
> Return : java.io.IOException: java.io.IOException: Error reading file: 
> hdfs://.../test_struct/delta_0000001_0000001_0000/bucket_00000 ,
> We can create and insert. but fail on read the Struct part of the tables. We 
> can still read all other columns (if we have more than one) but not the 
> struct anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

Reply via email to