[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776181#comment-16776181 ]
Bo Hai edited comment on SPARK-26932 at 2/24/19 9:22 AM:
---------------------------------------------------------

To reproduce this issue, create an ORC table with Spark 2.3.2/2.4 and read it with Hive 2.1.1, for example:

spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'
hive -e 'select * from tmp.orcTable2;'

Hive throws the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
        at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
        at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
        at org.apache.orc.OrcFile.createReader(OrcFile.java:222)
        at org.apache.orc.tools.FileDump.getReader(FileDump.java:255)
        at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:328)
        at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:307)
        at org.apache.orc.tools.FileDump.main(FileDump.java:154)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


was (Author: haiboself):
To reproduce this issue, create an ORC table with Spark 2.4 and read it with Hive 2.1.1, for example:

spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'
hive -e 'select * from tmp.orcTable2;'

Hive throws the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
        at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
        at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
        at org.apache.orc.OrcFile.createReader(OrcFile.java:222)
        at org.apache.orc.tools.FileDump.getReader(FileDump.java:255)
        at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:328)
        at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:307)
        at org.apache.orc.tools.FileDump.main(FileDump.java:154)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


> Orc compatibility between hive and spark
> ----------------------------------------
>
>                 Key: SPARK-26932
>                 URL: https://issues.apache.org/jira/browse/SPARK-26932
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>            Reporter: Bo Hai
>            Priority: Minor
>
> As of Spark 2.3 and Hive 2.3, both support using apache/orc as the ORC writer
> and reader. Older versions of Hive implement their own ORC reader, which is
> not forward-compatible.
> So Hive 2.2 and older cannot read ORC tables created by Spark 2.3 and newer,
> which use apache/orc instead of Hive ORC.
> I think we should add this information to the Spark 2.4 ORC documentation
> page: https://spark.apache.org/docs/2.4.0/sql-data-sources-orc.html
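
The message "ArrayIndexOutOfBoundsException: 6" raised from OrcFile$WriterVersion.from suggests that the older reader looked up a writer-version id it does not know about. The following is a minimal Java sketch of that failure mode, not the actual ORC source: the class name, the table contents, and both method names are hypothetical, and the assumption that the old reader knows only ids 0 through 5 is inferred from the exception message alone.

```java
public class WriterVersionLookup {
    // Hypothetical table of writer versions known to an "old" reader (ids 0..5).
    static final String[] KNOWN = {"V0", "V1", "V2", "V3", "V4", "V5"};

    // Strict lookup: indexes straight into the table, so an id written by a
    // newer writer (for example 6) falls outside the array and throws.
    static String strictFrom(int id) {
        return KNOWN[id];
    }

    // Tolerant lookup: any id outside the table maps to a FUTURE placeholder,
    // which lets a reader decide how to handle files from newer writers.
    static String tolerantFrom(int id) {
        if (id < 0 || id >= KNOWN.length) {
            return "FUTURE";
        }
        return KNOWN[id];
    }

    public static void main(String[] args) {
        int idFromNewerWriter = 6; // assumed id stored in the file metadata
        try {
            strictFrom(idFromNewerWriter);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
        System.out.println("tolerant lookup: " + tolerantFrom(idFromNewerWriter));
    }
}
```

The sketch only illustrates why a reader that is not forward-compatible fails while opening the file, before any rows are read, which matches the position of ReaderImpl.<init> in the stack trace.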