[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782635#comment-16782635 ]

Bo Hai commented on SPARK-26932:

Thanks for your patience and guidance, [~dongjoon]. I am a newcomer to Spark and the open source community, and I would like to do something useful.

> Orc compatibility between hive and spark
>
> Key: SPARK-26932
> URL: https://issues.apache.org/jira/browse/SPARK-26932
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
> Reporter: Bo Hai
> Priority: Minor
>
> As of Spark 2.3 and Hive 2.3, both support using apache/orc as the ORC writer
> and reader. Older versions of Hive use their own ORC reader implementation,
> which is not forward-compatible.
> So Hive 2.2 and older cannot read ORC tables created by Spark 2.3 and newer,
> which use apache/orc instead of the Hive ORC implementation.
> I think we should add this information to the Spark 2.4 ORC configuration page:
> https://spark.apache.org/docs/2.4.0/sql-data-sources-orc.html

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777654#comment-16777654 ]

Dongjoon Hyun commented on SPARK-26932:

The `Migration Guide` might be the best place for that. Please use the section on upgrading from 2.3 to 2.4.
- https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777648#comment-16777648 ]

Dongjoon Hyun commented on SPARK-26932:

Thank you for updating, [~haiboself].

So, does Apache Hive also have a document for this? For example, Hive 2.3.x generates some ORC tables which Hive 2.2.1 cannot read. We can add a reference to that Hive document if it exists. In general, this is a Hive-side read issue, isn't it?

BTW, as I wrote on the mailing list, Spark 2.3.x has `spark.sql.orc.impl=hive` by default. So, I don't think we need a document for that. For Spark 2.4, please make a PR. I'm +1.
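As a rough illustration of the `spark.sql.orc.impl` setting mentioned above, the sketch below passes it to `spark-sql` for a single invocation, so that Spark 2.4 writes the table through its Hive-based ORC implementation instead of the native apache/orc one. The table names are hypothetical and only mirror the reproducer posted in this issue. Whether Hive 2.2 or older can then read the result is an assumption that should be verified on the actual cluster.

```shell
#!/bin/sh
# Sketch only: write an ORC table with Spark's Hive-based ORC implementation.
# Table names are hypothetical and mirror the reproducer posted in this issue.
ORC_IMPL=hive   # Spark 2.4 defaults to "native" (apache/orc)
SQL='CREATE TABLE tmp.orcTable3 USING orc AS SELECT * FROM tmp.orcTable1 LIMIT 10;'

if command -v spark-sql >/dev/null 2>&1; then
  # --conf sets a Spark configuration property for this invocation only.
  spark-sql --conf "spark.sql.orc.impl=${ORC_IMPL}" -e "$SQL"
else
  echo "spark-sql not found on PATH; nothing was run" >&2
fi
```

Setting the property per invocation keeps the cluster-wide default untouched, which may be preferable when only some tables are shared with older Hive deployments.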
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776192#comment-16776192 ]

Bo Hai commented on SPARK-26932:

Relevant Hive JIRAs:
* https://jira.apache.org/jira/browse/SPARK-24322
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776188#comment-16776188 ]

Bo Hai commented on SPARK-26932:

We discussed this issue on the dev mailing list before; see http://apache-spark-developers-list.1001551.n3.nabble.com/Time-to-cut-an-Apache-2-4-1-release-tt26381.html#a26428
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776181#comment-16776181 ]

Bo Hai commented on SPARK-26932:

To reproduce this issue, create an ORC table with Spark 2.4 and read it with Hive 2.1.1, like this:

    spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'
    hive -e 'select * from tmp.orcTable2;'

Hive throws the exception shown below:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
        at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
        at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
        at org.apache.orc.OrcFile.createReader(OrcFile.java:222)
        at org.apache.orc.tools.FileDump.getReader(FileDump.java:255)
        at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:328)
        at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:307)
        at org.apache.orc.tools.FileDump.main(FileDump.java:154)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772692#comment-16772692 ]

Hyukjin Kwon commented on SPARK-26932:

Also, could you share the reproducer, please? How did you verify that they are not compatible?
[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark
[ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772652#comment-16772652 ]

Dongjoon Hyun commented on SPARK-26932:

Hi, [~haiboself].
Could you link the corresponding Hive JIRA issue here?