[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-03-03 Thread Bo Hai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782635#comment-16782635
 ] 

Bo Hai commented on SPARK-26932:


Thanks for your patience and guidance, [~dongjoon].

I am a newcomer to Spark and the open source community, and I would like to do 
something useful.

> Orc compatibility between hive and spark
> 
>
> Key: SPARK-26932
> URL: https://issues.apache.org/jira/browse/SPARK-26932
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
> Reporter: Bo Hai
> Priority: Minor
>
> As of Spark 2.3 and Hive 2.3, both support using apache/orc as the ORC writer 
> and reader. Older versions of Hive use their own ORC reader implementation, 
> which is not forward-compatible.
> So Hive 2.2 and older cannot read ORC tables created by Spark 2.3 and newer, 
> which use apache/orc instead of Hive ORC.
> I think we should add this information to the Spark 2.4 ORC documentation 
> page: https://spark.apache.org/docs/2.4.0/sql-data-sources-orc.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777654#comment-16777654
 ] 

Dongjoon Hyun commented on SPARK-26932:
---

`Migration Guide` might be the best place for that. Please use the migration 
guide from 2.3 to 2.4.
- https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html#upgrading-from-spark-sql-23-to-24




[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777648#comment-16777648
 ] 

Dongjoon Hyun commented on SPARK-26932:
---

Thank you for the update, [~haiboself]. 

So, does Apache Hive also have a document for this? For example, Hive 2.3.x 
generates some ORC tables which Hive 2.2.1 cannot read. We can add a reference 
to that Hive document if it exists. In general, this is a Hive-side read issue, 
isn't it?

BTW, as I wrote on the mailing list, Spark 2.3.x has `spark.sql.orc.impl=hive` 
by default. So, I don't think we need a document for that. For Spark 2.4, 
please make a PR. I'm +1.
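
For illustration, the `spark.sql.orc.impl` setting mentioned above can be 
passed on the `spark-sql` command line. The following is only a sketch, not a 
verified fix: it assumes `spark-sql` is on the PATH, that Spark 2.4 uses 
`native` as its default value for this setting, and it reuses the table names 
`tmp.orcTable1` and `tmp.orcTable2` from the reproduction steps posted 
elsewhere in this thread. The variable names `ORC_IMPL` and `SQL` are 
hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write an ORC table using Spark's Hive-based ORC implementation.
# Assumption: files written this way are the ones older Hive readers can open.

ORC_IMPL="hive"   # assumed alternative value: "native"
SQL='CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 LIMIT 10;'

if command -v spark-sql >/dev/null 2>&1; then
  # Print the ORC implementation this session is configured with.
  spark-sql --conf "spark.sql.orc.impl=${ORC_IMPL}" -e 'SET spark.sql.orc.impl;'
  # Create the table with the selected ORC implementation.
  spark-sql --conf "spark.sql.orc.impl=${ORC_IMPL}" -e "${SQL}"
else
  # Nothing is executed against Spark when the CLI is not installed.
  echo "spark-sql not found on PATH; would run with spark.sql.orc.impl=${ORC_IMPL}"
fi
```

Whether the resulting table is readable by a given Hive release still has to 
be checked on the Hive side, for example with the `hive -e` query from the 
reproduction steps.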




[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-24 Thread Bo Hai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776192#comment-16776192
 ] 

Bo Hai commented on SPARK-26932:


Relevant JIRA issues:
* https://jira.apache.org/jira/browse/SPARK-24322




[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-24 Thread Bo Hai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776188#comment-16776188
 ] 

Bo Hai commented on SPARK-26932:


We discussed this issue on the dev mailing list before; see 
http://apache-spark-developers-list.1001551.n3.nabble.com/Time-to-cut-an-Apache-2-4-1-release-tt26381.html#a26428




[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-24 Thread Bo Hai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776181#comment-16776181
 ] 

Bo Hai commented on SPARK-26932:


To reproduce this issue, create an ORC table with Spark 2.4 and read it with 
Hive 2.1.1, like this:

spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'

hive -e 'select * from tmp.orcTable2;'

Hive throws the exception shown below:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
at org.apache.orc.OrcFile.createReader(OrcFile.java:222)
at org.apache.orc.tools.FileDump.getReader(FileDump.java:255)
at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:328)
at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:307)
at org.apache.orc.tools.FileDump.main(FileDump.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)




[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-19 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772692#comment-16772692
 ] 

Hyukjin Kwon commented on SPARK-26932:
--

Also, can you share a reproducer, please? How did you verify that they are not 
compatible?







[jira] [Commented] (SPARK-26932) Orc compatibility between hive and spark

2019-02-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772652#comment-16772652
 ] 

Dongjoon Hyun commented on SPARK-26932:
---

Hi, [~haiboself]. Could you link the corresponding Hive JIRA issue here?



