Well, it is not a bug in Spark 2.4 but a bug in Hive 2.1.1. My colleague 
will report it on the Spark JIRA later.



Presto works fine when reading the ORC table created by Spark 2.4.



We've decided to fix it in Hive 2.1.1. Since Hive 2.1.1 is widely used, I 
suggest that we keep good interoperability with "legacy" Hive. Otherwise, 
newcomers will complain and decide not to use Spark SQL for CTAS 
scenarios. And for now, there are still several cases where Apache Hive does 
better than Spark SQL.



We are in the process of replacing Hive 2.1.1 with Spark 2.4.x (SQL) and won't 
consider upgrading Hive.



---- On Thu, 14 Feb 2019 21:36:08 +0800 Wenchen Fan <cloud0...@gmail.com> wrote ----




Do you know which bug ORC 1.5.2 introduced? Or is it because Hive uses a legacy 
version of ORC which has a bug?



On Thu, Feb 14, 2019 at 2:35 PM Darcy Shen <mailto:sad...@zoho.com.invalid> 
wrote:









We found that an ORC table created by Spark 2.4 cannot be read by Hive 2.1.1.





spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'

hive -e 'select * from tmp.orcTable2'



The error message from Hive:



Failed with exception java.io.IOException:java.lang.RuntimeException: ORC split 
generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 6



The same statement run with Spark 2.3.2 (or below) works fine.
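
Since tables written by older Spark versions are readable, one thing that may be worth trying is to make Spark 2.4 write with its older Hive-based ORC writer instead of the native one. The sketch below is an untested assumption, not something confirmed in this thread: it relies on the `spark.sql.orc.impl` setting (values `native` and `hive`), and the helper name `ctas_cmd` and the table name `tmp.orcTable3` are made up for illustration. The helper only prints the command, so it can be reviewed before running.

```shell
# Hypothetical helper: print (do not run) a spark-sql CTAS command that
# selects the ORC implementation through spark.sql.orc.impl.
#   $1 = ORC implementation ("hive" or "native")
#   $2 = target table, $3 = source table
ctas_cmd() {
  printf "spark-sql --conf spark.sql.orc.impl=%s -e 'CREATE TABLE %s USING orc AS SELECT * FROM %s LIMIT 10;'\n" "$1" "$2" "$3"
}

# Assumption: the "hive" implementation writes files that Hive 2.1.1 can read.
ctas_cmd hive tmp.orcTable3 tmp.orcTable1
```

If the printed command looks right, run it by hand and then check the result with hive -e 'select * from tmp.orcTable3'.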



I think we should revert the commit "[SPARK-24576][BUILD] Upgrade Apache ORC 
to 1.5.2" by Dongjoon Hyun.





---- On Tue, 12 Feb 2019 16:56:09 +0800 Dongjin Lee <mailto:dong...@apache.org> wrote ----




> SPARK-23539 is a non-trivial improvement, so probably would not be 
> back-ported to 2.4.x.



Got it. It seems reasonable.



Committers:



Please don't omit SPARK-23539 from 2.5.0. The Kafka community needs this feature.



Thanks,

Dongjin





On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro <mailto:linguin....@gmail.com> 
wrote:








-- 

Dongjin Lee




A hitchhiker in the mathematical world.




github: https://github.com/dongjinleekr

linkedin: https://kr.linkedin.com/in/dongjinleekr


speakerdeck: https://speakerdeck.com/dongjin










+1, too.

branch-2.4 has accumulated too many commits:

https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092





On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun <mailto:dongj...@apache.org> 
wrote:

Thank you, DB.



+1, yes. It's time to prepare the 2.4.1 release.



Bests,

Dongjoon.



On 2019/02/12 03:16:05, Sean Owen <mailto:sro...@gmail.com> wrote: 

> I support a 2.4.1 release now, yes.

> 

> SPARK-23539 is a non-trivial improvement, so probably would not be

> back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could

> be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for

> it, but it could go in if otherwise ready.

> 

> 

> On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee <mailto:dong...@apache.org> wrote:

> >

> > Hi DB,

> >

> > Could you add SPARK-23539[^1] to 2.4.1? I opened the PR[^2] a while 
> > ago, but it was not included in 2.3.0, nor has it received enough review.

> >

> > Thanks,

> > Dongjin

> >

> > [^1]: https://issues.apache.org/jira/browse/SPARK-23539

> > [^2]: https://github.com/apache/spark/pull/22282

> >

> > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim <mailto:kabh...@gmail.com> 
> > wrote:

> >>

> >> Given that SPARK-26154 [1] is a correctness issue and a PR [2] has been 
> >> submitted, I hope it can be reviewed and included in Spark 2.4.1. 
> >> Otherwise it will be a long-lived correctness issue.

> >>

> >> Thanks,

> >> Jungtaek Lim (HeartSaVioR)

> >>

> >> 1. https://issues.apache.org/jira/browse/SPARK-26154

> >> 2. https://github.com/apache/spark/pull/23634

> >>

> >>

> >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai <mailto:d_t...@apple.com.invalid> wrote:

> >>>

> >>> Hello all,

> >>>

> >>> I am preparing to cut a new Apache Spark 2.4.1 release, as there are many 
> >>> bugs and correctness issues fixed in branch-2.4.

> >>>

> >>> The list of addressed issues is at 
> >>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC

> >>>

> >>> Let me know if you have any concern or any PR you would like to get in.

> >>>

> >>> Thanks!

> >>>

> >>> ---------------------------------------------------------------------

> >>> To unsubscribe e-mail: mailto:dev-unsubscr...@spark.apache.org

> >>>

> >

> >

> > --

> > Dongjin Lee

> >

> > A hitchhiker in the mathematical world.

> >

> > github: http://github.com/dongjinleekr

> > linkedin: http://kr.linkedin.com/in/dongjinleekr

> > speakerdeck: http://speakerdeck.com/dongjin

> 


> 

> 












-- 

---

Takeshi Yamamuro
