Hi,
I’m not sure I understand your initial question…
Depending on the compression algorithm, you may or may not be able to split the
file.
So if it's not splittable, you have a single long-running thread.
My guess is that you end up with a single very large partition.
If so, if you repartition,
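A minimal Spark sketch of that repartitioning idea, assuming a gzip-compressed (non-splittable) input; the path and partition count are illustrative, not from the thread:

```scala
// gzip is not splittable, so this RDD starts with a single partition
val raw = sc.textFile("/data/sales_staging.txt.gz")

// repartition() shuffles the data into more partitions so later
// stages can run in parallel; 200 is just an illustrative count
val redistributed = raw.repartition(200)
```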
Does the same happen if all the tables are in ORC format? It might just be
simpler to convert the text table to ORC, since it is rather small.
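A hedged sketch of that conversion as a CTAS in HiveQL; the target table name and the Snappy compression choice are assumptions mirroring the rest of the thread:

```sql
-- Create an ORC copy of the small text table in one statement
CREATE TABLE sales_staging_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM sales_staging;
```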
> On 29 Jun 2016, at 15:14, Mich Talebzadeh wrote:
>
> Hi all,
>
> It finished in 2 hours 18 minutes!
>
> Started at
>
I think the Tez engine is much better maintained with respect to optimizations
for ORC, vectorization, and query execution than the MR engine. It will
definitely be better to use it.
MR is also deprecated in Hive 2.0.
For me it does not make sense to use MR with Hive later than 1.1.
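Switching engines is a one-line session setting in Hive (`hive.execution.engine` is a standard Hive property; running with `tez` assumes Tez is installed on the cluster):

```sql
-- In a Hive session: switch from MapReduce to Tez
SET hive.execution.engine=tez;
```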
As I
Hi, guys!
As far as I remember, Spark does not take advantage of all of ORC's features
and optimizations. Moreover, the ability to read ORC files was added to Spark
only relatively recently.
So, despite the "victorious" results announced in
http://hortonworks.com/blog/bringing-orc-support-into-apache-spark/ ,
This is what I am getting in the container log for MR:
2016-06-28 23:25:53,808 INFO [main]
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
That is a good point.
The ORC table properties are as follows:
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1")
which sets each stripe to 256 MB (268435456 bytes).
Just to clarify, this is Spark running on Hive tables. I don't think the use
of Tez, MR or Spark as
Bzip2 is splittable for text files.
By the way, in ORC the question of splittability does not matter, because each
stripe is compressed individually.
Have you tried Tez? As far as I recall (at least it was the case in the first
versions of Hive), MR uses a single reducer for ORDER BY, which is a bottleneck.
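The single-reducer point can be seen in HiveQL: ORDER BY imposes a total order through one reducer, while DISTRIBUTE BY plus SORT BY sorts within each reducer in parallel (at the cost of no global ordering). The `amount` column below is a hypothetical placeholder; `prod_id` comes from the thread:

```sql
-- Total order: everything funnels through a single reducer (the bottleneck)
SELECT prod_id, amount FROM sales2 ORDER BY amount;

-- Parallel alternative: rows are hashed to reducers by prod_id and
-- sorted within each reducer; no global order guarantee
SELECT prod_id, amount FROM sales2 DISTRIBUTE BY prod_id SORT BY amount;
```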
Do you
Hi,
I have a simple join between table sales2, a Snappy-compressed ORC table with
22 million rows, and another small table, sales_staging, with under a million
rows, stored as an uncompressed text file.
The join is very simple:
val s2 = HiveContext.table("sales2").select("PROD_ID")
val s =