Re: Hive 2 performance

2016-02-25 Thread Alan Gates
HPLSQL is part of Hive, but it is not fully integrated into Hive itself yet. It is still an external module that handles the control flow while passing Hive SQL into Hive via JDBC. We’d like to integrate it fully with Hive’s parser but we’re not there yet. Alan. > On Feb 25, 2016, at 14:26,

Re: Hive 2 performance

2016-02-25 Thread Mich Talebzadeh
Hi Gopal, Is HPLSQL integrated into Hive 2 as part of its SQL? Thanks, Mich On 25/02/2016 10:38, Mich Talebzadeh wrote: > Apologies, the job on Spark using functional programming was run on a bigger > table. > > The correct timing is 42 seconds for Spark. > > On 25/02/2016

Re: ORC file split calculation problems

2016-02-25 Thread Prasanth Jayachandran
> On Feb 25, 2016, at 3:15 PM, Prasanth Jayachandran wrote: > > Hi Patrick > > Can you paste the entire stack trace? It looks like an NPE happened during split > generation, but the stack trace is too incomplete to tell what caused it. > > In Hive 0.14.0, the stripe size is

Re: ORC file split calculation problems

2016-02-25 Thread Prasanth Jayachandran
Hi Patrick, Can you paste the entire stack trace? It looks like an NPE happened during split generation, but the stack trace is too incomplete to tell what caused it. In Hive 0.14.0, the stripe size was changed to 64MB. The default block size for ORC files is 256MB, so 4 stripes fit in a block. ORC does padding to

ORC file split calculation problems

2016-02-25 Thread Patrick Duin
Hi, We've recently moved one of our datasets to ORC and we use Cascading and Hive to read this data. We've had problems reading the data via Cascading, because of the generation of splits. We read in a large number of files (thousands) and they are about 1GB each. We found that the split

Re: Hive 2 performance

2016-02-25 Thread Mich Talebzadeh
Apologies, the job on Spark using functional programming was run on a bigger table. The correct timing is 42 seconds for Spark. On 25/02/2016 10:15, Mich Talebzadeh wrote: > Thanks Gopal, I have made the following observations so far: > > Using the old MR you now get this message, which is fine >

Re: Hive 2 performance

2016-02-25 Thread Mich Talebzadeh
Thanks Gopal, I have made the following observations so far: Using the old MR you now get this message, which is fine: "Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases." use

Beeline select query

2016-02-25 Thread Pooja Chawda
Hi All, I am trying to switch to Beeline from Hive, but I have one issue. *Hive:* Query: select * from table; Output: The output is in a readable format, i.e. we can scroll down to see the complete data on my screen (terminal). *Beeline:* Query: select * from table; Output: The output is

Re: Hive 2 performance

2016-02-25 Thread Gopal Vijayaraghavan
> Correct, hence the question, as I have done some preliminary tests on Hive 2. > I want to share insights with other people who have performed the same If you have feedback on Hive-2.0, I'm all ears. I'm building up 2.1 features & fixes, so now would be a good time to bring stuff up. Speed

simple sql query failing in Hive 2

2016-02-25 Thread Mich Talebzadeh
Hi, This query works fine in Oracle scratch...@mydb.mich.LOCAL> select * from dummy where id = (select max(id) from dummy); ID CLUSTERED SCATTERED RANDOMISED RANDOM_STRING SMALL_VC PADDING -- -- -- -- --