Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Tushar Marne
Congrats Jesus!!

On Tue, Jul 19, 2016 at 11:23 AM, Vaibhav Gumashta <
vgumas...@hortonworks.com> wrote:

> Congrats Jesüs!
>
> --Vaibhav
> 
> From: Vineet Garg 
> Sent: Monday, July 18, 2016 6:51 PM
> To: d...@hive.apache.org; user@hive.apache.org
> Subject: Re: [ANNOUNCE] New PMC Member : Jesus
>
> Congrats Jesus !
>
>
>
>
> On 7/18/16, 10:27 AM, "Jesus Camacho Rodriguez" <
> jcamachorodrig...@hortonworks.com> wrote:
>
> >Thanks everybody! Looking forward to continuing to contribute to the project!
> >
> >--
> >Jesús
> >
> >
> >
> >
> >On 7/18/16, 6:21 PM, "Prasanth Jayachandran" <
> pjayachand...@hortonworks.com> wrote:
> >
> >>Congratulations Jesus!
> >>
> >>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
> >>>
> >>> Congrats!!
> >>>
> >>> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
> >>>  wrote:
>  Congratulations Jesus!
> 
> > On Jul 18, 2016, at 8:30 AM, Sergio Pena 
> wrote:
> >
> > Congrats Jesus !!!
> >
> > On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary 
> wrote:
> >
> >> Congratulations Jesus!
> >>
> >>> On Jul 18, 2016, at 6:55 AM, Wei Zheng 
> wrote:
> >>>
> >>> Congrats Jesus!
> >>>
> >>> Thanks,
> >>>
> >>> Wei
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
> >>>
>  Good to have you onboard, Jesus! :)
> 
>  On Jul 17, 2016 12:00, "Lefty Leverenz" 
> >> wrote:
> 
> > Congratulations Jesus!
> >
> > -- Lefty
> >
> > On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
> >> hashut...@apache.org>
> > wrote:
> >
> >> Hello Hive community,
> >>
> >> I'm pleased to announce that Jesus Camacho Rodriguez has
> accepted the
> >> Apache Hive PMC's
> >> invitation, and is now our newest PMC member. Many thanks to
> Jesus for
> >> all of
> >> his hard work.
> >>
> >> Please join me in congratulating Jesus!
> >>
> >> Best,
> >> Ashutosh
> >> (On behalf of the Apache Hive PMC)
> >>
> >
> >
> >>
> >>
> 
> >>>
> >>
> >>
>



-- 
Tushar Marne
9011062432


Re: Hive External Storage Handlers

2016-07-18 Thread Mich Talebzadeh
"So not use a self-compiled hive or Spark version, but only the ones
supplied by distributions (cloudera, Hortonworks, Bigtop...) You will face
performance problems, strange errors etc when building and testing your
code using self-compiled versions."

This comment does not make sense and is meaningless without any evidence.
Either you provide evidence that you have done this work and you
encountered errors or better not mention it. Sounds like scaremongering.








Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 July 2016 at 06:51, Jörn Franke  wrote:

> Do not use a self-compiled Hive or Spark version, but only the ones
> supplied by distributions (Cloudera, Hortonworks, Bigtop...). You will face
> performance problems, strange errors etc. when building and testing your
> code using self-compiled versions.
>
> If you use the Hive APIs then the engine should not be relevant for your
> storage handler. Nevertheless, the APIs of the storage handler might have
> changed.
>
> However, I wonder why a 1-1 mapping does not work for you.
>
> On 18 Jul 2016, at 22:46, Mich Talebzadeh 
> wrote:
>
> Hi,
>
> You can move up to Hive 2, which works fine and is pretty stable. You can
> opt for Hive 1.2.1 if you wish.
>
> If you want to use Spark (the replacement for Shark) as the execution
> engine for Hive, then the version that works (the one I have managed to
> make work with Hive) is Spark 1.3.1, which you will need to build from
> source.
>
> It works and it is stable.
>
> Otherwise you may decide to use Spark Thrift Server (STS), which allows JDBC
> access to Spark SQL (through beeline, Squirrel, Zeppelin) and has a Hive
> SQL context built into it, as if you were using Hive Thrift Server (HSS).
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
>
>
> On 18 July 2016 at 21:38, Lavelle, Shawn  wrote:
>
>> Hello,
>>
>>
>>
>> I am working with an external storage handler written for Hive 0.11
>> and run on a Shark execution engine.  I’d like to move forward and upgrade
>> to hive 1.2.1 on spark 1.6 or even 2.0.
>>
>>This storage has a need to run queries across tables existing in
>> different databases in the external data store, so existing drivers that
>> map hive to external storage in 1 to 1 mappings are insufficient. I have
>> attempted this upgrade already, but found out that predicate pushdown was
>> not occurring.  Was this changed in 1.2?
>>
>>Can I update and use the same storage handler in Hive or has this
>> concept been replaced by the RDDs and DataFrame API?
>>
>>
>>Are these questions better for the Spark list?
>>
>>
>>
>>Thank you,
>>
>>
>>
>> ~ Shawn M Lavelle
>>
>>
>>
>>
>> 
>>
>> Shawn Lavelle
>> Software Development
>>
>> 4101 Arrowhead Drive
>> Medina, Minnesota 55340-9457
>> Phone: 763 551 0559
>> Fax: 763 551 0750
>> *Email:* shawn.lave...@osii.com
>> *Website: **www.osii.com* 
>>
>
>


Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Vaibhav Gumashta
Congrats Jesüs!

--Vaibhav

From: Vineet Garg 
Sent: Monday, July 18, 2016 6:51 PM
To: d...@hive.apache.org; user@hive.apache.org
Subject: Re: [ANNOUNCE] New PMC Member : Jesus

Congrats Jesus !




On 7/18/16, 10:27 AM, "Jesus Camacho Rodriguez" 
 wrote:

>Thanks everybody! Looking forward to continuing to contribute to the project!
>
>--
>Jesús
>
>
>
>
>On 7/18/16, 6:21 PM, "Prasanth Jayachandran"  
>wrote:
>
>>Congratulations Jesus!
>>
>>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>>>
>>> Congrats!!
>>>
>>> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
>>>  wrote:
 Congratulations Jesus!

> On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:
>
> Congrats Jesus !!!
>
> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
>
>> Congratulations Jesus!
>>
>>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>>>
>>> Congrats Jesus!
>>>
>>> Thanks,
>>>
>>> Wei
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
>>>
 Good to have you onboard, Jesus! :)

 On Jul 17, 2016 12:00, "Lefty Leverenz" 
>> wrote:

> Congratulations Jesus!
>
> -- Lefty
>
> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
>> hashut...@apache.org>
> wrote:
>
>> Hello Hive community,
>>
>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
>> Apache Hive PMC's
>> invitation, and is now our newest PMC member. Many thanks to Jesus 
>> for
>> all of
>> his hard work.
>>
>> Please join me in congratulating Jesus!
>>
>> Best,
>> Ashutosh
>> (On behalf of the Apache Hive PMC)
>>
>
>
>>
>>

>>>
>>
>>


Re: Hive External Storage Handlers

2016-07-18 Thread Jörn Franke
Do not use a self-compiled Hive or Spark version, but only the ones supplied by
distributions (Cloudera, Hortonworks, Bigtop...). You will face performance
problems, strange errors etc. when building and testing your code using
self-compiled versions.

If you use the Hive APIs then the engine should not be relevant for your 
storage handler. Nevertheless, the APIs of the storage handler might have 
changed. 

However, I wonder why a 1-1 mapping does not work for you.

> On 18 Jul 2016, at 22:46, Mich Talebzadeh  wrote:
> 
> Hi,
> 
> You can move up to Hive 2, which works fine and is pretty stable. You can opt
> for Hive 1.2.1 if you wish.
>
> If you want to use Spark (the replacement for Shark) as the execution engine
> for Hive, then the version that works (the one I have managed to make work
> with Hive) is Spark 1.3.1, which you will need to build from source.
>
> It works and it is stable.
>
> Otherwise you may decide to use Spark Thrift Server (STS), which allows JDBC
> access to Spark SQL (through beeline, Squirrel, Zeppelin) and has a Hive SQL
> context built into it, as if you were using Hive Thrift Server (HSS).
> 
> HTH
> 
> 
> Dr Mich Talebzadeh
>  
>  
> 
>> On 18 July 2016 at 21:38, Lavelle, Shawn  wrote:
>> Hello,
>> 
>>  
>> 
>> I am working with an external storage handler written for Hive 0.11 and 
>> run on a Shark execution engine.  I’d like to move forward and upgrade to 
>> hive 1.2.1 on spark 1.6 or even 2.0.  
>> 
>>This storage has a need to run queries across tables existing in 
>> different databases in the external data store, so existing drivers that map 
>> hive to external storage in 1 to 1 mappings are insufficient. I have 
>> attempted this upgrade already, but found out that predicate pushdown was 
>> not occurring.  Was this changed in 1.2?
>> 
>>Can I update and use the same storage handler in Hive or has this concept 
>> been replaced by the RDDs and DataFrame API?  
>>
>> 
>>Are these questions better for the Spark list?
>> 
>>  
>> 
>>Thank you,
>> 
>>  
>> 
>> ~ Shawn M Lavelle
>> 
>>  
>> 
>> 
>> 
>> 
>> 
>> Shawn Lavelle
>> Software Development
>> 
>> 4101 Arrowhead Drive
>> Medina, Minnesota 55340-9457
>> Phone: 763 551 0559
>> Fax: 763 551 0750
>> Email: shawn.lave...@osii.com
>> Website: www.osii.com
> 


Re: Hive on TEZ + LLAP

2016-07-18 Thread Gopal Vijayaraghavan

> These look pretty impressive. What execution mode were you running
> these in? YARN client, maybe?

There is no other mode - everything runs on YARN.

> 53 times


The factor is actually bigger in actual execution.

The MRv2 version takes 2.47s to prep a query, while the LLAP version takes
1.64s.

The MRv2 version takes 200.319s to execute the query, while the LLAP
version takes 1.02s.

The execution speedup is nearly 200x, but compile time becomes significant
as you scale down the latencies.
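
As a rough check on the figures above, the ratios can be recomputed from the reported timings. This is only an illustrative sketch: the variable names are made up here, and the numbers are the ones quoted in this thread, nothing was re-measured.

```python
# Timings in seconds as reported in this thread; names are illustrative only.
mr_total, llap_total = 203.317, 3.809   # end-to-end "Time taken"
mr_exec, llap_exec = 200.319, 1.02      # query execution only
mr_prep, llap_prep = 2.47, 1.64         # query preparation / compile step

end_to_end_speedup = mr_total / llap_total
exec_speedup = mr_exec / llap_exec
prep_speedup = mr_prep / llap_prep
compile_share = llap_prep / llap_total  # share of the LLAP run spent compiling

print(f"end-to-end: {end_to_end_speedup:.1f}x")
print(f"execution:  {exec_speedup:.1f}x")
print(f"prep:       {prep_speedup:.2f}x")
print(f"compile share of LLAP run: {compile_share:.0%}")
```

The end-to-end ratio comes out at roughly 53x and the execution-only ratio at roughly 196x, which is why the compile step, barely faster under LLAP, dominates once execution drops to about a second.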

> My calculations on Hive 2 on Spark 1.3.1

Not sure where Hive2-on-Spark is going - the last commit to SparkCompiler
was late last year, before there was a Hive2.

On the speed front, I'm pretty sure you have got most of the Hive2
optimizations disabled, even the most basic of the Stinger optimizations
might be missing for you.

Check if you have

set hive.vectorized.execution.enabled=true;


Some of these new optimizations don't work on H-o-S, because Hive-on-Spark
does not implement a true broadcast join - instead it uses a
SparkHashTableSinkOperator which actually writes to HDFS instead of sending
it directly to the downstream task.


I don't understand why that is the case instead of RDD broadcast, but that
prevents the JOIN optimizations which convert the 34 sec query into a 3.8
sec query from applying to Spark execution.

A couple of examples would be

set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;
set hive.vectorized.execution.mapjoin.minmax.enabled=true;

Those two make easy work of joins in LLAP, particularly semi-joins which
are common in BI queries.


Once LLAP is out of tech preview, we can enable most of them by default
for Tez+LLAP, but that would not mean all of it applies to
Hive-on-(Spark/MR).

Getting these new features onto another engine takes active effort from
the engine's devs.

Cheers,
Gopal












Re: Hive on TEZ + LLAP

2016-07-18 Thread Mich Talebzadeh
These look pretty impressive. What execution mode were you running these in?
YARN client, maybe?

                             MR/sec    TEZ/sec   TEZ+LLAP/sec
Query                       203.317     13.681          3.809
Order of Magnitude faster       ---   15 times       53 times


My calculations on Hive 2 on Spark 1.3.1 (obviously we are comparing
different bases but it is interesting as a sample) reflect the following:

Table      MR/sec    Spark/sec   Order of Magnitude faster
Parquet   239.532        14.38   16 times
ORC       202.333        17.77   11 times

So the hybrid engine seems to make a big difference: if I just compare
Tez only with Tez + LLAP, the gain is more than 3 times.
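
The "times faster" figures are plain ratios of the timings, so they are easy to recompute. The sketch below assumes the Parquet row reads 239.532 s and 14.38 s and the ORC row 202.333 s and 17.77 s; the `speedup` helper and the dictionary labels are illustrative names, not anything from Hive.

```python
# Timings in seconds, taken from the figures quoted in this message.
runs = {
    "Parquet, MR vs Spark": (239.532, 14.38),
    "ORC, MR vs Spark": (202.333, 17.77),
    "MR vs Tez": (203.317, 13.681),
    "MR vs Tez+LLAP": (203.317, 3.809),
    "Tez vs Tez+LLAP": (13.681, 3.809),
}

def speedup(slow, fast):
    """Return how many times faster the second timing is than the first."""
    return slow / fast

ratios = {name: speedup(slow, fast) for name, (slow, fast) in runs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Note that these are simple multiples; strictly speaking an order of magnitude is a factor of ten, so 11x to 53x is between one and two orders of magnitude.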

Cheers,


Dr Mich Talebzadeh




On 18 July 2016 at 23:53, Gopal Vijayaraghavan  wrote:

>
> > Also, have there been simple benchmarks to compare:
> >
> > 1. Hive on MR
> > 2. Hive on Tez
> > 3. Hive on Tez with LLAP
>
> I ran one today, with a small BI query in my test suite against a 1 TB
> data set.
>
> TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).
>
> *Warning*: This is not a historical view, all engines are using the same
> new & improved vectorized operators from 2.2.0-SNAPSHOT, only the physical
> planner and the physical scheduling is different between runs.
>
> The difference between pre-Stinger, Stinger and Stinger.next is much much
> larger than this.
>
> <
> https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query55.sql>
>
>
> select  i_brand_id brand_id, i_brand brand,
> sum(ss_ext_sales_price) ext_price
>  from date_dim, store_sales, item
>  where date_dim.d_date_sk = store_sales.ss_sold_date_sk
> and store_sales.ss_item_sk = item.i_item_sk
> and i_manager_id=36
> and d_moy=12
> and d_year=2001
>  group by i_brand, i_brand_id
>  order by ext_price desc, i_brand_id
> limit 100 ;
>
>
> =MRv2==
>
>
> set hive.execution.engine=mr;
>
> ...
> 2016-07-18 22:22:57 Uploaded 1 File to:
> file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)
>
> 2016-07-18 22:22:57 End of local task; Time Taken: 2.47 sec.
> ...
> Time taken: 203.317 seconds, Fetched: 100 row(s)
>
> =Tez===
>
>
>
> set hive.execution.engine=tez;
> set hive.llap.execution.mode=none;
>
> Time taken: 13.681 seconds, Fetched: 100 row(s)
>
> =LLAP==
>
>
> set hive.llap.execution.mode=all;
>
>
>
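> Task Execution Summary
> -----------------------------------------------------------------------------------
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> -----------------------------------------------------------------------------------
>      Map 1        1016.00             0            0     93,123,704           9,048
>      Map 4           0.00             0            0         10,000              31
>      Map 5           0.00             0            0        296,344           2,675
>  Reducer 2         207.00             0            0          9,048             100
>  Reducer 3           0.00             0            0            100               0
> -----------------------------------------------------------------------------------
>
>
> Query Execution Summary
> -----------------------------------------------------------------------------------
> OPERATION                            DURATION
> -----------------------------------------------------------------------------------
> Compile Query                           1.64s
> Prepare Plan                            0.32s
> Submit Plan                             0.57s
> Start DAG                               0.21s
> Run DAG                                 1.02s
> -----------------------------------------------------------------------------------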
>
>
> Time taken: 3.809 seconds, Fetched: 100 row(s)
>
>
> Annoyingly now, the 1.64s to compile the query is a huge fraction, since
> it only takes 1.02s to execute the join+aggregate over 93 million rows.
>
> Hopefully in a couple of weeks, we'll cut that 1.64s into nearly nothing
> once we me

Re: ORC does not support type conversion from INT to STRING.

2016-07-18 Thread Mich Talebzadeh
Hi Matthew,

In layman's terms, if I create the source ORC table column as INT, then
create a target ORC table in which that column is defined as STRING, and
do an INSERT/SELECT from the source table, how is the data stored internally?

Is it implicitly converted into the new format using the CAST function, or
is it stored as is and just masked?

The version of Hive I am using is 2, and it works OK for primitive data
types (INSERT/SELECT from INT to STRING).

However, I believe Mahender is referring to complex types?

Thanks



Dr Mich Talebzadeh




On 18 July 2016 at 22:31, Matthew McCline  wrote:

>
> Hi Mahender,
>
>
> Schema evolution is available in the most recent version of Hive.
>
>
> For example, if you set
> hive.metastore.disallow.incompatible.col.type.changes=false; on master
> (i.e. hive2) it will support INT to STRING conversion.
>
>
> If you need to remain on an older version, then you are out of luck.
>
>
> Thanks,
>
> Matt
>
>
> --
> *From:* Mahender Sarangam 
> *Sent:* Monday, July 18, 2016 1:59 PM
> *To:* user@hive.apache.org
> *Subject:* Re: ORC does not support type conversion from INT to STRING.
>
>
> Hi Mich,
>
> Sorry for the delay in responding. Here is the scenario.
>
> We created a new cluster and moved all the ORC file data into it. We
> re-created the table pointing to the ORC location, and we changed the data
> type in the ORC table from *INT* to *STRING*. From then onward, we have been
> unable to run a SELECT statement against this ORC table; Hive keeps throwing
> the exception "Orc table select. Unable to convert Int to String". It looks
> like a bug that affects ORC tables only: changing the data type from
> *INT* to *STRING* causes the ORC read/SELECT statement to throw an
> exception. Please let me know if there is any workaround for this scenario.
> Was this behavior also expected previously?
>
>
> */Mahender*
>
>
>
>
>
>
> On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
>
> you must excuse my ignorance
>
> can you please elaborate on this as there seems something has gone wrong
> somewhere?
>
> Dr Mich Talebzadeh
>
>
>
>
> On 14 June 2016 at 19:42, Mahender Sarangam 
> wrote:
>
>> Yes Mich. We have restored cluster from metastore.
>>
>> On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
>>
>> Hi Mahendar,
>>
>>
>> Did you load the meta-data DB/schema from backup and now seeing this error
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>> On 14 June 2016 at 19:04, Mahender Sarangam wrote:
>>
>>> ping.
>>>
>>> On 6/13/2016 1:19 PM, Mahender Sarangam wrote:
>>>
>>> Hi,
>>>
>>> We are facing an issue while reading data from an ORC table. We created
>>> the ORC table and loaded data into it, then deleted the cluster for some
>>> reason. We recreated the cluster (using the Metastore) with the table
>>> pointing to the same location. When we read from the ORC table, we see
>>> the error below.
>>>
>>> SELECT col2, Col1,
>>>   reflect("java.util.UUID", "randomUUID") AS ID,
>>>   Source,
>>>  1 ,
>>> SDate,
>>> EDate
>>> FROM Table ORC  JOIN Table2 _surr;
>>>
>>> ERROR : Vertex failed, vertexName=Map 1,
>>> vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed,
>>> taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0
>>> failed, info=[Error: Failure while running task:java.lang.RuntimeException:
>>> java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC
>>> does not support type conversion from INT to STRING.
>>>
>>>
>>> I think issue is reflect("java.util.UUID", "randomUUID") AS ID
>>>
>>>
>>> I know there is Bug raised while reading data from ORC table. Is there
>>> any workaround apart from reloading data.
>>>
>>> -MS
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>


Re: Hive on TEZ + LLAP

2016-07-18 Thread Gopal Vijayaraghavan

> Also, have there been simple benchmarks to compare:
>
> 1. Hive on MR
> 2. Hive on Tez
> 3. Hive on Tez with LLAP

I ran one today, with a small BI query in my test suite against a 1 TB
data set.

TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).

*Warning*: This is not a historical view, all engines are using the same
new & improved vectorized operators from 2.2.0-SNAPSHOT, only the physical
planner and the physical scheduling is different between runs.

The difference between pre-Stinger, Stinger and Stinger.next is much much
larger than this.




select  i_brand_id brand_id, i_brand brand,
sum(ss_ext_sales_price) ext_price
 from date_dim, store_sales, item
 where date_dim.d_date_sk = store_sales.ss_sold_date_sk
and store_sales.ss_item_sk = item.i_item_sk
and i_manager_id=36
and d_moy=12
and d_year=2001
 group by i_brand, i_brand_id
 order by ext_price desc, i_brand_id
limit 100 ;


=MRv2==


set hive.execution.engine=mr;

...
2016-07-18 22:22:57 Uploaded 1 File to:
file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)

2016-07-18 22:22:57 End of local task; Time Taken: 2.47 sec.
...
Time taken: 203.317 seconds, Fetched: 100 row(s)

=Tez===



set hive.execution.engine=tez;
set hive.llap.execution.mode=none;

Time taken: 13.681 seconds, Fetched: 100 row(s)

=LLAP==


set hive.llap.execution.mode=all;



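Task Execution Summary
-----------------------------------------------------------------------------------
  VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
-----------------------------------------------------------------------------------
     Map 1        1016.00             0            0     93,123,704           9,048
     Map 4           0.00             0            0         10,000              31
     Map 5           0.00             0            0        296,344           2,675
 Reducer 2         207.00             0            0          9,048             100
 Reducer 3           0.00             0            0            100               0
-----------------------------------------------------------------------------------


Query Execution Summary
-----------------------------------------------------------------------------------
OPERATION                            DURATION
-----------------------------------------------------------------------------------
Compile Query                           1.64s
Prepare Plan                            0.32s
Submit Plan                             0.57s
Start DAG                               0.21s
Run DAG                                 1.02s
-----------------------------------------------------------------------------------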


Time taken: 3.809 seconds, Fetched: 100 row(s)


Annoyingly now, the 1.64s to compile the query is a huge fraction, since
it only takes 1.02s to execute the join+aggregate over 93 million rows.

Hopefully in a couple of weeks, we'll cut that 1.64s into nearly nothing
once we merge HIVE-13995 into master.
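
To see how large the compile step is relative to everything else, the phases of the Query Execution Summary can be added up against the reported total. This is a small illustrative sketch using only the durations printed above; the dictionary and variable names are made up for the example.

```python
# Phase durations in seconds, from the Query Execution Summary above.
phases = {
    "Compile Query": 1.64,
    "Prepare Plan": 0.32,
    "Submit Plan": 0.57,
    "Start DAG": 0.21,
    "Run DAG": 1.02,
}
reported_total = 3.809  # "Time taken" printed at the end of the run

accounted = sum(phases.values())
unaccounted = reported_total - accounted
slowest_phase = max(phases, key=phases.get)

for name, secs in phases.items():
    print(f"{name:<14} {secs:5.2f}s  {secs / reported_total:6.1%}")
print(f"accounted: {accounted:.2f}s, remainder: {unaccounted:.3f}s")
print(f"slowest phase: {slowest_phase}")
```

The listed phases account for about 3.76 s of the 3.809 s total, and compilation is the single largest phase, ahead of the DAG run itself.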


More about the historical view, the new Vectorization codepaths are a big
part of this speed up, when you compare historically or against an
incompletely vectorized format like Parquet (HIVE-8128 looks abandoned).

set hive.vectorized.execution.mapjoin.native.enabled=false;


Time taken: 34.372 seconds, Fetched: 100 row(s)
hive>


Cheers,
Gopal











Re: ORC does not support type conversion from INT to STRING.

2016-07-18 Thread Matthew McCline

Hi Mahender,


Schema evolution is available in the most recent version of Hive.


For example, if you set
hive.metastore.disallow.incompatible.col.type.changes=false; on master (i.e.
hive2) it will support INT to STRING conversion.


If you need to remain on an older version, then you are out of luck.


Thanks,

Matt



From: Mahender Sarangam 
Sent: Monday, July 18, 2016 1:59 PM
To: user@hive.apache.org
Subject: Re: ORC does not support type conversion from INT to STRING.


Hi Mich,

Sorry for the delay in responding. Here is the scenario.

We created a new cluster and moved all the ORC file data into it. We
re-created the table pointing to the ORC location, and we changed the data
type in the ORC table from INT to STRING. From then onward, we have been unable
to run a SELECT statement against this ORC table; Hive keeps throwing the
exception "Orc table select. Unable to convert Int to String". It looks like a
bug that affects ORC tables only: changing the data type from INT to STRING
causes the ORC read/SELECT statement to throw an exception. Please let me know
if there is any workaround for this scenario. Was this behavior also expected
previously?


/Mahender





On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
you must excuse my ignorance

can you please elaborate on this as there seems something has gone wrong 
somewhere?


Dr Mich Talebzadeh




On 14 June 2016 at 19:42, Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh




On 14 June 2016 at 19:04, Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC
table and loaded data into it, then deleted the cluster for some reason. We
recreated the cluster (using the Metastore) with the table pointing to the same
location. When we read from the ORC table, we see the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS









Re: ORC does not support type conversion from INT to STRING.

2016-07-18 Thread Mahender Sarangam
Hi Mich,

Sorry for the delay in responding. Here is the scenario.

We created a new cluster and moved all the ORC file data into it. We
re-created the table pointing to the ORC location, and we changed the data
type in the ORC table from INT to STRING. From then onward, we have been unable
to run a SELECT statement against this ORC table; Hive keeps throwing the
exception "Orc table select. Unable to convert Int to String". It looks like a
bug that affects ORC tables only: changing the data type from INT to STRING
causes the ORC read/SELECT statement to throw an exception. Please let me know
if there is any workaround for this scenario. Was this behavior also expected
previously?


/Mahender





On 6/14/2016 11:47 AM, Mich Talebzadeh wrote:
you must excuse my ignorance

can you please elaborate on this as there seems something has gone wrong 
somewhere?


Dr Mich Talebzadeh




On 14 June 2016 at 19:42, Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

Yes Mich. We have restored cluster from metastore.

On 6/14/2016 11:35 AM, Mich Talebzadeh wrote:
Hi Mahendar,


Did you load the meta-data DB/schema from backup and now seeing this error




Dr Mich Talebzadeh




On 14 June 2016 at 19:04, Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

ping.

On 6/13/2016 1:19 PM, Mahender Sarangam wrote:

Hi,

We are facing an issue while reading data from an ORC table. We created the ORC
table and loaded data into it, then deleted the cluster for some reason. We
recreated the cluster (using the Metastore) with the table pointing to the same
location. When we read from the ORC table, we see the error below.

SELECT col2, Col1,
  reflect("java.util.UUID", "randomUUID") AS ID,
  Source,
 1 ,
SDate,
EDate
FROM Table ORC  JOIN Table2 _surr;

ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1465411930667_0212_1_01, diagnostics=[Task failed, 
taskId=task_1465411930667_0212_1_01_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: java.io.IOException: ORC does 
not support type conversion from INT to STRING.


I think issue is reflect("java.util.UUID", "randomUUID") AS ID

I know there is Bug raised while reading data from ORC table. Is there any 
workaround apart from reloading data.

-MS









Re: Hive External Storage Handlers

2016-07-18 Thread Mich Talebzadeh
Hi,

You can move up to Hive 2, which works fine and is pretty stable. You can opt
for Hive 1.2.1 if you wish.

If you want to use Spark (the replacement for Shark) as the execution
engine for Hive, then the version that works (the one I have managed to
make work with Hive) is Spark 1.3.1, which you will need to build from source.

It works and it is stable.

Otherwise you may decide to use Spark Thrift Server (STS), which allows JDBC
access to Spark SQL (through beeline, Squirrel, Zeppelin) and has a Hive
SQL context built into it, as if you were using Hive Thrift Server (HSS).

HTH


Dr Mich Talebzadeh




On 18 July 2016 at 21:38, Lavelle, Shawn  wrote:

> Hello,
>
>
>
> I am working with an external storage handler written for Hive 0.11
> and run on a Shark execution engine.  I’d like to move forward and upgrade
> to hive 1.2.1 on spark 1.6 or even 2.0.
>
>This storage has a need to run queries across tables existing in
> different databases in the external data store, so existing drivers that
> map hive to external storage in 1 to 1 mappings are insufficient. I have
> attempted this upgrade already, but found out that predicate pushdown was
> not occurring.  Was this changed in 1.2?
>
>Can I update and use the same storage handler in Hive or has this
> concept been replaced by the RDDs and DataFrame API?
>
>
>Are these questions better for the Spark list?
>
>
>
>Thank you,
>
>
>
> ~ Shawn M Lavelle
>
>
>
>
>
>
> Shawn Lavelle
> Software Development
>
> 4101 Arrowhead Drive
> Medina, Minnesota 55340-9457
> Phone: 763 551 0559
> Fax: 763 551 0750
> *Email:* shawn.lave...@osii.com
> *Website: **www.osii.com* 
>


Hive External Storage Handlers

2016-07-18 Thread Lavelle, Shawn
Hello,

I am working with an external storage handler written for Hive 0.11 and run 
on a Shark execution engine.  I’d like to move forward and upgrade to hive 
1.2.1 on spark 1.6 or even 2.0.

   This storage has a need to run queries across tables existing in different 
databases in the external data store, so existing drivers that map hive to 
external storage in 1 to 1 mappings are insufficient. I have attempted this 
upgrade already, but found out that predicate pushdown was not occurring.  Was 
this changed in 1.2?

   Can I update and use the same storage handler in Hive or has this concept 
been replaced by the RDDs and DataFrame API?

   Are these questions better for the Spark list?

   Thank you,

~ Shawn M Lavelle




Shawn Lavelle
Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Fax: 763 551 0750
Email: shawn.lave...@osii.com
Website: www.osii.com



Re: Yarn Application ID for Hive query

2016-07-18 Thread Gopal Vijayaraghavan

> be nice to have access to a command or API call in HiveServer2 similar
>to MySQL's "SHOW PROCESSLIST" (and equivalent commands in most other
>databases).
 

There is one - if you have the HiveServer2 UI (in 2.0), that can be seen.

It would take a 10-15 line JSP script to export that as a JSON API.

The reason that's not very interesting is that single-machine information
like the MySQL one is useless in a properly configured HA environment for
Hive.

Cheers,
Gopal




RE: Yarn Application ID for Hive query

2016-07-18 Thread Amit Bajpai
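I am running Hive on Tez. I am able to get the YARN application ID for the Hive 
query by submitting the query through Hive JDBC and using HiveStatement.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.hive.jdbc.HiveStatement;

// Run the query through Hive JDBC so that the query log can be read afterwards.
Connection con =
    DriverManager.getConnection("jdbc:hive2://abc:1/default", "xyz", "");
HiveStatement stmt = (HiveStatement) con.createStatement();
String sql = "SELECT COMP_ID, COUNT(1) FROM tableA GROUP BY COMP_ID";
ResultSet res = stmt.executeQuery(sql);
String yarn_app_id = "";

// Scan the query log for the line that carries the YARN application ID.
for (String log : stmt.getQueryLog()) {
    if (log.contains("App id")) {
        // Skip past "App id " and drop the trailing character of the line.
        yarn_app_id = log.substring(log.indexOf("App id") + 7, log.length() - 1);
    }
}

System.out.println("YARN Application ID: " + yarn_app_id);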

Now I am trying to find the Tez DAG ID for the query.
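
Since the original question was about a Python client, the log-scanning step above can also be sketched in Python. This is only a sketch under assumptions: the helper name extract_yarn_app_id and the sample log lines are hypothetical, and it assumes the query log contains "App id" followed by an ID of the usual application_<timestamp>_<sequence> form. It uses a regular expression instead of fixed offsets, and it returns the first match rather than the last one.

```python
import re

# Assumption: the log line mentions "App id" followed by a YARN application ID
# of the form application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"App id\s+(application_\d+_\d+)")


def extract_yarn_app_id(log_lines):
    """Return the first YARN application ID found in the log lines, or None."""
    for line in log_lines:
        match = APP_ID_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical log lines, for illustration only.
sample_log = [
    "INFO  : Tez session hasn't been created yet. Opening session",
    "INFO  : Status: Running (Executing on YARN cluster with App id "
    "application_1468800000000_0042)",
]
print(extract_yarn_app_id(sample_log))
```

The regular expression avoids depending on what follows the ID on the line, which the fixed "length - 1" offset in the Java snippet does.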


From: Gerber, Bryan W [mailto:bryan.ger...@pnnl.gov]
Sent: Monday, July 18, 2016 1:47 PM
To: user@hive.apache.org
Subject: RE: Yarn Application ID for Hive query

Making Hive look like a normal SQL database is the goal of libraries like this, 
so it makes sense that the abstraction wouldn't leak a concept like application 
ID, especially because not all Hive queries generate a YARN application.

That said, we went through this with JDBC access to Hive a while back to allow 
our user interface to cancel a query. The only relevant discussion I found was 
here: 
http://grokbase.com/t/cloudera/hue-user/1373c258xg/how-hue-beeswax-is-able-to-read-the-hadoop-job-id-that-gets-generated-by-hiveserver2

We are using this method, plus a background task that polls the YARN resource 
manager API to find the job with the corresponding hive.session.id. It is a lot 
of work for something that seems very simple. It would be nice to have access 
to a command or API call in HiveServer2 similar to MySQL's "SHOW PROCESSLIST" 
(and equivalent commands in most other databases).

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Thursday, July 14, 2016 10:22 PM
To: user@hive.apache.org
Subject: Yarn Application ID for Hive query

Hi,

I am using the Python program below to run a Hive query. How can I get the YARN 
application ID for the query execution from the Python program?

import pyhs2

with pyhs2.connect(host='abc.sac.com',
                   port=1,
                   authMechanism="PLAIN",
                   user='amit',
                   password='amit',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Execute query
        cur.execute("SELECT COMP_ID, COUNT(1) FROM tableA GROUP BY COMP_ID")

        # Fetch table results
        for i in cur.fetch():
            print(i)

Thanks
Amit


Legal Disclaimer:
The information contained in this message may be privileged and confidential. 
It is intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any distribution of this message, in 
any form, is strictly prohibited. If you have received this message in error, 
please immediately notify the sender and delete or destroy any copy of this 
message!

RE: Yarn Application ID for Hive query

2016-07-18 Thread Gerber, Bryan W
Making Hive look like a normal SQL database is the goal of libraries like this, 
so it makes sense that the abstraction wouldn't leak a concept like application 
ID, especially because not all Hive queries generate a YARN application.

That said, we went through this with JDBC access to Hive a while back to allow 
our user interface to cancel a query. The only relevant discussion I found was 
here: 
http://grokbase.com/t/cloudera/hue-user/1373c258xg/how-hue-beeswax-is-able-to-read-the-hadoop-job-id-that-gets-generated-by-hiveserver2

We are using this method, plus a background task that polls the YARN resource 
manager API to find the job with the corresponding hive.session.id. It is a lot 
of work for something that seems very simple. It would be nice to have access 
to a command or API call in HiveServer2 similar to MySQL's "SHOW PROCESSLIST" 
(and equivalent commands in most other databases).
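
The matching half of the polling task described above might look roughly like the Python sketch below. It is a sketch under stated assumptions, not the code we actually run: the function name find_apps_for_session is hypothetical, the input is assumed to be the already-parsed JSON body of the ResourceManager's /ws/v1/cluster/apps endpoint (shape {"apps": {"app": [...]}}), and matching on the application name is an assumed rule that should be checked against how your cluster names Hive applications. Fetching the response over HTTP is left out so that the example stays self-contained.

```python
import json


def find_apps_for_session(apps_response, session_id):
    """Return IDs of applications whose name mentions the given Hive session ID."""
    # The ResourceManager reports "apps": null when nothing matches, so guard for it.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [app["id"] for app in apps if session_id in app.get("name", "")]


# Hypothetical response body, for illustration only.
sample_response = json.loads("""
{"apps": {"app": [
  {"id": "application_1468800000000_0041", "name": "HIVE-1111", "state": "FINISHED"},
  {"id": "application_1468800000000_0042", "name": "HIVE-2222", "state": "RUNNING"}
]}}
""")
print(find_apps_for_session(sample_response, "2222"))
```

A background task would call this on each poll and then use the returned application IDs to monitor or cancel the query.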

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Thursday, July 14, 2016 10:22 PM
To: user@hive.apache.org
Subject: Yarn Application ID for Hive query

Hi,

I am using the Python program below to run a Hive query. How can I get the YARN 
application ID for the query execution from the Python program?

import pyhs2

with pyhs2.connect(host='abc.sac.com',
                   port=1,
                   authMechanism="PLAIN",
                   user='amit',
                   password='amit',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Execute query
        cur.execute("SELECT COMP_ID, COUNT(1) FROM tableA GROUP BY COMP_ID")

        # Fetch table results
        for i in cur.fetch():
            print(i)

Thanks
Amit




Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Pengcheng Xiong
Jesus, thanks for the tremendous contributions to the project. Congrats for the
well deserved PMC membership! :)

On Mon, Jul 18, 2016 at 11:16 AM, Hari Sivarama Subramaniyan <
hsubramani...@hortonworks.com> wrote:

> Hi Jesus
> Congrats for the well deserved achievement.
>
> Regards
> Hari
> 
> From: Jesus Camacho Rodriguez 
> Sent: Monday, July 18, 2016 10:27 AM
> To: user@hive.apache.org
> Cc: d...@hive.apache.org
> Subject: Re: [ANNOUNCE] New PMC Member : Jesus
>
> Thanks everybody! Looking forward to continue contributing to the project!
>
> --
> Jesús
>
>
>
>
> On 7/18/16, 6:21 PM, "Prasanth Jayachandran" <
> pjayachand...@hortonworks.com> wrote:
>
> >Congratulations Jesus!
> >
> >> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
> >>
> >> Congrats!!
> >>
> >> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
> >>  wrote:
> >>> Congratulations Jesus!
> >>>
>  On Jul 18, 2016, at 8:30 AM, Sergio Pena 
> wrote:
> 
>  Congrats Jesus !!!
> 
>  On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary 
> wrote:
> 
> > Congratulations Jesus!
> >
> >> On Jul 18, 2016, at 6:55 AM, Wei Zheng 
> wrote:
> >>
> >> Congrats Jesus!
> >>
> >> Thanks,
> >>
> >> Wei
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
> >>
> >>> Good to have you onboard, Jesus! :)
> >>>
> >>> On Jul 17, 2016 12:00, "Lefty Leverenz" 
> > wrote:
> >>>
>  Congratulations Jesus!
> 
>  -- Lefty
> 
>  On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
> > hashut...@apache.org>
>  wrote:
> 
> > Hello Hive community,
> >
> > I'm pleased to announce that Jesus Camacho Rodriguez has
> accepted the
> > Apache Hive PMC's
> > invitation, and is now our newest PMC member. Many thanks to
> Jesus for
> > all of
> > his hard work.
> >
> > Please join me congratulating Jesus!
> >
> > Best,
> > Ashutosh
> > (On behalf of the Apache Hive PMC)
> >
> 
> 
> >
> >
> >>>
> >>
> >
> >
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Pengcheng Xiong
Thanks, everyone. I feel really excited and hope to contribute more and
more to the community. Thanks again.

On Mon, Jul 18, 2016 at 11:17 AM, Hari Sivarama Subramaniyan <
hsubramani...@hortonworks.com> wrote:

> Hi Pengcheng
> Congrats for the well deserved achievement!
>
> Regards
> Hari
> 
> From: Jesus Camacho Rodriguez 
> Sent: Monday, July 18, 2016 10:26 AM
> To: user@hive.apache.org
> Cc: d...@hive.apache.org
> Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng
>
> Congrats Pengcheng, well deserved! :)
>
>
>
> On 7/18/16, 6:25 PM, "Vaibhav Gumashta"  wrote:
>
> >Congrats Pengcheng!
> >
> >From: Prasanth Jayachandran 
> >Sent: Monday, July 18, 2016 10:21 AM
> >To: user@hive.apache.org
> >Cc: d...@hive.apache.org
> >Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng
> >
> >Congratulations Pengcheng!
> >
> >> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
> >>
> >> Congrats!!
> >>
> >> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
> >>  wrote:
> >>> Congratulations!
> >>>
>  On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:
> 
>  Congratulations Pengcheng!
> 
> 
> > On Jul 18, 2016, at 6:55 AM, Wei Zheng 
> wrote:
> >
> > Congrats Pengcheng!
> >
> > Thanks,
> >
> > Wei
> >
> >
> >
> >
> >
> >
> > On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
> >
> >> Congrats, PengCheng!
> >>
> >> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan <
> khorg...@gmail.com>
> >> wrote:
> >>
> >>> Welcome aboard Pengcheng! :)
> >>>
> >>> On Jul 17, 2016 12:01, "Lefty Leverenz" 
> wrote:
> >>>
>  Congratulations Pengcheng!
> 
>  -- Lefty
> 
>  On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan <
> hashut...@apache.org>
>  wrote:
> 
> >>
> >> Hello Hive community,
> >>
> >> I'm pleased to announce that Pengcheng Xiong has accepted the
> Apache
> > Hive
> >> PMC's
> >> invitation, and is now our newest PMC member. Many thanks to
> Pengcheng
> > for
> >> all of his hard work.
> >>
> >> Please join me congratulating Pengcheng!
> >>
> >> Best,
> >> Ashutosh
> >> (On behalf of the Apache Hive PMC)
> >>
> >
> 
> 
> >>>
> 
> >>>
> >>
> >
> >
> >
>


Re: Want to be one contributor

2016-07-18 Thread Alan Gates
I believe the answer is yes, you need cygwin to develop Hive on Windows.  Many 
of the Hadoop family of projects run on Windows natively, but require Cygwin 
for development.
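
The build error quoted below is Maven failing to launch "bash". A quick way to check whether a bash executable is visible on the PATH before running the build is sketched here in Python; the function name bash_available is hypothetical, and finding bash does not by itself guarantee that the rest of the build will work.

```python
import shutil


def bash_available():
    """Return True if a 'bash' executable can be found on the current PATH."""
    return shutil.which("bash") is not None


if bash_available():
    print("bash found at", shutil.which("bash"))
else:
    print("bash not found on PATH; the generate-version-annotation step needs it")
```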

Alan.

> On Jul 16, 2016, at 18:15, Alpesh Patel  wrote:
> 
> I am facing the issue below while running the build. Is Cygwin really needed 
> if the development machine runs Windows? 
> 
> Kindly advise. 
> 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.7:run 
> (generate-version-annotation) on project hive-common: An Ant BuildException 
> has occured: Execute failed: java.io.IOException: Cannot run program "bash" 
> (in directory "F:\workspace\hive\common"): CreateProcess error=2, The system 
> cannot find the file specified
> [ERROR] around Ant part .. @ 
> 4:46 in F:\workspace\hive\common\target\antrun\build-main.xml
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-common
> 
> 
> Rgds,
> Alpesh
> 
> On Thu, Jul 14, 2016 at 5:11 PM, Alan Gates  wrote:
> https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors
>  is a good place to start.
> 
> Welcome to Hive.
> 
> Alan.
> 
> > On Jul 14, 2016, at 16:01, Alpesh Patel  wrote:
> >
> > Hi Guys,
> >
> > I have been part of this group for a year, just as an audience, and now 
> > want to be a contributor to the Hive code base.
> >
> > Can you please guide me on how I can become a contributor? Is there any wiki 
> > which I can read for this?
> >
> > Rgds,
> > Alpesh
> >
> 
> 



Test failure

2016-07-18 Thread Zhu Li
Hi all,

I went through the Hive developer guide, built the Hive code successfully and
then ran "mvn test". But I got the following error:

---

 T E S T S

---


---

 T E S T S

---

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
MaxPermSize=512m; support was removed in 8.0

Running org.apache.hadoop.hive.conf.TestHiveConf

Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.379 sec
<<< FAILURE! - in org.apache.hadoop.hive.conf.TestHiveConf

testHiveSitePath(org.apache.hadoop.hive.conf.TestHiveConf)  Time elapsed:
0.29 sec  <<< ERROR!

java.lang.NoClassDefFoundError:
org/apache/hadoop/mapreduce/TaskAttemptContext

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)

at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:141)

at
org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)

at org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:372)

at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:109)

at
org.apache.hadoop.hive.conf.TestHiveConf.testHiveSitePath(TestHiveConf.java:41)


testUnitFor(org.apache.hadoop.hive.conf.TestHiveConf)  Time elapsed: 0 sec
<<< ERROR!

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hive.conf.HiveConf

at
org.apache.hadoop.hive.conf.TestHiveConf.testUnitFor(TestHiveConf.java:104)


testHiddenConfig(org.apache.hadoop.hive.conf.TestHiveConf)  Time elapsed:
0.001 sec  <<< ERROR!

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hive.conf.HiveConf

at
org.apache.hadoop.hive.conf.TestHiveConf.testHiddenConfig(TestHiveConf.java:124)


testColumnNameMapping(org.apache.hadoop.hive.conf.TestHiveConf)  Time
elapsed: 0.002 sec  <<< ERROR!

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hive.conf.HiveConf

at
org.apache.hadoop.hive.conf.TestHiveConf.testColumnNameMapping(TestHiveConf.java:98)


testConfProperties(org.apache.hadoop.hive.conf.TestHiveConf)  Time elapsed:
0 sec  <<< ERROR!

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hive.conf.HiveConf$ConfVars

at
org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties(TestHiveConf.java:73)


Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
MaxPermSize=512m; support was removed in 8.0

Running org.apache.hadoop.hive.conf.TestHiveLogging

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.374 sec
<<< FAILURE! - in org.apache.hadoop.hive.conf.TestHiveLogging

testHiveLogging(org.apache.hadoop.hive.conf.TestHiveLogging)  Time elapsed:
0.311 sec  <<< ERROR!

java.lang.NoClassDefFoundError:
org/apache/hadoop/mapreduce/TaskAttemptContext

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)

at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:141)

at
org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)

at org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:372)

at
org.apache.hadoop.hive.conf.TestHiveLogging.configLog(TestHiveLogging.java:50)

at
org.apache.hadoop.hive.conf.TestHiveLogging.RunTest(TestHiveLogging.java:97)

at
org.apache.hadoop.hive.conf.TestHiveLogging.testHiveLogging(TestHiveLogging.java:109)


Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
MaxPermSize=512m; support was removed in 8.0

Running org.apache.hadoop.hive.conf.TestHiveConfRestrictList

Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.346 sec
<<< FAILURE! - in org.apache.hadoop.hive.conf.TestHiveConfRestrictList

testAppendRestriction(org.apache.hadoop.hive.conf.TestHiveConfRestrictList)
Time elapsed: 0.283 sec  <<< ERROR!

java.lang.NoClassDefFoundError:
org/apache/hadoop/mapreduce/TaskAttemptContext

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at org.apache.hadoop.hive.shims.ShimLoader.cr

Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Hari Sivarama Subramaniyan
Hi Pengcheng
Congrats for the well deserved achievement!

Regards
Hari

From: Jesus Camacho Rodriguez 
Sent: Monday, July 18, 2016 10:26 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng

Congrats Pengcheng, well deserved! :)



On 7/18/16, 6:25 PM, "Vaibhav Gumashta"  wrote:

>Congrats Pengcheng!
>
>From: Prasanth Jayachandran 
>Sent: Monday, July 18, 2016 10:21 AM
>To: user@hive.apache.org
>Cc: d...@hive.apache.org
>Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng
>
>Congratulations Pengcheng!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>>
>> Congrats!!
>>
>> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
>>  wrote:
>>> Congratulations!
>>>
 On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:

 Congratulations Pengcheng!


> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>
> Congrats Pengcheng!
>
> Thanks,
>
> Wei
>
>
>
>
>
>
> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
>
>> Congrats, PengCheng!
>>
>> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
>> wrote:
>>
>>> Welcome aboard Pengcheng! :)
>>>
>>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>>>
 Congratulations Pengcheng!

 -- Lefty

 On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
 
 wrote:

>>
>> Hello Hive community,
>>
>> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
> Hive
>> PMC's
>> invitation, and is now our newest PMC member. Many thanks to 
>> Pengcheng
> for
>> all of his hard work.
>>
>> Please join me congratulating Pengcheng!
>>
>> Best,
>> Ashutosh
>> (On behalf of the Apache Hive PMC)
>>
>


>>>

>>>
>>
>
>
>


Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Hari Sivarama Subramaniyan
Hi Jesus
Congrats for the well deserved achievement.

Regards
Hari

From: Jesus Camacho Rodriguez 
Sent: Monday, July 18, 2016 10:27 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: [ANNOUNCE] New PMC Member : Jesus

Thanks everybody! Looking forward to continue contributing to the project!

--
Jesús




On 7/18/16, 6:21 PM, "Prasanth Jayachandran"  
wrote:

>Congratulations Jesus!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>>
>> Congrats!!
>>
>> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
>>  wrote:
>>> Congratulations Jesus!
>>>
 On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:

 Congrats Jesus !!!

 On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:

> Congratulations Jesus!
>
>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>>
>> Congrats Jesus!
>>
>> Thanks,
>>
>> Wei
>>
>>
>>
>>
>>
>>
>>
>> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
>>
>>> Good to have you onboard, Jesus! :)
>>>
>>> On Jul 17, 2016 12:00, "Lefty Leverenz" 
> wrote:
>>>
 Congratulations Jesus!

 -- Lefty

 On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
> hashut...@apache.org>
 wrote:

> Hello Hive community,
>
> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
> Apache Hive PMC's
> invitation, and is now our newest PMC member. Many thanks to Jesus for
> all of
> his hard work.
>
> Please join me congratulating Jesus!
>
> Best,
> Ashutosh
> (On behalf of the Apache Hive PMC)
>


>
>
>>>
>>
>
>


Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Jesus Camacho Rodriguez
Thanks everybody! Looking forward to continue contributing to the project!

--
Jesús




On 7/18/16, 6:21 PM, "Prasanth Jayachandran"  
wrote:

>Congratulations Jesus!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>> 
>> Congrats!!
>> 
>> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
>>  wrote:
>>> Congratulations Jesus!
>>> 
 On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:
 
 Congrats Jesus !!!
 
 On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
 
> Congratulations Jesus!
> 
>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>> 
>> Congrats Jesus!
>> 
>> Thanks,
>> 
>> Wei
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
>> 
>>> Good to have you onboard, Jesus! :)
>>> 
>>> On Jul 17, 2016 12:00, "Lefty Leverenz" 
> wrote:
>>> 
 Congratulations Jesus!
 
 -- Lefty
 
 On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
> hashut...@apache.org>
 wrote:
 
> Hello Hive community,
> 
> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
> Apache Hive PMC's
> invitation, and is now our newest PMC member. Many thanks to Jesus for
> all of
> his hard work.
> 
> Please join me congratulating Jesus!
> 
> Best,
> Ashutosh
> (On behalf of the Apache Hive PMC)
> 
 
 
> 
> 
>>> 
>> 
>
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Jesus Camacho Rodriguez
Congrats Pengcheng, well deserved! :)



On 7/18/16, 6:25 PM, "Vaibhav Gumashta"  wrote:

>Congrats Pengcheng!
>
>From: Prasanth Jayachandran 
>Sent: Monday, July 18, 2016 10:21 AM
>To: user@hive.apache.org
>Cc: d...@hive.apache.org
>Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng
>
>Congratulations Pengcheng!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>>
>> Congrats!!
>>
>> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
>>  wrote:
>>> Congratulations!
>>>
 On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:

 Congratulations Pengcheng!


> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>
> Congrats Pengcheng!
>
> Thanks,
>
> Wei
>
>
>
>
>
>
> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
>
>> Congrats, PengCheng!
>>
>> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
>> wrote:
>>
>>> Welcome aboard Pengcheng! :)
>>>
>>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>>>
 Congratulations Pengcheng!

 -- Lefty

 On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
 
 wrote:

>>
>> Hello Hive community,
>>
>> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
> Hive
>> PMC's
>> invitation, and is now our newest PMC member. Many thanks to 
>> Pengcheng
> for
>> all of his hard work.
>>
>> Please join me congratulating Pengcheng!
>>
>> Best,
>> Ashutosh
>> (On behalf of the Apache Hive PMC)
>>
>


>>>

>>>
>>
>
>
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Vaibhav Gumashta
Congrats Pengcheng!

From: Prasanth Jayachandran 
Sent: Monday, July 18, 2016 10:21 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng

Congratulations Pengcheng!

> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>
> Congrats!!
>
> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
>  wrote:
>> Congratulations!
>>
>>> On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:
>>>
>>> Congratulations Pengcheng!
>>>
>>>
 On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:

 Congrats Pengcheng!

 Thanks,

 Wei






 On 7/17/16, 16:01, "Xuefu Zhang"  wrote:

> Congrats, PengCheng!
>
> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
> wrote:
>
>> Welcome aboard Pengcheng! :)
>>
>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>>
>>> Congratulations Pengcheng!
>>>
>>> -- Lefty
>>>
>>> On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
>>> wrote:
>>>
>
> Hello Hive community,
>
> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
 Hive
> PMC's
> invitation, and is now our newest PMC member. Many thanks to Pengcheng
 for
> all of his hard work.
>
> Please join me congratulating Pengcheng!
>
> Best,
> Ashutosh
> (On behalf of the Apache Hive PMC)
>

>>>
>>>
>>
>>>
>>
>




Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Prasanth Jayachandran
Congratulations Jesus!

> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
> 
> Congrats!!
> 
> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
>  wrote:
>> Congratulations Jesus!
>> 
>>> On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:
>>> 
>>> Congrats Jesus !!!
>>> 
>>> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
>>> 
 Congratulations Jesus!
 
> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
> 
> Congrats Jesus!
> 
> Thanks,
> 
> Wei
> 
> 
> 
> 
> 
> 
> 
> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
> 
>> Good to have you onboard, Jesus! :)
>> 
>> On Jul 17, 2016 12:00, "Lefty Leverenz" 
 wrote:
>> 
>>> Congratulations Jesus!
>>> 
>>> -- Lefty
>>> 
>>> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
 hashut...@apache.org>
>>> wrote:
>>> 
 Hello Hive community,
 
 I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
 Apache Hive PMC's
 invitation, and is now our newest PMC member. Many thanks to Jesus for
 all of
 his hard work.
 
 Please join me congratulating Jesus!
 
 Best,
 Ashutosh
 (On behalf of the Apache Hive PMC)
 
>>> 
>>> 
 
 
>> 
> 



Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Prasanth Jayachandran
Congratulations Pengcheng!

> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
> 
> Congrats!!
> 
> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
>  wrote:
>> Congratulations!
>> 
>>> On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:
>>> 
>>> Congratulations Pengcheng!
>>> 
>>> 
 On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
 
 Congrats Pengcheng!
 
 Thanks,
 
 Wei
 
 
 
 
 
 
 On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
 
> Congrats, PengCheng!
> 
> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
> wrote:
> 
>> Welcome aboard Pengcheng! :)
>> 
>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>> 
>>> Congratulations Pengcheng!
>>> 
>>> -- Lefty
>>> 
>>> On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
>>> wrote:
>>> 
> 
> Hello Hive community,
> 
> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
 Hive
> PMC's
> invitation, and is now our newest PMC member. Many thanks to Pengcheng
 for
> all of his hard work.
> 
> Please join me congratulating Pengcheng!
> 
> Best,
> Ashutosh
> (On behalf of the Apache Hive PMC)
> 
 
>>> 
>>> 
>> 
>>> 
>> 
> 



Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Jimmy Xiang
Congrats!!

On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
 wrote:
> Congratulations!
>
>> On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:
>>
>> Congratulations Pengcheng!
>>
>>
>>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>>>
>>> Congrats Pengcheng!
>>>
>>> Thanks,
>>>
>>> Wei
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
>>>
 Congrats, PengCheng!

 On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
 wrote:

> Welcome aboard Pengcheng! :)
>
> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>
>> Congratulations Pengcheng!
>>
>> -- Lefty
>>
>> On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
>> wrote:
>>

 Hello Hive community,

 I'm pleased to announce that Pengcheng Xiong has accepted the Apache
>>> Hive
 PMC's
 invitation, and is now our newest PMC member. Many thanks to Pengcheng
>>> for
 all of his hard work.

 Please join me congratulating Pengcheng!

 Best,
 Ashutosh
 (On behalf of the Apache Hive PMC)

>>>
>>
>>
>
>>
>


Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Jimmy Xiang
Congrats!!

On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
 wrote:
> Congratulations Jesus!
>
>> On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:
>>
>> Congrats Jesus !!!
>>
>> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
>>
>>> Congratulations Jesus!
>>>
 On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:

 Congrats Jesus!

 Thanks,

 Wei







 On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:

> Good to have you onboard, Jesus! :)
>
> On Jul 17, 2016 12:00, "Lefty Leverenz" 
>>> wrote:
>
>> Congratulations Jesus!
>>
>> -- Lefty
>>
>> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
>>> hashut...@apache.org>
>> wrote:
>>
>>> Hello Hive community,
>>>
>>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
>>> Apache Hive PMC's
>>> invitation, and is now our newest PMC member. Many thanks to Jesus for
>>> all of
>>> his hard work.
>>>
>>> Please join me congratulating Jesus!
>>>
>>> Best,
>>> Ashutosh
>>> (On behalf of the Apache Hive PMC)
>>>
>>
>>
>>>
>>>
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Vihang Karajgaonkar
Congratulations!

> On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:
> 
> Congratulations Pengcheng!
> 
> 
>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>> 
>> Congrats Pengcheng!
>> 
>> Thanks,
>> 
>> Wei
>> 
>> 
>> 
>> 
>> 
>> 
>> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
>> 
>>> Congrats, PengCheng!
>>> 
>>> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
>>> wrote:
>>> 
 Welcome aboard Pengcheng! :)
 
 On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
 
> Congratulations Pengcheng!
> 
> -- Lefty
> 
> On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
> wrote:
> 
>>> 
>>> Hello Hive community,
>>> 
>>> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
>> Hive
>>> PMC's
>>> invitation, and is now our newest PMC member. Many thanks to Pengcheng
>> for
>>> all of his hard work.
>>> 
>>> Please join me congratulating Pengcheng!
>>> 
>>> Best,
>>> Ashutosh
>>> (On behalf of the Apache Hive PMC)
>>> 
>> 
> 
> 
 
> 



Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Vihang Karajgaonkar
Congratulations Jesus!

> On Jul 18, 2016, at 8:30 AM, Sergio Pena  wrote:
> 
> Congrats Jesus !!!
> 
> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
> 
>> Congratulations Jesus!
>> 
>>> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>>> 
>>> Congrats Jesus!
>>> 
>>> Thanks,
>>> 
>>> Wei
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
>>> 
 Good to have you onboard, Jesus! :)
 
 On Jul 17, 2016 12:00, "Lefty Leverenz" 
>> wrote:
 
> Congratulations Jesus!
> 
> -- Lefty
> 
> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
>> hashut...@apache.org>
> wrote:
> 
>> Hello Hive community,
>> 
>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
>> Apache Hive PMC's
>> invitation, and is now our newest PMC member. Many thanks to Jesus for
>> all of
>> his hard work.
>> 
>> Please join me congratulating Jesus!
>> 
>> Best,
>> Ashutosh
>> (On behalf of the Apache Hive PMC)
>> 
> 
> 
>> 
>> 



Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Chao Sun
Congratulations Jesus!

On Mon, Jul 18, 2016 at 8:30 AM, Sergio Pena 
wrote:

> Congrats Jesus !!!
>
> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary  wrote:
>
> > Congratulations Jesus!
> >
> > > On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
> > >
> > > Congrats Jesus!
> > >
> > > Thanks,
> > >
> > > Wei
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
> > >
> > >> Good to have you onboard, Jesus! :)
> > >>
> > >> On Jul 17, 2016 12:00, "Lefty Leverenz" 
> > wrote:
> > >>
> > >>> Congratulations Jesus!
> > >>>
> > >>> -- Lefty
> > >>>
> > >>> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
> > hashut...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> Hello Hive community,
> > >>>>
> > >>>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
> > >>>> Apache Hive PMC's
> > >>>> invitation, and is now our newest PMC member. Many thanks to Jesus for
> > >>>> all of
> > >>>> his hard work.
> > >>>>
> > >>>> Please join me congratulating Jesus!
> > >>>>
> > >>>> Best,
> > >>>> Ashutosh
> > >>>> (On behalf of the Apache Hive PMC)
> > >>>>
> > >>>
> > >>>
> >
> >
>


Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-18 Thread @Sanjiv Singh
Hi Dudu,

Thanks for your help and proactive response on it.

Today I verified all the solutions you provided; they worked for me on the
given table with 6 billion records.

Before I conclude anything, I want to check whether there is any reference
document/link available for these algorithms/approaches.

It would be good if you could share it.

Any help is really appreciated.
Thanks very much again.



Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Jul 1, 2016 at 2:06 PM, Markovitz, Dudu 
wrote:

> My pleasure.
>
>
>
> Just to make clear –
>
> The version with the non-consecutive values (1) is much more efficient
> than the version with the consecutive values (3), so if possible, go with
> (1).
>
>
>
> Dudu
>
>
>
> *From:* @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
> *Sent:* Friday, July 01, 2016 8:24 PM
>
> *To:* Markovitz, Dudu 
> *Cc:* user@hive.apache.org
> *Subject:* Re: Query Performance Issue : Group By and Distinct and load
> on reducer
>
>
>
> Thanks, really appreciate.
>
>
>
> I will try this. will respond with results.
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
>
>
> On Fri, Jul 1, 2016 at 6:50 AM, Markovitz, Dudu 
> wrote:
>
> 3.
>
> This is working code for consecutive values.
>
> MyColumn should be a column (or list of columns) with a good uniform
> distribution.
>
>
>
>
>
> with group_rows
> as
> (
>     select  abs(hash(MyColumn))%1   as group_id
>            ,count (*)               as cnt
>
>     from    INTER_ETL
>
>     group by abs(hash(MyColumn))%1
> )
>
> ,group_rows_accumulated
> as
> (
>     select  g1.group_id
>            ,sum (g2.cnt) - min (g1.cnt)   as accumulated_rows
>
>     from        group_rows   as g1
>     cross join  group_rows   as g2
>
>     where   g2.group_id <= g1.group_id
>
>     group by g1.group_id
> )
>
> select  t.*
>        ,row_number () over (partition by a.group_id order by null) +
>         a.accumulated_rows   as ETL_ROW_ID
>
> from    INTER_ETL                as t
> join    group_rows_accumulated   as a
> on      a.group_id = abs(hash(MyColumn))%1
> ;
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com]
> *Sent:* Thursday, June 30, 2016 12:43 PM
> *To:* user@hive.apache.org; sanjiv.is...@gmail.com
>
>
> *Subject:* RE: Query Performance Issue : Group By and Distinct and load
> on reducer
>
>
>
> 1.
>
> This works.
>
> I’ve recalled that the CAST is needed since FLOOR defaults to FLOAT.
>
>
>
> select  (cast (floor(r*100) as bigint) + 1) + 100L *
>         (row_number () over (partition by (cast (floor(r*100) as bigint) + 1)
>         order by null) - 1)   as ETL_ROW_ID
>
> from    (select *,rand() as r from INTER_ETL) as t
> ;
>
>
>
>
>
>
>
> Here is a test result from our dev system
>
>
>
> select  min   (ETL_ROW_ID)   as min_ETL_ROW_ID
>        ,count (ETL_ROW_ID)   as count_ETL_ROW_ID
>        ,max   (ETL_ROW_ID)   as max_ETL_ROW_ID
>
> from    (select  (cast (floor(r*100) as bigint) + 1) + 100L
>                  * (row_number () over (partition by (cast (floor(r*100) as bigint) + 1)
>                  order by null) - 1)   as ETL_ROW_ID
>
>          from    (select *,rand() as r from INTER_ETL) as t
>         )
>         as t
> ;
>
>
>
>
>
> min_ETL_ROW_ID    count_ETL_ROW_ID    max_ETL_ROW_ID
> 1                 39567412227         40529759537
>
>
>
>
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkov...@paypal.com]
> *Sent:* Wednesday, June 29, 2016 11:37 PM
> *To:* sanjiv.is...@gmail.com
> *Cc:* user@hive.apache.org
> *Subject:* RE: Query Performance Issue : Group By and Distinct and load
> on reducer
>
>
>
> 1.
>
> This is strange.
>
> The negative numbers are due to overflow of the ‘int’ type, which is
> exactly why I cast the expressions in my code to ‘bigint’.
>
> I’ve tested this code before sending it to you and it worked fine,
> returning results that are beyond the range of the ‘int’ type.
>
>
>
> Please try this:
>
>
>
> select  *
>        ,(floor(r*100) + 1) + (100L * (row_number () over
>         (partition by (floor(r*100) + 1) order by null) - 1))   as ETL_ROW_ID
>
> from    (select *,rand() as r from INTER_ETL) as t
> ;
>
>
>
> 2.
>
> Great
>
>
>
> 3.
>
> Sorry, I haven’t had the time to test it (nor the change I’m going to suggest
> now… :-) )
>
> Please check if the following code works and if so, replace the ‘a’
> subquery code with it.
>
>
>
>
>
> select  a1.group_id
>
>,sum (a2.cnt) - a1.cnt   as accum_rows
>
>
>
> from   (

Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Peter Vary
Congratulations Pengcheng!


> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
> 
> Congrats Pengcheng!
> 
> Thanks,
> 
> Wei
> 
> 
> 
> 
> 
> 
> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
> 
>> Congrats, PengCheng!
>> 
>> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
>> wrote:
>> 
>>> Welcome aboard Pengcheng! :)
>>> 
>>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>>> 
>>>> Congratulations Pengcheng!
>>>>
>>>> -- Lefty
>>>>
>>>> On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan
>>>> wrote:
>>>>
>>>>>>
>>>>>> Hello Hive community,
>>>>>>
>>>>>> I'm pleased to announce that Pengcheng Xiong has accepted the Apache Hive
>>>>>> PMC's
>>>>>> invitation, and is now our newest PMC member. Many thanks to Pengcheng for
>>>>>> all of his hard work.
>>>>>>
>>>>>> Please join me congratulating Pengcheng!
>>>>>>
>>>>>> Best,
>>>>>> Ashutosh
>>>>>> (On behalf of the Apache Hive PMC)
>>>>>>
>>>>>
>>>>
>>>>
>>> 



Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Peter Vary
Congratulations Jesus!

> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
> 
> Congrats Jesus!
> 
> Thanks,
> 
> Wei
> 
> 
> 
> 
> 
> 
> 
> On 7/17/16, 14:29, "Sushanth Sowmyan"  wrote:
> 
>> Good to have you onboard, Jesus! :)
>> 
>> On Jul 17, 2016 12:00, "Lefty Leverenz"  wrote:
>> 
>>> Congratulations Jesus!
>>> 
>>> -- Lefty
>>> 
>>> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan 
>>> wrote:
>>> 
>>>> Hello Hive community,
>>>>
>>>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
>>>> Apache Hive PMC's
>>>> invitation, and is now our newest PMC member. Many thanks to Jesus for
>>>> all of
>>>> his hard work.
>>>>
>>>> Please join me congratulating Jesus!
>>>>
>>>> Best,
>>>> Ashutosh
>>>> (On behalf of the Apache Hive PMC)
>>>>
>>> 
>>>