Re: Run hive queries, and collect job information

2013-01-30 Thread Mathieu Despriee
Fantastic.
Thanks !


2013/1/30 Qiang Wang 

> Every Hive query has a history file, and you can get this information from
> the Hive history file.
>
> The following Java code can serve as an example:
>
> https://github.com/anjuke/hwi/blob/master/src/main/java/org/apache/hadoop/hive/hwi/util/QueryUtil.java
>
> Regards,
> Qiang
>
>
> 2013/1/30 Mathieu Despriee 
>
>> Hi folks,
>>
>> I would like to run a list of generated HIVE queries. For each, I would
>> like to retrieve the MR job_id (or ids, in case of multiple stages). And
>> then, with this job_id, collect statistics from job tracker (cumulative
>> CPU, read bytes...)
>>
>> How can I send HIVE queries from a bash or python script, and retrieve
>> the job_id(s) ?
>>
>> For the 2nd part (collecting stats for the job), we're using a MRv1
>> Hadoop cluster, so I don't have the AppMaster REST API
>> <http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html>.
>> I'm about to collect data from the jobtracker web UI. Any better idea ?
>>
>> Mathieu
>>
>>
>>
>


Run hive queries, and collect job information

2013-01-30 Thread Mathieu Despriee
Hi folks,

I would like to run a list of generated HIVE queries. For each, I would
like to retrieve the MR job_id (or ids, in case of multiple stages). And
then, with this job_id, collect statistics from job tracker (cumulative
CPU, read bytes...)

How can I send HIVE queries from a bash or python script, and retrieve the
job_id(s) ?

For the 2nd part (collecting stats for the job), we're using a MRv1 Hadoop
cluster, so I don't have the AppMaster REST
API.
I'm about to collect data from the jobtracker web UI. Any better idea ?

Mathieu
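
One way to approach the first part of the question from Python is sketched below. It is a different, simpler technique than parsing the Hive history file: it runs each query through `hive -e` and scans the CLI's progress output for job IDs. It assumes the `hive` executable is on the PATH, that the CLI writes its "Starting Job = job_..." progress lines to stderr, and that MRv1 job IDs have the form job_<digits>_<digits>. The function names and the sample log text are hypothetical.

```python
import re
import subprocess

# Assumed shape of an MRv1 job ID: job_<jobtracker start time>_<sequence number>.
JOB_ID_RE = re.compile(r"\bjob_\d+_\d+\b")


def extract_job_ids(cli_log):
    """Return the distinct job IDs found in Hive CLI log text,
    in order of first appearance."""
    seen = []
    for job_id in JOB_ID_RE.findall(cli_log):
        if job_id not in seen:
            seen.append(job_id)
    return seen


def run_hive_query(query, hive_cmd="hive"):
    """Run one query with `hive -e` and return (stdout, job_ids).

    Assumption: the CLI's job progress messages go to stderr, while
    query results go to stdout.
    """
    proc = subprocess.run(
        [hive_cmd, "-e", query],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if the query fails
    )
    return proc.stdout, extract_job_ids(proc.stderr)


if __name__ == "__main__":
    # Illustrative log excerpt, not captured from a real cluster.
    sample_log = (
        "Starting Job = job_201301300915_0042, Tracking URL = "
        "http://jt.example.com:50030/jobdetails.jsp?jobid=job_201301300915_0042\n"
        "Ended Job = job_201301300915_0042\n"
    )
    print(extract_job_ids(sample_log))
```

A multi-stage query would yield several IDs in launch order. Each ID could then be used to look up counters on the JobTracker, which this sketch does not attempt.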


Re: Real-life experience of forcing smaller input splits?

2013-01-24 Thread Mathieu Despriee
Hi David,

What file format and compression type are you using ?

Mathieu

On 25 Jan 2013 at 07:16, David Morel wrote:

> Hello,
>
> I have seen many posts on various sites and MLs, but didn't find a firm
> answer anywhere: is it possible yes or no to force a smaller split size
> than a block on the mappers, from the client side? I'm not after
> pointers to the docs (unless you're very very sure :-) but after
> real-life experience along the lines of 'yes, it works this way, I've
> done it like this...'
>
> All the parameters that I could find (especially specifying a max input
> split size) seem to have no effect, and the files that I have are so
> heavily compressed that they completely saturate the mappers' memory
> when processed.
>
> A solution I could imagine for this specific issue is reducing the block
> size, but for now I simply went with disabling in-file compression for
> those. And changing the block size on a per-file basis is something I'd
> like to avoid if at all possible.
>
> All the hive settings that we tried only got me as far as raising the
> number of mappers from 5 to 6 (yay!) where I would have needed at least
> ten times more.
>
> Thanks!
>
> D.Morel


specification of SERDE in RCFile

2013-01-22 Thread Mathieu Despriee
Hi folks,

In samples here and there, I've seen table definitions using RCFile
storage that specify a SERDE in some cases and not in others.

For example, sometimes: ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE

and other times only: STORED AS RCFILE

My question: what happens if the SERDE is not specified? What is the default
behavior?


Thank you

Mathieu


Re: Stack function in Hive : how to specify multiple aliases?

2013-01-11 Thread Mathieu Despriee
Yep, that worked.

As suggested by Nitin, LATERAL VIEW is helpful too, and here's the syntax:
SELECT  Id, Name, App, Byte, Packet FROM testApp2
LATERAL VIEW stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P,
PacketP2P) T AS App,Byte,Packet ;

Thanks for your help guys,

Mathieu
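
For readers unfamiliar with stack(), the following Python sketch mimics what the LATERAL VIEW query above computes: stack(2, ...) splits its six value arguments into two rows of three columns, and the lateral view pairs each generated row with the Id and Name of the source row. This is only an illustration of the semantics, not Hive code. The helper names and the sample row are hypothetical, and the sketch simply rejects argument counts that do not divide evenly into rows.

```python
def stack(n, *values):
    """Split the given values into n rows of len(values) // n columns,
    as in the stack(2, ...) call from the query."""
    if n <= 0 or len(values) % n != 0:
        # Simplification for this sketch: require an even split.
        raise ValueError("number of values must be a positive multiple of n")
    width = len(values) // n
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]


def lateral_view_stack(row):
    """Yield (Id, Name, App, Byte, Packet) tuples for one input row,
    one tuple per row generated by stack()."""
    generated = stack(
        2,
        row["AppWeb"], row["ByteWeb"], row["PacketWeb"],
        row["AppP2P"], row["ByteP2P"], row["PacketP2P"],
    )
    for app, byte, packet in generated:
        yield (row["Id"], row["Name"], app, byte, packet)


if __name__ == "__main__":
    # Hypothetical sample row; the real contents of testApp2 are not shown
    # in the thread.
    sample_row = {
        "Id": 1, "Name": "alice",
        "AppWeb": "web", "ByteWeb": 1000, "PacketWeb": 10,
        "AppP2P": "p2p", "ByteP2P": 5000, "PacketP2P": 50,
    }
    for out in lateral_view_stack(sample_row):
        print(out)
```

Each input row therefore produces two output rows, one for the Web columns and one for the P2P columns, which is why three aliases (App, Byte, Packet) are needed.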


2013/1/10 Dean Wampler 

> Try "as (alias1, alias2, ...)"
>
>
> On Thu, Jan 10, 2013 at 3:42 AM, Mathieu Despriee wrote:
>
>> Not working either :
>>
>> SELECT stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P, PacketP2P) AS
>> App,Byte,Packet FROM testApp2;
>> > FAILED: SemanticException 1:76 Only a single expression in the SELECT
>> clause is supported with UDTF's. Error encountered near token 'Byte'
>>
>> I tried to quote the aliases or to use array-style with no luck.
>>
>> Is there any description of hive grammar somewhere ?
>> I only found this doc:
>> https://cwiki.apache.org/Hive/languagemanual-select.html, but
>> "select_expr" is not described.
>>
>>
>>
>>
>>
>> 2013/1/10 Nitin Pawar 
>>
>>> I never ran into this kind of problem but can you try select as A,B,C
>>>
>>>
>>> On Thu, Jan 10, 2013 at 12:58 AM, Mathieu Despriee 
>>> wrote:
>>>
>>>> SELECT stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P, PacketP2P)
>>>> AS A FROM testApp2;
>>>>
>>>>
>>>> 2013/1/10 Nitin Pawar 
>>>>
>>>>> can you provide your query ?
>>>>>
>>>>>
>>>>> On Thu, Jan 10, 2013 at 12:39 AM, Mathieu Despriee wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I want to use the stack function, described here :
>>>>>> https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529
>>>>>>
>>>>>> Hive asks me to provide the multiple aliases for the resulting
>>>>>> columns ("The number of aliases in the AS clause does not match the
>>>>>> number of colums output by the UDTF, expected 3 aliases but got 1").
>>>>>>
>>>>>> What's the syntax to provide multiple aliases ?
>>>>>>
>>>>>> Thanks,
>>>>>> Mathieu
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
>
>


Re: Stack function in Hive : how to specify multiple aliases?

2013-01-10 Thread Mathieu Despriee
Not working either :

SELECT stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P, PacketP2P) AS
App,Byte,Packet FROM testApp2;
> FAILED: SemanticException 1:76 Only a single expression in the SELECT
clause is supported with UDTF's. Error encountered near token 'Byte'

I tried to quote the aliases or to use array-style with no luck.

Is there any description of hive grammar somewhere ?
I only found this doc :
https://cwiki.apache.org/Hive/languagemanual-select.html, but "select_expr"
is not described 





2013/1/10 Nitin Pawar 

> I never ran into this kind of problem but can you try select as A,B,C
>
>
> On Thu, Jan 10, 2013 at 12:58 AM, Mathieu Despriee wrote:
>
>> SELECT stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P, PacketP2P) AS
>> A FROM testApp2;
>>
>>
>> 2013/1/10 Nitin Pawar 
>>
>>> can you provide your query ?
>>>
>>>
>>> On Thu, Jan 10, 2013 at 12:39 AM, Mathieu Despriee 
>>> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I want to use the stack function, described here :
>>>> https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529
>>>>
>>>> Hive asks me to provide the multiple aliases for the resulting columns
>>>> ("The number of aliases in the AS clause does not match the number of
>>>> colums output by the UDTF, expected 3 aliases but got 1").
>>>>
>>>> What's the syntax to provide multiple aliases ?
>>>>
>>>> Thanks,
>>>> Mathieu
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>


Re: Stack function in Hive : how to specify multiple aliases?

2013-01-10 Thread Mathieu Despriee
SELECT stack(2,AppWeb, ByteWeb, PacketWeb, AppP2P, ByteP2P, PacketP2P) AS A
FROM testApp2;


2013/1/10 Nitin Pawar 

> can you provide your query ?
>
>
> On Thu, Jan 10, 2013 at 12:39 AM, Mathieu Despriee wrote:
>
>> Hi folks,
>>
>> I want to use the stack function, described here :
>> https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529
>>
>> Hive asks me to provide the multiple aliases for the resulting columns
>> ("The number of aliases in the AS clause does not match the number of
>> colums output by the UDTF, expected 3 aliases but got 1").
>>
>> What's the syntax to provide multiple aliases ?
>>
>> Thanks,
>> Mathieu
>>
>
>
>
> --
> Nitin Pawar
>


Stack function in Hive : how to specify multiple aliases?

2013-01-10 Thread Mathieu Despriee
Hi folks,

I want to use the stack function, described here :
https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529

Hive asks me to provide the multiple aliases for the resulting columns
("The number of aliases in the AS clause does not match the number of
colums output by the UDTF, expected 3 aliases but got 1").

What's the syntax to provide multiple aliases ?

Thanks,
Mathieu