RE: Fail to Increase Hive Mapper Tasks?

2014-01-02 Thread Sun, Rui
As for the number of 22, I guess that your table has multiple files, probably 
2.
Hive divides the desired number of map tasks evenly among the files of the 
table, and the number of map tasks for a file may be rounded up because the file 
size can't be divided exactly by the split size.
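The division described above can be sketched in a few lines. This is only an illustration of the old-API FileInputFormat arithmetic (the SPLIT_SLOP slack factor and minimum split size are omitted), and the file sizes are hypothetical:

```python
import math

def split_counts(file_sizes, requested_maps, block_size=64 * 1024 * 1024):
    """Sketch of the old-API FileInputFormat split computation."""
    total = sum(file_sizes)
    goal = max(total // requested_maps, 1)   # goal bytes per split
    split_size = min(goal, block_size)
    # Each file is split independently, so the per-file round-up can push
    # the total number of map tasks above the requested count.
    return [math.ceil(size / split_size) for size in file_sizes]

sizes = [80 * 1024 * 1024, 70 * 1024 * 1024]  # two uneven files, 150 MB total
print(sum(split_counts(sizes, 20)))           # 21 -- more than the 20 requested
```

With uneven file sizes, each file's last partial split rounds up, which is one plausible way a request for 20 maps ends up as 22.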

-Original Message-
From: Ji ZHANG [mailto:zhangj...@gmail.com] 
Sent: Friday, January 03, 2014 12:27 PM
To: user@hive.apache.org
Subject: Re: Fail to Increase Hive Mapper Tasks?

Hi Rui,

I combined your suggestion with the answer from Stack Overflow 
(http://stackoverflow.com/questions/20816726/fail-to-increase-hive-mapper-tasks),
and it works:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

It'll use 22 mappers, though I don't know why it's not exactly 20.
And I'm using Hive 0.9 + Hadoop 1.0.1.

Thank you very much.

Jerry

On Fri, Jan 3, 2014 at 10:51 AM, Sun, Rui  wrote:
> Hi, You can try set mapred.map.tasks = 19.
> It seems that HIVE is using the old Hadoop MapReduce API and so 
> mapred.max.split.size won't work.
>
> -Original Message-
> From: Ji Zhang [mailto:zhangj...@gmail.com]
> Sent: Thursday, January 02, 2014 3:56 PM
> To: user@hive.apache.org
> Subject: Fail to Increase Hive Mapper Tasks?
>
> Hi,
>
> I have a managed Hive table, which contains only one 150MB file. I then run 
> "select count(*) from tbl" on it, and it uses 2 mappers. I want to increase 
> that to a bigger number.
>
> First I tried 'set mapred.max.split.size=8388608;', hoping it would use 
> 19 mappers, but it only uses 3. Somehow it still splits the input by 64MB. I 
> also tried 'set dfs.block.size=8388608;', which doesn't work either.
>
> Then I tried a vanilla map-reduce job to do the same thing. It initially uses 
> 3 mappers, and when I set mapred.max.split.size, it uses 19. So the problem 
> lies in Hive, I suppose.
>
> I read some of the Hive source code, like CombineHiveInputFormat, ExecDriver, 
> etc., but can't find a clue.
>
> What else settings can I use?
>
> Thanks in advance.
>
> Jerry


Help on loading data stream to hive table.

2014-01-02 Thread Chen Wang
Guys,
I am using Storm to read a data stream from our socket server, entry by
entry, and then write each entry to a file: one entry per file. At some point, I
need to import the data into my Hive table. There are several approaches I
can think of:
1. Directly write to the Hive HDFS location whenever I get an entry (from our
socket server). The problem is that this could be very inefficient, since
we have a huge amount of streaming data, and I would not want to write to the
Hive HDFS location one entry at a time.
Or
2. I can write the entries to files (normal files or HDFS files) on disk,
then have a separate job merge those small files into a big one, and
load them into the Hive table.
The problems with this are: a) how can I merge small files into big files for
Hive? b) what is the best file size to load into Hive?

I am seeking advice on both approaches, and appreciate your insight.
Thanks,
Chen
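For approach 2 above, the merge step can be as simple as concatenating the small entry files into one large file before loading it. This is a minimal sketch, assuming plain local files and a hypothetical naming pattern; aiming for merged files near the HDFS block size (64-128 MB on typical clusters of that era) is a common rule of thumb, not a hard requirement:

```python
import glob
import os

def merge_small_files(src_dir, dst_path, pattern="*.txt"):
    """Concatenate many small entry files into one large file,
    ready for a Hive LOAD DATA [LOCAL] INPATH statement."""
    with open(dst_path, "wb") as out:
        # sorted() keeps the merge order deterministic
        for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
            with open(path, "rb") as src:
                out.write(src.read())
    return dst_path
```

The merged file could then be loaded with something like `hive -e "LOAD DATA LOCAL INPATH '/tmp/merged.dat' INTO TABLE my_table"` (table name and path hypothetical).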


working with HIVE VARIABLE: Pls suggest

2014-01-02 Thread yogesh dhari
Hello Hive Champs,


I have a case statement where I need to check the date passed through a
parameter:

If the date is the 1st date of the month, then keep it as it is;
else
set the parameter date to the 1st date of the month.

Later operations are then performed on that date in Hive queries.


I have written this HiveQL:

*select case when as_of_dt = ${hivevar:as_of_dt} then ${hivevar:as_of_dt}
else date_sub(${hivevar:as_of_dt} , (day(${hivevar:as_of_dt} )) -1 ) end as
as_of_dt from TABLE group by as_of_dt ;*

The output of this query is, let's say, 2012-08-01.
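The date_sub expression in the query can be mirrored in plain code to check its logic; this Python sketch is only an illustration of the arithmetic, not part of Hive. Note that the else branch alone already handles the first-of-month case, since day(d) - 1 is 0 then:

```python
from datetime import date, timedelta

def snap_to_month_start(d):
    """Equivalent of date_sub(d, day(d) - 1): the 1st of d's month."""
    return d - timedelta(days=d.day - 1)

print(snap_to_month_start(date(2012, 8, 15)))  # 2012-08-01
print(snap_to_month_start(date(2012, 8, 1)))   # 2012-08-01 (already the 1st)
```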

I want to store the value of this query in a variable.

like

MY_VARIABLE = (*select case when as_of_dt = ${hivevar:as_of_dt} then
${hivevar:as_of_dt} else date_sub(${hivevar:as_of_dt} ,
(day(${hivevar:as_of_dt} )) -1 ) end as as_of_dt from TABLE group by
as_of_dt; )*




How can I achieve that?

Pls suggest,
Thanks in advance
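Hive itself has no assignment statement, so the usual workaround is to run the query with `hive -e` from a calling script and capture its stdout. A hedged sketch follows; the actual hive invocation is left commented out because it needs a working Hive client, and the parameter value is hypothetical:

```python
import subprocess  # used by the commented-out capture step below

AS_OF_DT = "2012-08-15"  # hypothetical value for ${hivevar:as_of_dt}

# Build the query from the thread, substituting the parameter inline.
query = (
    "select case when as_of_dt = '{d}' then '{d}' "
    "else date_sub('{d}', day('{d}') - 1) end as as_of_dt "
    "from TABLE group by as_of_dt;"
).format(d=AS_OF_DT)

# With a Hive client installed, -S (silent mode) keeps stdout to the result:
# my_variable = subprocess.check_output(
#     ["hive", "-S", "-e", query]).decode().strip()
```

The shell equivalent would be `MY_VARIABLE=$(hive -S -e "...")`.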


Setting value into hive varialble

2014-01-02 Thread yogesh dhari
Hello All,


I have a case statement where I need to check the date passed through a
parameter:

If the date is the 1st date of the month, then keep it as it is;
else
set the parameter date to the 1st date of the month.

Later operations are then performed on that date in Hive queries.


I have written this HiveQL:

   *select case when as_of_dt = ${hivevar:as_of_dt} then
${hivevar:as_of_dt} else date_sub(${hivevar:as_of_dt} ,
(day(${hivevar:as_of_dt} )) -1 ) end as as_of_dt from TABLE group by
as_of_dt ;*

I want to store the value of this query in a variable.

like MY_VARIABLE = output of this query;

How can I achieve that?

Pls suggest,
Thanks in advance

