Re: help on failed MR jobs (big hive files)

2012-12-12 Thread Mark Grover
Elaine,
Nitin raises some good points.

Continuing on the same lines, let's take a closer look at the query:
insert overwrite table B select a, b, c from table A where
datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
request_date) <= 30

In the above query,
"datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
request_date)" would cause this set of nested functions to be
evaluated for every record in your 6 GB dataset on the server. It
would be best if this computation was done in your client (bash script
or java code issuing hive queries) so that the query that gets sent to
the Hive server looks like: "request_date >= '2012-01-01' and
request_date < '2012-06-01'"

That would shave off a lot of time. If the performance is still poor,
consider partitioning your data (based on date?); also, make sure you
don't suffer from the small-files problem:
http://www.cloudera.com/blog/2009/02/the-small-files-problem/
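
For illustration, a minimal sketch of both ideas combined, assuming a
hypothetical date-partitioned layout (the table and column names below are
made up, and the date literals stand in for values your client script would
compute and substitute for '${logdate}' before submitting the query):

    -- hypothetical layout; partitioning on the filter column enables pruning
    CREATE TABLE a_partitioned (a STRING, b STRING, c STRING)
    PARTITIONED BY (request_date STRING);

    -- the client sends only precomputed literals, so Hive can prune
    -- partitions instead of evaluating nested date functions per record
    INSERT OVERWRITE TABLE B
    SELECT a, b, c
    FROM a_partitioned
    WHERE request_date >= '2012-05-02'
      AND request_date <= '2012-06-01';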

Good luck!
Mark


On Wed, Dec 12, 2012 at 11:36 PM, Nitin Pawar  wrote:
> 6GB is nothing. We have done it with a few TB of data in hive.
> The error you are seeing is on the hadoop side.
>
> You can always optimize your query based on the hadoop compute capacity you
> have got; also, based on the patterns in the data, you will need to design
> your schema.
>
> The problem here can be that you have got a function to execute in the where
> clause. Can you try hard-coding it to a date range and see if you get any
> improvement?
>
> Alternatively, if you can partition your data on a date basis, you will have
> a smaller dataset to read.
>
> If you have got a good-sized hadoop cluster, then lower the split size and
> launch more maps; that way it will get executed quickly.
>
> By the heapsize increase, did you mean the hive heapsize or the hadoop
> mapred heapsize? You will need to increase the heapsize on mapred by
> setting the properties:
> set mapred.job.map.memory.mb=6000;
> set mapred.job.reduce.memory.mb=4000;
>
>
>
> On Wed, Dec 12, 2012 at 3:13 PM, Elaine Gan  wrote:
>>
>> Hi,
>>
>> I'm trying to run a program on Hadoop.
>>
>> [Input] tsv file
>>
>> My program does the following.
>> (1) Load tsv into hive
>>   load data local inpath 'tsvfile' overwrite into table A partitioned
>> by xx
>> (2) insert overwrite table B select a, b, c from table A where
>> datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date)
>> <= 30
>> (3) Running Mahout
>>
>> In step 2, I am trying to retrieve data from hive for the past month.
>> My hadoop job always stops here.
>> When I check through my browser utility, it says:
>>
>> Diagnostic Info:
>> # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
>> LastFailedTask: task_201211291541_0262_m_001800
>>
>> Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802
>> seconds. Killing!
>> Error: Java heap space
>> Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800
>> seconds. Killing!
>> Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801
>> seconds. Killing!
>>
>>
>>
>> Each hive table is big, around 6 GB.
>>
>> (1) Is it too big to have around 6GB for each hive table?
>> (2) I've increased my HEAPSIZE to 50G, which I think is far more than
>> enough. Anywhere else I can do the tuning?
>>
>>
>> Thank you.
>>
>>
>>
>> rei
>>
>>
>
>
>
> --
> Nitin Pawar


Re: help on failed MR jobs (big hive files)

2012-12-12 Thread Nitin Pawar
6GB is nothing. We have done it with a few TB of data in hive.
The error you are seeing is on the hadoop side.

You can always optimize your query based on the hadoop compute capacity you
have got; also, based on the patterns in the data, you will need to design
your schema.

The problem here can be that you have got a function to execute in the where
clause. Can you try hard-coding it to a date range and see if you get any
improvement?

Alternatively, if you can partition your data on a date basis, you will have
a smaller dataset to read.

If you have got a good-sized hadoop cluster, then lower the split size and
launch more maps; that way it will get executed quickly.
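
As a hedged sketch of what lowering the split size can look like from the Hive
session (the 64 MB figure is only an example, and the exact effect of these
properties depends on the input format and Hadoop version, so verify them on
your cluster):

    -- ~64 MB per split, to get more map tasks (example value only)
    set mapred.min.split.size=1;
    set mapred.max.split.size=67108864;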

By the heapsize increase, did you mean the hive heapsize or the hadoop
mapred heapsize? You will need to increase the heapsize on mapred by
setting the properties:
set mapred.job.map.memory.mb=6000;
set mapred.job.reduce.memory.mb=4000;
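
A related knob that often matters for "Java heap space" errors is the
per-task JVM size; the value below is only an example to adjust for your
cluster (this property also comes up later in this thread):

    -- heap for each map/reduce task JVM (example value only)
    set mapred.child.java.opts=-Xmx2048m;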



On Wed, Dec 12, 2012 at 3:13 PM, Elaine Gan  wrote:

> Hi,
>
> I'm trying to run a program on Hadoop.
>
> [Input] tsv file
>
> My program does the following.
> (1) Load tsv into hive
>   load data local inpath 'tsvfile' overwrite into table A partitioned
> by xx
> (2) insert overwrite table B select a, b, c from table A where
> datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
> request_date) <= 30
> (3) Running Mahout
>
> In step 2, I am trying to retrieve data from hive for the past month.
> My hadoop job always stops here.
> When I check through my browser utility, it says:
>
> Diagnostic Info:
> # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
> LastFailedTask: task_201211291541_0262_m_001800
>
> Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802
> seconds. Killing!
> Error: Java heap space
> Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800
> seconds. Killing!
> Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801
> seconds. Killing!
>
>
>
> Each hive table is big, around 6 GB.
>
> (1) Is it too big to have around 6GB for each hive table?
> (2) I've increased my HEAPSIZE to 50G, which I think is far more than
> enough. Anywhere else I can do the tuning?
>
>
> Thank you.
>
>
>
> rei
>
>
>


-- 
Nitin Pawar


Re: REST API for Hive queries?

2012-12-12 Thread Nitin Pawar
Hive takes a longer time to respond to queries as the data gets larger.

The best way to handle this is to process the data in hive and store the
results in some rdbms like mysql.
On top of that you can then write your own API, or use a pentaho-like
interface where users can write queries or see predefined reports.

Alternatively, pentaho does have a hive connection as well. There are other
platforms such as talend, datameer etc. You can have a look at them.
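
For the hive side of that handoff, a minimal sketch (the output path, table,
and columns are hypothetical, and the import into mysql itself would be done
with a separate loader such as sqoop):

    -- write the aggregated report as delimited text for an RDBMS loader
    INSERT OVERWRITE DIRECTORY '/tmp/daily_report'
    SELECT dt, count(*) AS hits
    FROM page_views
    GROUP BY dt;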


On Thu, Dec 13, 2012 at 1:15 AM, Leena Gupta  wrote:

> Hi,
>
> We are using Hive as our data warehouse to run various queries on large
> amounts of data. There are some users who would like to get access to the
> output of these queries and display the data on an existing UI application.
> What is the best way to give them the output of these queries? Should we
> write REST APIs that the Front end can call to get the data? How can this
> be done?
>  I'd like to know what other people have done to meet this requirement?
> Any pointers would be very helpful.
> Thanks.
>



-- 
Nitin Pawar


Re: map side join with group by

2012-12-12 Thread Nitin Pawar
I think Chen wanted to know why this is a two-phase query, if I understood it
correctly.

When you run a map-side join, it just performs the join query; after that, to
execute the group-by part, it launches a second job.
I may be wrong, but this is how I have seen it whenever I executed group-by
queries.


On Thu, Dec 13, 2012 at 7:11 AM, Mark Grover wrote:

> Hi Chen,
> I think we would need some more information.
>
> The query is referring to a table called "d" in the MAPJOIN hint but
> there is no such table in the query. Moreover, map joins only make
> sense when the right table is the one being "mapped" (in other words,
> being kept in memory) in case of a Left Outer Join, similarly if the
> left table is the one being "mapped" in case of a Right Outer Join.
> Let me know if this is not clear, I'd be happy to offer a better
> explanation.
>
> In your query, the where clause is on a column called "hour"; at this
> point I am unsure if that's a column of table1 or table2. If it's a
> column on table1, that predicate would get pushed up (if you have
> hive.optimize.ppd property set to true), so it could possibly be done
> in 1 MR job (I am not sure if that's presently the case, you will have
> to check the explain plan). If however, the where clause is on a
> column in the right table (table2 in your example), it can't be pushed
> up since a column of the right table can have different values before
> and after the LEFT OUTER JOIN. Therefore, the where clause would need
> to be applied in a separate MR job.
>
> This is just my understanding; the foolproof answer would lie in
> checking out the explain plans and the Semantic Analyzer code.
>
> And for completeness, there is a conditional task (starting Hive 0.7)
> that will convert your joins automatically to map joins where
> applicable. This can be enabled by enabling hive.auto.convert.join
> property.
>
> Mark
>
> On Wed, Dec 12, 2012 at 3:32 PM, Chen Song  wrote:
> > I have a silly question on how Hive interprets a simple query with both
> > map side join and group by.
> >
> > Below query will translate into two jobs, with the 1st one as a map only
> job
> > doing the join and storing the output in a intermediary location, and the
> > 2nd one as a map-reduce job taking the output of the 1st job as input and
> > doing the group by.
> >
> > SELECT
> > /*+ MAPJOIN(d) */
> > table.a, sum(table2.b)
> > from table
> > LEFT OUTER JOIN table2
> > ON table.id = table2.id
> > where hour = '2012-12-11 11'
> > group by table.a
> >
> > Why can't this be done within a single map reduce job? From what I can see
> > in the query plan, all the 2nd job's mappers do is take the 1st job's
> > mapper output.
> >
> > --
> > Chen Song
> >
> >
>



-- 
Nitin Pawar


Re: Array index support non-constant expression

2012-12-12 Thread Navis류승우
Different error messages, but they seem to come from the same problem.

Could you try that with a later version of hive? I think these kinds of
bugs have been fixed.

2012/12/13 java8964 java8964 :
> ExprNodeGenericFuncEvaluator


Re: Hive Thrift upgrade to 0.9.0

2012-12-12 Thread Shreepadma Venugopalan
On Tue, Dec 11, 2012 at 12:07 PM, Shangzhong zhu wrote:

> We are using Hive 0.9.0, and we have seen frequent Thrift Metastore
> timeout issues probably due to the Thrift memory leak reported in
> THRIFT-1468.
>
> The current solution is to upgrade Thrift to 0.9.0
>
> I am trying to use the patch (HIVE-2715). But it seems the patch only works
> for Hive trunk (0.10.0). I saw a lot of missing files when I applied the
> patch to 0.9.0.
>

You probably saw newly generated thrift files for features that were added to
trunk after 0.9.0.


>
> Do we have a patch available for Hive 0.9.0? Or what is the recommended
> approach to upgrade to Thrift 0.9.0?
>
Currently, we don't have a patch for 0.9.0. The best way I can think of is
to regenerate the thrift files for 0.9.0 using the thrift 0.9 compiler.


>
> Thanks,
> Shanzhong
>

Thanks.
Shreepadma


RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964

Hi, Navis:
If I disable both CP/PPD, it is worse, as neither query 1) nor 2) works.
But the interesting thing is that for both queries I got the same error
message, and it is a different one compared with my original error message:
2012-12-12 20:36:21,362 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
    ... 22 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
    at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:60)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
    at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.initializeOp(LateralViewJoinOperator.java:109)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.initializeOp(UDTFOperator.java:85)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
    at org.apache.hadoop.hive.ql.exec.Operator

Re: map side join with group by

2012-12-12 Thread Mark Grover
Hi Chen,
I think we would need some more information.

The query is referring to a table called "d" in the MAPJOIN hint but
there is no such table in the query. Moreover, map joins only make
sense when the right table is the one being "mapped" (in other words,
being kept in memory) in case of a Left Outer Join, similarly if the
left table is the one being "mapped" in case of a Right Outer Join.
Let me know if this is not clear, I'd be happy to offer a better
explanation.

In your query, the where clause is on a column called "hour"; at this
point I am unsure if that's a column of table1 or table2. If it's a
column on table1, that predicate would get pushed up (if you have
hive.optimize.ppd property set to true), so it could possibly be done
in 1 MR job (I am not sure if that's presently the case, you will have
to check the explain plan). If however, the where clause is on a
column in the right table (table2 in your example), it can't be pushed
up since a column of the right table can have different values before
and after the LEFT OUTER JOIN. Therefore, the where clause would need
to be applied in a separate MR job.

This is just my understanding; the foolproof answer would lie in
checking out the explain plans and the Semantic Analyzer code.

And for completeness, there is a conditional task (starting Hive 0.7)
that will convert your joins automatically to map joins where
applicable. This can be done by enabling the hive.auto.convert.join
property.
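
To see which plan you actually get, a quick sketch using Chen's query from
below (the two settings are the ones discussed above; the explain output will
show whether one or two MR stages come out):

    set hive.optimize.ppd=true;
    set hive.auto.convert.join=true;
    EXPLAIN
    SELECT table.a, sum(table2.b)
    FROM table LEFT OUTER JOIN table2 ON table.id = table2.id
    WHERE hour = '2012-12-11 11'
    GROUP BY table.a;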

Mark

On Wed, Dec 12, 2012 at 3:32 PM, Chen Song  wrote:
> I have a silly question on how Hive interprets a simple query with both map
> side join and group by.
>
> Below query will translate into two jobs, with the 1st one as a map only job
> doing the join and storing the output in a intermediary location, and the
> 2nd one as a map-reduce job taking the output of the 1st job as input and
> doing the group by.
>
> SELECT
> /*+ MAPJOIN(d) */
> table.a, sum(table2.b)
> from table
> LEFT OUTER JOIN table2
> ON table.id = table2.id
> where hour = '2012-12-11 11'
> group by table.a
>
> Why can't this be done within a single map reduce job? From what I can see
> in the query plan, all the 2nd job's mappers do is take the 1st job's
> mapper output.
>
> --
> Chen Song
>
>


Re: Array index support non-constant expression

2012-12-12 Thread Navis류승우
Could you try it with CP/PPD disabled?

set hive.optimize.cp=false;
set hive.optimize.ppd=false;

2012/12/13 java8964 java8964 :
> Hi,
>
> I played with my query further, and found it very puzzling to explain the
> following behaviors:
>
> 1) The following query works:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a
>
> I get all the results from the above query without any problem.
>
> 2) The following query does NOT work:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a where c_poi.provider_str = 'POI'
>
> As long as I add the where criteria on provider_str, or even add another
> level of sub-query like the following:
>
> select
> ps, name
> from
> (select c_poi.provider_str as ps, c_poi.name as name from (select
> darray(search_results, c.rank) as c_poi from nulf_search lateral view
> explode(search_clicks) clickTable as c) a ) b
> where ps = 'POI'
>
> any kind of criteria I tried to add on provider_str, the hive MR jobs failed
> with the same error shown below.
>
> Any idea why this happened? Is it related to the data? But provider_str is
> just a simple String type.
>
> Thanks
>
> Yong
>
> 
> From: java8...@hotmail.com
> To: user@hive.apache.org
> Subject: RE: Array index support non-constant expression
> Date: Wed, 12 Dec 2012 12:15:27 -0500
>
>
> OK.
>
> I followed the hive source code of
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the
> UDF. It is quite simple.
>
> It works fine as I expected for the simple case, but when I try to run it
> under some complex query, the hive MR jobs fail with some strange errors.
> What I mean is that it failed in the HIVE code base; from the stack trace,
> I cannot see that this failure has anything to do with my custom code.
>
> I would like some help if someone can tell me what went wrong.
>
> For example, I created this UDF called darray, standing for dynamic array,
> which supports a non-constant value as the index location of the array.
>
> The following query works fine as I expected:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.index_loc) as c_poi from search_table
> lateral view explode(search_clicks) clickTable as c) a limit 5;
> POI 
> ADDRESS   some address
> POI
> POI
> ADDRESSS some address
>
> Of course, in this case, I only want the provider_str = 'POI' returned, and
> filter out any rows with provider_str != 'POI', so it sounds simple, I
> changed the query to the following:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.rank) as c_poi from search_table lateral
> view explode(search_clicks) clickTable as c) a where c_poi.provider_str =
> 'POI' limit 5;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Cannot run job locally: Input Size (= 178314025) is larger than
> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
> Starting Job = job_201212031001_0100, Tracking URL =
> http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
> Kill Command = /home/yzhang/hadoop/bin/hadoop job
> -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
> 2012-12-12 11:45:24,090 Stage-1 map = 0%,  reduce = 0%
> 2012-12-12 11:45:43,173 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201212031001_0100 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> I am only adding a WHERE limitation, but to my surprise, the MR jobs
> generated by HIVE failed. I am testing this in my local standalone cluster,
> which is running the CDH3U3 release. When I check the hadoop userlog, here
> is what I got:
>
> 2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
> SELECT
> struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct,categories_id:array,categories_name:array,lang_raw:string,lang_rose:string,lang:string,viewport:struct>,_col15:struct>>,_col16:array>,_col17:array>,_col18:string,_col19:struct>
> 2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running
> child
> java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.ha

map side join with group by

2012-12-12 Thread Chen Song
I have a silly question on how Hive interprets a simple query with both
map side join and group by.

Below query will translate into two jobs, with the 1st one as a map only
job doing the join and storing the output in a intermediary location, and
the 2nd one as a map-reduce job taking the output of the 1st job as input
and doing the group by.

SELECT
/*+ MAPJOIN(d) */
table.a, sum(table2.b)
from table
LEFT OUTER JOIN table2
ON table.id = table2.id
where hour = '2012-12-11 11'
group by table.a

Why can't this be done within a single map reduce job? From what I can see
in the query plan, all the 2nd job's mappers do is take the 1st job's
mapper output.

-- 
Chen Song


Re: Map side join

2012-12-12 Thread Souvik Banerjee
Hi Bejoy,

Yes I ran the pi example. It was fine.
Regarding the HIVE job, what I found is that it took 4 hrs for the first map
job to complete.
Those map tasks were doing their job and only reported status after
completion. It is indeed taking too long to finish. I could find nothing
relevant in the logs.

Thanks and regards,
Souvik.

On Wed, Dec 12, 2012 at 8:04 AM,  wrote:

> Hi Souvik
>
> Apart from hive jobs, are normal mapreduce jobs like the wordcount
> running fine on your cluster?
>
> If they are working, for the hive jobs are you seeing anything suspicious in
> the task, tasktracker or jobtracker logs?
>
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Souvik Banerjee 
> *Date: *Tue, 11 Dec 2012 17:12:20 -0600
> *To: *; 
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: Map side join
>
> Hello Everybody,
>
> Need some help on a HIVE join. As we were talking about the map side join,
> I tried that.
> I set the flag set hive.auto.convert.join=true;
>
> I saw Hive converts the same to map join while launching the job. But the
> problem is that none of the map jobs progresses in my case. I made the
> dataset smaller. Now it's only 512 MB joined with 25 MB. I was expecting it
> to be done very quickly.
> No luck with any change of settings.
> Failing to progress with the default settings, I changed these settings:
> set hive.mapred.local.mem=1024; // Initially it was 216 I guess
> set hive.join.cache.size=10; // Initially it was 25000
>
> Also on Hadoop side I made this changes
>
> mapred.child.java.opts -Xmx1073741824
>
> But I don't see any progress. After more than 40 minutes of run I am at 0%
> map completion state.
> Can you please throw some light on this?
>
> Thanks a lot once again.
>
> Regards,
> Souvik.
>
>
>
> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee 
> wrote:
>
>> Hi Bejoy,
>>
>> That's wonderful. Thanks for your reply.
>> What I was wondering is if HIVE can do a map side join with more than one
>> condition on the JOIN clause.
>> I'll simply try it out and post the result.
>>
>> Thanks once again.
>>
>> Regards,
>> Souvik.
>>
>>  On Fri, Dec 7, 2012 at 2:10 PM,  wrote:
>>
>>> Hi Souvik
>>>
>>> In earlier versions of hive you had to give the map join hint. But in
>>> later versions just set hive.auto.convert.join = true;
>>> Hive automatically selects the smaller table. It is better to give the
>>> smaller table as the first one in join.
>>>
>>> You can use a map join if you are joining a small table with a large
>>> one, in terms of data size. By small, better to have the smaller table size
>>> in range of MBs.
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> --
>>> *From: *Souvik Banerjee 
>>> *Date: *Fri, 7 Dec 2012 13:58:25 -0600
>>> *To: *
>>> *ReplyTo: *user@hive.apache.org
>>> *Subject: *Map side join
>>>
>>> Hello everybody,
>>>
>>> I have got a question. I didn't come across any post which says
>>> something about this.
>>> I have got two tables. Lets say A and B.
>>> I want to join A & B in HIVE. I am currently using HIVE 0.9 version.
>>> The join would be on few columns. like on (A.id1 = B.id1) AND (A.id2 =
>>> B.id2) AND (A.id3 = B.id3)
>>>
>>> Can I ask HIVE to use map side join in this scenario? Should I give a
>>> hint to HIVE by saying /*+mapjoin(B)*/
>>>
>>> Get back to me if you want any more information in this regard.
>>>
>>> Thanks and regards,
>>> Souvik.
>>>
>>
>>
>


REST API for Hive queries?

2012-12-12 Thread Leena Gupta
Hi,

We are using Hive as our data warehouse to run various queries on large
amounts of data. There are some users who would like to get access to the
output of these queries and display the data on an existing UI application.
What is the best way to give them the output of these queries? Should we
write REST APIs that the Front end can call to get the data? How can this
be done?
 I'd like to know what other people have done to meet this requirement?
Any pointers would be very helpful.
Thanks.


RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964

Hi,
I played with my query further, and found it very puzzling to explain the
following behaviors:
1) The following query works:
select c_poi.provider_str, c_poi.name from (select darray(search_results, 
c.rank) as c_poi from nulf_search lateral view explode(search_clicks) 
clickTable as c) a
I get all the results from the above query without any problem.
2) The following query does NOT work:
select c_poi.provider_str, c_poi.name from (select darray(search_results, 
c.rank) as c_poi from nulf_search lateral view explode(search_clicks) 
clickTable as c) a where c_poi.provider_str = 'POI'
As long as I add the where criteria on provider_str, or even add another
level of sub-query like the following:
select
  ps, name
from
  (select c_poi.provider_str as ps, c_poi.name as name from
    (select darray(search_results, c.rank) as c_poi from nulf_search lateral view
     explode(search_clicks) clickTable as c) a) b
where ps = 'POI'
any kind of criteria I tried to add on provider_str, the hive MR jobs failed
with the same error shown below.
Any idea why this happened? Is it related to the data? But provider_str is
just a simple String type.
Thanks
Yong
From: java8...@hotmail.com
To: user@hive.apache.org
Subject: RE: Array index support non-constant expression
Date: Wed, 12 Dec 2012 12:15:27 -0500





OK. 
I followed the hive source code of 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the 
UDF. It is quite simple. 
It works fine as I expected for the simple case, but when I try to run it
under some complex query, the hive MR jobs fail with some strange errors. What
I mean is that it failed in the HIVE code base; from the stack trace, I cannot
see that this failure has anything to do with my custom code.
I would like some help if someone can tell me what went wrong.
For example, I created this UDF called darray, standing for dynamic array,
which supports a non-constant value as the index location of the array.
The following query works fine as I expected:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from
(select darray(search_results, c.index_loc) as c_poi from search_table lateral
view explode(search_clicks) clickTable as c) a limit 5;
POI
ADDRESS   some address
POI
POI
ADDRESSS some address
Of course, in this case, I only want the provider_str = 'POI' returned, and 
filter out any rows with provider_str != 'POI', so it sounds simple, I changed 
the query to the following:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from
(select darray(search_results, c.rank) as c_poi from search_table lateral view
explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI'
limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 178314025) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_201212031001_0100, Tracking URL =
http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
Kill Command = /home/yzhang/hadoop/bin/hadoop job
-Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
2012-12-12 11:45:24,090 Stage-1 map = 0%,  reduce = 0%
2012-12-12 11:45:43,173 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201212031001_0100 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I am only adding a WHERE limitation, but to my surprise, the MR jobs generated
by HIVE failed. I am testing this in my local standalone cluster, which is
running the CDH3U3 release. When I check the hadoop userlog, here is what I got:
2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct,categories_id:array,categories_name:array,lang_raw:string,lang_rose:string,lang:string,viewport:struct>,_col15:struct>>,_col16:array>,_col17:array>,_col18:string,_col19:struct>
2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroup

RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964

OK. 
I followed the hive source code of 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the 
UDF. It is quite simple. 
It works fine as I expected for the simple case, but when I try to run it
under some complex query, the hive MR jobs fail with some strange errors. What
I mean is that it failed in the HIVE code base; from the stack trace, I cannot
see that this failure has anything to do with my custom code.
I would like some help if someone can tell me what went wrong.
For example, I created this UDF called darray, standing for dynamic array,
which supports a non-constant value as the index location of the array.
The following query works fine as I expected:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from
(select darray(search_results, c.index_loc) as c_poi from search_table lateral
view explode(search_clicks) clickTable as c) a limit 5;
POI
ADDRESS   some address
POI
POI
ADDRESSS some address
Of course, in this case, I only want the provider_str = 'POI' returned, and 
filter out any rows with provider_str != 'POI', so it sounds simple, I changed 
the query to the following:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from
(select darray(search_results, c.rank) as c_poi from search_table lateral view
explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI'
limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 178314025) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_201212031001_0100, Tracking URL =
http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
Kill Command = /home/yzhang/hadoop/bin/hadoop job
-Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
2012-12-12 11:45:24,090 Stage-1 map = 0%,  reduce = 0%
2012-12-12 11:45:43,173 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201212031001_0100 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I am only adding a WHERE limitation, but to my surprise, the MR jobs generated
by HIVE failed. I am testing this in my local standalone cluster, which is
running the CDH3U3 release. When I check the hadoop userlog, here is what I got:
2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct,categories_id:array,categories_name:array,lang_raw:string,lang_rose:string,lang:string,viewport:struct>,_col15:struct>>,_col16:array>,_col17:array>,_col18:string,_col19:struct>
2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Do you have a page in which you explain the steps?



2012/12/12 Mohammad Tariq 

> Hi Imen,
>
>  I am sorry, I didn't get the question. Are you asking about
> creating a distributed cluster? Yeah, I have done that.
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:45 PM, imen Megdiche wrote:
>
>> have you please commented on the configuration of hadoop on a cluster?
>>
>> thanks
>>
>>
>> 2012/12/12 Mohammad Tariq 
>>
>>> You are always welcome. If you still need any help, you can go here :
>>> http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html
>>> I have outlined the entire process here along with few small(but
>>> necessary) explanations.
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>>
>>> On Wed, Dec 12, 2012 at 7:31 PM, imen Megdiche 
>>> wrote:
>>>
 thank you very much, you're awesome.

 Fixed


 2012/12/12 Mohammad Tariq 

> Uncomment the property in core-site.xml. That is a must. After doing
> this  you have to restart the daemons?
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche <
> imen.megdi...@gmail.com> wrote:
>
>> I changed the files
>> now when i run i have this response :
>>
>> 12/12/12 14:37:33 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 0 time(s).
>> 12/12/12 14:37:34 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 1 time(s).
>> 12/12/12 14:37:35 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 2 time(s).
>> 12/12/12 14:37:36 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 3 time(s).
>> 12/12/12 14:37:37 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 4 time(s).
>> 12/12/12 14:37:38 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 5 time(s).
>> 12/12/12 14:37:39 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 6 time(s).
>> 12/12/12 14:37:40 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 7 time(s).
>> 12/12/12 14:37:41 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 8 time(s).
>> 12/12/12 14:37:42 INFO ipc.Client: Retrying connect to server:
>> localhost/127.0.0.1:9001. Already tried 9 time(s).
>> Exception in thread "main" java.net.ConnectException: Call to
>> localhost/127.0.0.1:9001 failed on connection exception:
>> java.net.ConnectException: Connection refused
>> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1075)
>> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>> at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown
>> Source)
>> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>> at
>> org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
>> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
>> at org.apache.hadoop.mapred.JobClient.(JobClient.java:457)
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1260)
>> at org.myorg.WordCount.run(WordCount.java:115)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.myorg.WordCount.main(WordCount.java:120)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>> at java.lang.reflect.Method.invoke(Unknown Source)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> Caused by: java.net.ConnectException: Connection refused
>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>> at
>> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>> at
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>> at
>> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>> ... 16 more
>>
>>
>> 2012/12/12 Mohammad Tariq 
>>
>>> dfs.name.dir
>>
>>
>>
>

>>>
>>
>


Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen,

 I am sorry, I didn't get the question. Are you asking about
creating a distributed cluster? Yeah, I have done that.

Regards,
Mohammad Tariq



On Wed, Dec 12, 2012 at 7:45 PM, imen Megdiche wrote:

> have you please commented on the configuration of hadoop on a cluster?
>
> thanks
>
>
> 2012/12/12 Mohammad Tariq 
>
>> You are always welcome. If you still need any help, you can go here :
>> http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html
>> I have outlined the entire process here along with few small(but
>> necessary) explanations.
>>
>> Regards,
>> Mohammad Tariq
>>
>>
>>
>> On Wed, Dec 12, 2012 at 7:31 PM, imen Megdiche 
>> wrote:
>>
>>> thank you very much, you're awesome.
>>>
>>> Fixed
>>>
>>>
>>> 2012/12/12 Mohammad Tariq 
>>>
 Uncomment the property in core-site.xml. That is a must. After doing
 this  you have to restart the daemons?

 Regards,
 Mohammad Tariq



 On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche >>> > wrote:

> I changed the files
> now when i run i have this response :
>
> 12/12/12 14:37:33 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 0 time(s).
> 12/12/12 14:37:34 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 1 time(s).
> 12/12/12 14:37:35 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 2 time(s).
> 12/12/12 14:37:36 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 3 time(s).
> 12/12/12 14:37:37 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 4 time(s).
> 12/12/12 14:37:38 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 5 time(s).
> 12/12/12 14:37:39 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 6 time(s).
> 12/12/12 14:37:40 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 7 time(s).
> 12/12/12 14:37:41 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 8 time(s).
> 12/12/12 14:37:42 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:9001. Already tried 9 time(s).
> Exception in thread "main" java.net.ConnectException: Call to
> localhost/127.0.0.1:9001 failed on connection exception:
> java.net.ConnectException: Connection refused
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
> at org.apache.hadoop.ipc.Client.call(Client.java:1075)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown
> Source)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
> at
> org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
> at org.apache.hadoop.mapred.JobClient.(JobClient.java:457)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1260)
> at org.myorg.WordCount.run(WordCount.java:115)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.myorg.WordCount.main(WordCount.java:120)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
> at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
> at org.apache.hadoop.ipc.Client.call(Client.java:1050)
> ... 16 more
>
>
> 2012/12/12 Mohammad Tariq 
>
>> dfs.name.dir
>
>
>

>>>
>>
>


Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
have you please commented on the configuration of hadoop on a cluster?

thanks


2012/12/12 Mohammad Tariq 

> You are always welcome. If you still need any help, you can go here :
> http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html
> I have outlined the entire process here along with few small(but
> necessary) explanations.
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:31 PM, imen Megdiche wrote:
>
>> thank you very much, you're awesome.
>>
>> Fixed
>>
>>
>> 2012/12/12 Mohammad Tariq 
>>
>>> Uncomment the property in core-site.xml. That is a must. After doing
>>> this  you have to restart the daemons?
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>>
>>> On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche 
>>> wrote:
>>>
 I changed the files
 now when i run i have this response :

 12/12/12 14:37:33 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 0 time(s).
 12/12/12 14:37:34 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 1 time(s).
 12/12/12 14:37:35 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 2 time(s).
 12/12/12 14:37:36 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 3 time(s).
 12/12/12 14:37:37 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 4 time(s).
 12/12/12 14:37:38 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 5 time(s).
 12/12/12 14:37:39 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 6 time(s).
 12/12/12 14:37:40 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 7 time(s).
 12/12/12 14:37:41 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 8 time(s).
 12/12/12 14:37:42 INFO ipc.Client: Retrying connect to server:
 localhost/127.0.0.1:9001. Already tried 9 time(s).
 Exception in thread "main" java.net.ConnectException: Call to localhost/
 127.0.0.1:9001 failed on connection exception:
java.net.ConnectException: Connection refused
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
 at org.apache.hadoop.ipc.Client.call(Client.java:1075)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown
 Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
 at
 org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
 at org.apache.hadoop.mapred.JobClient.(JobClient.java:457)
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1260)
 at org.myorg.WordCount.run(WordCount.java:115)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.myorg.WordCount.main(WordCount.java:120)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
 at
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
 at
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
 at
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
 at
 org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
 at org.apache.hadoop.ipc.Client.call(Client.java:1050)
 ... 16 more


 2012/12/12 Mohammad Tariq 

> dfs.name.dir



>>>
>>
>


Re: Map side join

2012-12-12 Thread bejoy_ks
Hi Souvik

Apart from hive jobs, are normal mapreduce jobs like the wordcount running
fine on your cluster?

If they are working, for the hive jobs are you seeing anything suspicious in
the task, tasktracker or jobtracker logs?


Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Souvik Banerjee 
Date: Tue, 11 Dec 2012 17:12:20 
To: ; 
Reply-To: user@hive.apache.org
Subject: Re: Map side join

Hello Everybody,

Need some help on a HIVE join. As we were talking about the map side join, I
tried that.
I set the flag set hive.auto.convert.join=true;

I saw Hive converts the same to map join while launching the job. But the
problem is that none of the map jobs progresses in my case. I made the
dataset smaller. Now it's only 512 MB joined with 25 MB. I was expecting it to
be done very quickly.
No luck with any change of settings.
Failing to progress with the default settings, I changed these settings:
set hive.mapred.local.mem=1024; // Initially it was 216 I guess
set hive.join.cache.size=10; // Initially it was 25000

Also on Hadoop side I made this changes

mapred.child.java.opts -Xmx1073741824

But I don't see any progress. After more than 40 minutes of run I am at 0%
map completion state.
Can you please throw some light on this?

Thanks a lot once again.

Regards,
Souvik.
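
As a reference point for the multi-column question quoted below, a minimal
sketch of both ways of requesting a map join (tables A and B and the join
columns id1, id2, id3 are from the original question; the selected columns
are only illustrative):

    set hive.auto.convert.join=true;
    SELECT /*+ MAPJOIN(B) */ A.id1, A.id2, A.id3
    FROM A JOIN B
      ON (A.id1 = B.id1 AND A.id2 = B.id2 AND A.id3 = B.id3);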



On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee wrote:

> Hi Bejoy,
>
> That's wonderful. Thanks for your reply.
> What I was wondering is if HIVE can do a map side join with more than one
> condition on the JOIN clause.
> I'll simply try it out and post the result.
>
> Thanks once again.
>
> Regards,
> Souvik.
>
>  On Fri, Dec 7, 2012 at 2:10 PM,  wrote:
>
>> Hi Souvik
>>
>> In earlier versions of hive you had to give the map join hint. But in
>> later versions just set hive.auto.convert.join = true;
>> Hive automatically selects the smaller table. It is better to give the
>> smaller table as the first one in join.
>>
>> You can use a map join if you are joining a small table with a large one,
>> in terms of data size. By small, better to have the smaller table size in
>> range of MBs.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> --
>> *From: *Souvik Banerjee 
>> *Date: *Fri, 7 Dec 2012 13:58:25 -0600
>> *To: *
>> *ReplyTo: *user@hive.apache.org
>> *Subject: *Map side join
>>
>> Hello everybody,
>>
>> I have got a question. I didn't come across any post which says something
>> about this.
>> I have got two tables. Lets say A and B.
>> I want to join A & B in HIVE. I am currently using HIVE 0.9 version.
>> The join would be on few columns. like on (A.id1 = B.id1) AND (A.id2 =
>> B.id2) AND (A.id3 = B.id3)
>>
>> Can I ask HIVE to use map side join in this scenario? Should I give a
>> hint to HIVE by saying /*+mapjoin(B)*/
>>
>> Get back to me if you want any more information in this regard.
>>
>> Thanks and regards,
>> Souvik.
>>
>
>



Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
thank you very much, you're awesome.

Fixed


2012/12/12 Mohammad Tariq 

> Uncomment the property in core-site.xml. That is a must. After doing this
>  you have to restart the daemons?
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche wrote:
>
>> I changed the files
>> now when i run i have this response :
>>
>> 12/12/12 14:37:33 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 0 time(s).
>> 12/12/12 14:37:34 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 1 time(s).
>> 12/12/12 14:37:35 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 2 time(s).
>> 12/12/12 14:37:36 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 3 time(s).
>> 12/12/12 14:37:37 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 4 time(s).
>> 12/12/12 14:37:38 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 5 time(s).
>> 12/12/12 14:37:39 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 6 time(s).
>> 12/12/12 14:37:40 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 7 time(s).
>> 12/12/12 14:37:41 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 8 time(s).
>> 12/12/12 14:37:42 INFO ipc.Client: Retrying connect to server: localhost/
>> 127.0.0.1:9001. Already tried 9 time(s).
>> Exception in thread "main" java.net.ConnectException: Call to localhost/
>> 127.0.0.1:9001 failed on connection exception:
>> java.net.ConnectException: Connection refused
>> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1075)
>> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>> at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
>> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>> at
>> org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
>> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
>> at org.apache.hadoop.mapred.JobClient.(JobClient.java:457)
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1260)
>> at org.myorg.WordCount.run(WordCount.java:115)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.myorg.WordCount.main(WordCount.java:120)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>> at java.lang.reflect.Method.invoke(Unknown Source)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> Caused by: java.net.ConnectException: Connection refused
>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>> at
>> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>> at
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>> at
>> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>> ... 16 more
>>
>>
>> 2012/12/12 Mohammad Tariq 
>>
>>> dfs.name.dir
>>
>>
>>
>
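The repeated "Retrying connect to server: localhost/127.0.0.1:9001" lines
and the final "Connection refused" above simply mean that nothing is
listening on the JobTracker port. A minimal standalone probe (a
hypothetical check, assuming the default JobTracker port 9001 from the
logs) that reproduces what the Hadoop IPC client is seeing:

import java.net.ConnectException;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("localhost", 9001)) {
            System.out.println("JobTracker port 9001 is open.");
        } catch (ConnectException e) {
            // The same failure as in the stack trace above: the JobTracker
            // daemon is not running, or is bound to a different address.
            System.out.println("Connection refused on 9001 - start the JobTracker.");
        }
    }
}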


Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
I wonder how you are able to run the job without a JT. You must have this
in your mapred-site.xml file:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

Also add "hadoop.tmp.dir" in core-site.xml, and "dfs.name.dir" &
"dfs.data.dir" in hdfs-site.xml.

Regards,
Mohammad Tariq



Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
For mapred-site.xml:

<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>6</value>
  </property>
</configuration>

For core-site.xml:

<configuration>
</configuration>

On hdfs-site.xml, nothing.

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Can I have a look at your config files?

Regards,
Mohammad Tariq




Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
I ran start-all.sh and all daemons start without problems. But the
log of the TaskTracker looks like this:


2012-12-12 13:53:45,495 INFO org.apache.hadoop.mapred.TaskTracker:
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = megdiche-OptiPlex-GX280/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
2012-12-12 13:53:47,009 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2012-12-12 13:53:47,331 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
MetricsSystem,sub=Stats registered.
2012-12-12 13:53:47,336 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2012-12-12 13:53:47,336 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics
system started
2012-12-12 13:53:48,165 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
registered.
2012-12-12 13:53:48,192 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
exists!
2012-12-12 13:53:48,513 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
start task tracker because java.lang.IllegalArgumentException: Does not
contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1426)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)

2012-12-12 13:53:48,519 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at megdiche-OptiPlex-GX280/
127.0.1.1
************************************************************/
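The fatal line is "Does not contain a valid host:port authority: local":
the TaskTracker asked for the JobTracker address and got the default value
"local", which carries no port. A minimal sketch (hypothetical, not from
the thread) of the parse that fails inside JobTracker.getAddress():

import org.apache.hadoop.net.NetUtils;

public class AddrCheck {
    public static void main(String[] args) {
        // Succeeds: a proper host:port authority.
        System.out.println(NetUtils.createSocketAddr("localhost:9001"));
        // Throws java.lang.IllegalArgumentException, exactly as in the
        // log above, because "local" has no port component.
        System.out.println(NetUtils.createSocketAddr("local"));
    }
}

Setting mapred.job.tracker to localhost:9001 in mapred-site.xml, as
suggested earlier in this thread, replaces the "local" default and lets
the TaskTracker start.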





Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
I would check if all the daemons are running properly or not, before
anything else. If some problem is found, the next place to look is the log
of each daemon.

The correct command to check the status of a job from the command line is:
hadoop job -status jobID
(Mind the 'space' after "job", and drop the word 'command' from the
statement.)

HTH

Regards,
Mohammad Tariq





Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
My goal is to analyze the response time of MapReduce depending on the size
of the input files. I need to change the number of map and/or reduce
tasks and record the execution time. But it turns out that nothing
works locally on my PC:
neither "hadoop job -status job_local_0001" (which returns no job
found)
nor localhost:50030.
I will be very grateful if you can help me better understand these
problems.
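One way to record the execution time without the JobTracker web UI or
"hadoop job -status" is to time the blocking runJob() call in the driver
itself. A minimal sketch (only the timing lines are new; the rest assumes
the old-API WordCount driver quoted later in this thread):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class TimedRun {
    // Pass in the fully configured JobConf from the WordCount driver.
    public static void run(JobConf conf) throws Exception {
        long start = System.currentTimeMillis();
        RunningJob job = JobClient.runJob(conf);   // blocks until done
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println("Job " + job.getID()
                + " finished in " + elapsedMs + " ms");
    }
}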




Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Are you working locally? What exactly is the issue?

Regards,
Mohammad Tariq





Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
no




Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Any luck with "localhost:50030"??

Regards,
Mohammad Tariq





Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
I run the job through the command line.




Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the
actual name of the machine where JobTracker is running. For example, If you
are working on a local cluster, you have to use "localhost:50030".

Are you running your job through the command line or some IDE?

Regards,
Mohammad Tariq





Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Excuse me, the data size is 98 MB.




Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
The size of the data is 49 MB and the number of maps is 4.
The web UI JobTrackerHost:50030 does not work; what should I do to make it
appear? I work on Ubuntu.




Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen,

 You can visit the MR web UI at "JobTrackerHost:50030" and see all the
useful information like no. of mappers, no. of reducers, time taken for the
execution, etc.

One quick question for you: what is the size of your data, and what is the
no. of maps which you are getting right now?

Regards,
Mohammad Tariq





Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Thank you Mohammad, but the number of map tasks is still the same in the
execution. Do you know how to capture the time spent on execution?




Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen,

You can add "mapred.map.tasks" property in your mapred-site.xml file.

But, it is just a hint for the InputFormat. Actually no. of maps is
actually determined by the no of InputSplits created by the InputFormat.

HTH

Regards,
Mohammad Tariq
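To make the hint concrete: in the old 1.x API used in the driver quoted
below, FileInputFormat turns the map-count hint into a goal size
(goalSize = totalSize / numSplits) and then computes
splitSize = max(minSize, min(goalSize, blockSize)), so the block size can
still cap the number of maps. A sketch of the relevant calls (note the
method name is setNumMapTasks, with a trailing "s"; setNumMapTask(6) as
quoted in the driver below would not even compile):

import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
    public static JobConf configure() {
        JobConf conf = new JobConf(MapCountHint.class);
        conf.setNumMapTasks(6);      // a hint: feeds the goal split size
        conf.setNumReduceTasks(2);   // reduce count, by contrast, is exact
        // Lowering the minimum split size gives the goal size room to win:
        conf.setLong("mapred.min.split.size", 1);
        return conf;
    }
}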



On Wed, Dec 12, 2012 at 4:11 PM, imen Megdiche wrote:

> Hi,
>
> I try to force the number of maps for the mapreduce job with the command:
>   public static void main(String[] args) throws Exception {
>
>     JobConf conf = new JobConf(WordCount.class);
>     conf.set("mapred.job.tracker", "local");
>     conf.set("fs.default.name", "local");
>     conf.setJobName("wordcount");
>
>     conf.setOutputKeyClass(Text.class);
>     conf.setOutputValueClass(IntWritable.class);
>
>     conf.setNumMapTask(6);
>     conf.setMapperClass(Map.class);
>     conf.setCombinerClass(Reduce.class);
>     conf.setReducerClass(Reduce.class);
>     ...
>   }
>
> But it doesn't work.
> What can I do to modify the number of map and reduce tasks?
>
> Thank you
>


Re: Re: Number of mapreduce job and the time spent

2012-12-12 Thread imen Megdiche
The command hadoop job -status works fine, but the problem is that it
cannot find the job:
Could not find job job_local_0001
I don't understand why it does not find it.
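The job_local_0001 prefix is the clue: with mapred.job.tracker left at
"local", the job runs inside the client JVM's LocalJobRunner and is never
registered with any JobTracker, while "hadoop job -status" can only ask a
JobTracker. For local runs, the driver can query the handle it already
holds; a minimal sketch (hypothetical, based on the old-API WordCount
driver from the other thread):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class LocalStatus {
    // Pass in the fully configured JobConf of the local job.
    public static void run(JobConf conf) throws Exception {
        RunningJob job = JobClient.runJob(conf);   // blocks until done
        System.out.println("Job " + job.getID()
                + " complete: " + job.isComplete()
                + ", successful: " + job.isSuccessful());
    }
}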




Re:Re: Number of mapreduce job and the time spent

2012-12-12 Thread long
Sorry for my mistake.
If $HADOOP_HOME is set, run it as follows; if not, just use the path to
your 'hadoop' command instead:
$HADOOP_HOME/bin/hadoop job -status job_xxx





--
Best Regards,
longmans



Re: Number of mapreduce job and the time spent

2012-12-12 Thread imen Megdiche
I think that my job id is in this line:

12/12/12 10:43:00 INFO mapred.JobClient: Running job: job_local_0001


But I have this response when I execute:

hadoop job -status  job_local_0001
Warning: $HADOOP_HOME is deprecated.

Could not find job job_local_0001







Re:Number of mapreduce job and the time spent

2012-12-12 Thread long
Get your jobid and use this command:
$HADOOP_HOME/hadoop job -status job_xxx





--
Best Regards,
longmans

At 2012-12-12 17:23:39,"imen Megdiche"  wrote:
Hi,

 I want to know, from the output of the execution of the example MapReduce
wordcount job on Hadoop: the number of MapReduce jobs and the time spent on
the execution.

Here is an excerpt from the output.

12/12/12 10:20:09 INFO mapred.Task: Task 'attempt_local_0001_r_00_0' done.
12/12/12 10:20:10 INFO mapred.JobClient:  map 100% reduce 100%
12/12/12 10:20:10 INFO mapred.JobClient: Job complete: job_local_0001
12/12/12 10:20:10 INFO mapred.JobClient: Counters: 22
12/12/12 10:20:10 INFO mapred.JobClient:   File Input Format Counters
12/12/12 10:20:10 INFO mapred.JobClient: Bytes Read=145966941
12/12/12 10:20:10 INFO mapred.JobClient:   File Output Format Counters
12/12/12 10:20:10 INFO mapred.JobClient: Bytes Written=50704638
12/12/12 10:20:10 INFO mapred.JobClient:   org.myorg.WordCount$Map$Counters
12/12/12 10:20:10 INFO mapred.JobClient: INPUT_WORDS=4980060
12/12/12 10:20:10 INFO mapred.JobClient:   FileSystemCounters
12/12/12 10:20:10 INFO mapred.JobClient: FILE_BYTES_READ=1777104865
12/12/12 10:20:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1783494521
12/12/12 10:20:10 INFO mapred.JobClient:   Map-Reduce Framework
12/12/12 10:20:10 INFO mapred.JobClient: Map output materialized 
bytes=170854986
12/12/12 10:20:10 INFO mapred.JobClient: Map input records=4980060
12/12/12 10:20:10 INFO mapred.JobClient: Reduce shuffle bytes=0
12/12/12 10:20:10 INFO mapred.JobClient: Spilled Records=14940180
12/12/12 10:20:10 INFO mapred.JobClient: Map output bytes=160894830
12/12/12 10:20:10 INFO mapred.JobClient: Total committed heap usage 
(bytes)=1185910784
12/12/12 10:20:10 INFO mapred.JobClient: CPU time spent (ms)=0
12/12/12 10:20:10 INFO mapred.JobClient: Map input bytes=145954650
12/12/12 10:20:10 INFO mapred.JobClient: SPLIT_RAW_BYTES=614
12/12/12 10:20:10 INFO mapred.JobClient: Combine input records=8426541
12/12/12 10:20:10 INFO mapred.JobClient: Reduce input records=4980060
12/12/12 10:20:10 INFO mapred.JobClient: Reduce input groups=1660020
12/12/12 10:20:10 INFO mapred.JobClient: Combine output records=8426541
12/12/12 10:20:10 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
12/12/12 10:20:10 INFO mapred.JobClient: Reduce output records=1660020
12/12/12 10:20:10 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
12/12/12 10:20:10 INFO mapred.JobClient: Map output records=4980060


Thank you for your responses.




help on failed MR jobs (big hive files)

2012-12-12 Thread Elaine Gan
Hi,

I'm trying to run a program on Hadoop.

[Input] tsv file

My program does the following.
(1) Load tsv into hive
  load data local inpath 'tsvfile' overwrite into table A partitioned by xx
(2) insert overwrite table B select a, b, c from table A where 
datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 
30
(3) Running Mahout

In step 2, I am trying to retrieve data from Hive for the past month.
My Hadoop job always stops here.
When I check through my browser utility, it says:

Diagnostic Info:
# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: 
task_201211291541_0262_m_001800

Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802 
seconds. Killing!
Error: Java heap space
Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800 
seconds. Killing!
Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801 
seconds. Killing!



Each hive table is big, around 6 GB.

(1) Is it too big to have around 6 GB for each Hive table?
(2) I've increased my HEAPSIZE to 50G, which I think is far more than
enough. Anywhere else I can do the tuning?


Thank you.



rei
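A side note on the "failed to report status for 1800 seconds. Killing!"
lines above: a task is killed when it runs longer than mapred.task.timeout
(here evidently 1800 s) without emitting progress. If a map legitimately
does long per-record work, it should report progress itself; a sketch in
the old 1.x API (hypothetical, since the actual failing job here is
generated by Hive):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SlowMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
        // ... expensive per-record work ...
        reporter.progress();   // tells the TaskTracker the task is alive
    }
}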