Re: How to configure mapreduce archive size?

2013-04-18 Thread Hemanth Yamijala
Well, since the DistributedCache is used by the tasktracker, you need to
update the log4j configuration file used by the tasktracker daemon. You also
need to get the tasktracker log file from the machine where you see the
distributed cache problem.


On Fri, Apr 19, 2013 at 6:27 AM, xia_y...@dell.com wrote:

 Hi Hemanth,

 I tried http://machine:50030. It did not work for me.

 In the hbase_home/conf folder, I updated the log4j configuration properties
 and got the attached log. Can you tell what is happening with the map reduce
 job?

 Thanks,

 Jane


 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Wednesday, April 17, 2013 9:11 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?


 The check for cache file cleanup is controlled by the
 property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to
 1 minute (which should be sufficient for your requirement).


 I am not sure why the JobTracker UI is inaccessible. If you know where JT
 is running, try hitting http://machine:50030. If that doesn't work, maybe
 check if ports have been changed in mapred-site.xml for a property similar
 to mapred.job.tracker.http.address. 


 There is logging in the code of the tasktracker component that can help
 debug the distributed cache behaviour. In order to get those logs you need
 to enable debug logging in the log4j configuration properties and restart
 the daemons. Hopefully that will help you get some hints on what is
 happening.


 Thanks

 hemanth


 On Wed, Apr 17, 2013 at 11:49 PM, xia_y...@dell.com wrote:

 Hi Hemanth and Bejoy KS,

  

 I have tried both mapred-site.xml and core-site.xml. They do not work. I
 set the value to 50K just for testing purposes, but the folder size has
 already grown to 900M. As in your email, “After they are done, the
 property will help cleanup the files due to the limit set.” How frequently
 will the cleanup task be triggered?

  

 Regarding the job.xml, I cannot use the JT web UI to find it. It seems that
 when hadoop is packaged within Hbase, this is disabled. I am only running
 Hbase jobs. I was advised by the Hbase people to get help from the Hadoop
 mailing list. I will contact them again.

  

 Thanks,

  

 Jane

  

 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Tuesday, April 16, 2013 9:35 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?

  

 You can limit the size by setting local.cache.size in the mapred-site.xml
 (or core-site.xml if that works for you). I mistakenly mentioned
 mapred-default.xml in my last mail - apologies for that. However, please
 note that this does not prevent whatever is writing into the distributed
 cache from creating those files when they are required. After they are
 done, the property will help cleanup the files due to the limit set. 

  

 That's why I am more keen on finding what is using the files in the
 Distributed cache. It may be useful if you can ask on the HBase list as
 well if the APIs you are using are creating the files you mention (assuming
 you are only running HBase jobs on the cluster and nothing else)

  

 Thanks

 Hemanth

  

 On Tue, Apr 16, 2013 at 11:15 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

  

 I did not explicitly use DistributedCache in my code, nor did I use any
 command line arguments like -libjars.

  

 Where can I find job.xml? I am using Hbase MapReduce API and not setting
 any job.xml.

  

 The key point is I want to limit the size of
 /tmp/hadoop-root/mapred/local/archive. Could you help?

  

 Thanks.

  

 Xia

  

 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Thursday, April 11, 2013 9:09 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?

  

 TableMapReduceUtil has APIs like addDependencyJars which will use
 DistributedCache. I don't think you are explicitly using that. Are you
 using any command line arguments like -libjars etc when you are launching
 the MapReduce job ? Alternatively you can check job.xml of the launched MR
 job to see if it has set properties having prefixes like mapred.cache. If
 nothing's set there, it would seem like some other process or user is
 adding jars to DistributedCache when using the cluster.

  

 Thanks

 hemanth

  

  

  

 On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

  

 Attached are some sample folders from my
 /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
 inside.

  

 My application uses a MapReduce job to purge old HBase data. I am using the
 basic HBase MapReduce API to delete rows from Hbase

RE: How to configure mapreduce archive size?

2013-04-17 Thread Xia_Yang
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set
the value to 50K just for testing purposes, but the folder size has already grown
to 900M. As in your email, "After they are done, the property will help
cleanup the files due to the limit set." How frequently will the cleanup task
be triggered?

Regarding the job.xml, I cannot use the JT web UI to find it. It seems that when
hadoop is packaged within Hbase, this is disabled. I am only running Hbase jobs.
I was advised by the Hbase people to get help from the Hadoop mailing list. I
will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or 
core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml 
in my last mail - apologies for that. However, please note that this does not 
prevent whatever is writing into the distributed cache from creating those 
files when they are required. After they are done, the property will help 
cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the Distributed 
cache. It may be useful if you can ask on the HBase list as well if the APIs 
you are using are creating the files you mention (assuming you are only running 
HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, xia_y...@dell.com wrote:
Hi Hemanth,

I did not explicitly use DistributedCache in my code, nor did I use any
command line arguments like -libjars.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any 
job.xml.

The key point is I want to limit the size of 
/tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use 
DistributedCache. I don't think you are explicitly using that. Are you using 
any command line arguments like -libjars etc when you are launching the 
MapReduce job ? Alternatively you can check job.xml of the launched MR job to 
see if it has set properties having prefixes like mapred.cache. If nothing's 
set there, it would seem like some other process or user is adding jars to 
DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:
Hi Hemanth,

Attached are some sample folders from my
/tmp/hadoop-root/mapred/local/archive. There are some jar and class files
inside.

My application uses a MapReduce job to purge old HBase data. I am using the
basic HBase MapReduce API to delete rows from an HBase table. I do not specify
Distributed cache. Maybe HBase uses it?

Some code here:

   Scan scan = new Scan();
   scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
   scan.setCacheBlocks(false);  // don't set to true for MR jobs
   scan.setTimeRange(Long.MIN_VALUE, timestamp);
   // set other scan attrs
   // the purge start time
   Date date = new Date();
   TableMapReduceUtil.initTableMapperJob(
     tableName,          // input table
     scan,               // Scan instance to control CF and attribute selection
     MapperDelete.class, // mapper class
     null,               // mapper output key
     null,               // mapper output value
     job);

   job.setOutputFormatClass(TableOutputFormat.class);
   job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
   job.setNumReduceTasks(0);

   boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will 
help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use 
Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, xia_y...@dell.com wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After
that I started my application. After one evening, my
/tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

local.cache.size in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C

Re: How to configure mapreduce archive size?

2013-04-17 Thread Hemanth Yamijala
The check for cache file cleanup is controlled by the
property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to
1 minute (which should be sufficient for your requirement).
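
For illustration, a minimal mapred-site.xml entry for that property might look
like the following. This is a sketch only: the value is assumed to be in
milliseconds, so 60000 corresponds to one minute - please verify against the
defaults shipped with your Hadoop version.

<property>
  <!-- assumed: how often the tasktracker checks cached files for cleanup, in ms -->
  <name>mapreduce.tasktracker.distributedcache.checkperiod</name>
  <value>60000</value>
</property>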

I am not sure why the JobTracker UI is inaccessible. If you know where JT
is running, try hitting http://machine:50030. If that doesn't work, maybe
check if ports have been changed in mapred-site.xml for a property similar
to mapred.job.tracker.http.address.
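
If the port was changed, a hypothetical mapred-site.xml entry for that property
could look like this (the host and port below are placeholders only; 50030 is
the usual default):

<property>
  <!-- placeholder host:port for the JobTracker web UI -->
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value>
</property>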

There is logging in the code of the tasktracker component that can help
debug the distributed cache behaviour. In order to get those logs you need
to enable debug logging in the log4j configuration properties and restart
the daemons. Hopefully that will help you get some hints on what is
happening.
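
As an illustration, the log4j.properties change meant here could be along these
lines. The logger names are assumptions about where the tasktracker and
distributed cache code live in Hadoop 1.x; adjust them to the class names you
actually see in your tasktracker logs:

# sketch only - verify package names against your Hadoop version
log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
log4j.logger.org.apache.hadoop.filecache=DEBUG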

Thanks
hemanth


On Wed, Apr 17, 2013 at 11:49 PM, xia_y...@dell.com wrote:

 Hi Hemanth and Bejoy KS,

 I have tried both mapred-site.xml and core-site.xml. They do not work. I
 set the value to 50K just for testing purposes, but the folder size has
 already grown to 900M. As in your email, “After they are done, the
 property will help cleanup the files due to the limit set.” How frequently
 will the cleanup task be triggered?

 Regarding the job.xml, I cannot use the JT web UI to find it. It seems that
 when hadoop is packaged within Hbase, this is disabled. I am only running
 Hbase jobs. I was advised by the Hbase people to get help from the Hadoop
 mailing list. I will contact them again.


 Thanks,


 Jane


 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Tuesday, April 16, 2013 9:35 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?


 You can limit the size by setting local.cache.size in the mapred-site.xml
 (or core-site.xml if that works for you). I mistakenly mentioned
 mapred-default.xml in my last mail - apologies for that. However, please
 note that this does not prevent whatever is writing into the distributed
 cache from creating those files when they are required. After they are
 done, the property will help cleanup the files due to the limit set. 


 That's why I am more keen on finding what is using the files in the
 Distributed cache. It may be useful if you can ask on the HBase list as
 well if the APIs you are using are creating the files you mention (assuming
 you are only running HBase jobs on the cluster and nothing else)


 Thanks

 Hemanth


 On Tue, Apr 16, 2013 at 11:15 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

  

 I did not explicitly use DistributedCache in my code, nor did I use any
 command line arguments like -libjars.

  

 Where can I find job.xml? I am using Hbase MapReduce API and not setting
 any job.xml.

  

 The key point is I want to limit the size of
 /tmp/hadoop-root/mapred/local/archive. Could you help?

  

 Thanks.

  

 Xia

  

 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Thursday, April 11, 2013 9:09 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?

  

 TableMapReduceUtil has APIs like addDependencyJars which will use
 DistributedCache. I don't think you are explicitly using that. Are you
 using any command line arguments like -libjars etc when you are launching
 the MapReduce job ? Alternatively you can check job.xml of the launched MR
 job to see if it has set properties having prefixes like mapred.cache. If
 nothing's set there, it would seem like some other process or user is
 adding jars to DistributedCache when using the cluster.

  

 Thanks

 hemanth

  

  

  

 On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

  

 Attached are some sample folders from my
 /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
 inside.

  

 My application uses a MapReduce job to purge old HBase data. I am using the
 basic HBase MapReduce API to delete rows from an HBase table. I do not specify
 Distributed cache. Maybe HBase uses it?

  

 Some code here:

  

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
scan.setTimeRange(Long.MIN_VALUE, timestamp);
// set other scan attrs
// the purge start time
Date date = new Date();
TableMapReduceUtil.initTableMapperJob(
  tableName,          // input table
  scan,               // Scan instance to control CF and attribute selection
  MapperDelete.class, // mapper class
  null,               // mapper output key
  null,               // mapper output value

RE: How to configure mapreduce archive size?

2013-04-16 Thread Xia_Yang
Hi Hemanth,

I did not explicitly use DistributedCache in my code, nor did I use any
command line arguments like -libjars.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any 
job.xml.

The key point is I want to limit the size of 
/tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use 
DistributedCache. I don't think you are explicitly using that. Are you using 
any command line arguments like -libjars etc when you are launching the 
MapReduce job ? Alternatively you can check job.xml of the launched MR job to 
see if it has set properties having prefixes like mapred.cache. If nothing's 
set there, it would seem like some other process or user is adding jars to 
DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:
Hi Hemanth,

Attached are some sample folders from my
/tmp/hadoop-root/mapred/local/archive. There are some jar and class files
inside.

My application uses a MapReduce job to purge old HBase data. I am using the
basic HBase MapReduce API to delete rows from an HBase table. I do not specify
Distributed cache. Maybe HBase uses it?

Some code here:

   Scan scan = new Scan();
   scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
   scan.setCacheBlocks(false);  // don't set to true for MR jobs
   scan.setTimeRange(Long.MIN_VALUE, timestamp);
   // set other scan attrs
   // the purge start time
   Date date = new Date();
   TableMapReduceUtil.initTableMapperJob(
     tableName,          // input table
     scan,               // Scan instance to control CF and attribute selection
     MapperDelete.class, // mapper class
     null,               // mapper output key
     null,               // mapper output value
     job);

   job.setOutputFormatClass(TableOutputFormat.class);
   job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
   job.setNumReduceTasks(0);

   boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will 
help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use 
Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, xia_y...@dell.com wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After
that I started my application. After one evening, my
/tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

local.cache.size in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), 
check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find local.cache.size in file 
core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in 
mapred-default.xml.

I updated the value in the default.xml file and changed the value to 50. This
is just for my testing purpose. However, the folder
/tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks
like it is not working. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>50</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. 
(http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). 
There is a configuration key local.cache.size which controls the amount of 
data stored under DistributedCache. The default limit is 10GB. However, the 
files under this cannot be deleted if they are being used. Also, some 
frameworks on Hadoop could be using DistributedCache

Re: How to configure mapreduce archive size?

2013-04-16 Thread Hemanth Yamijala
You can limit the size by setting local.cache.size in the mapred-site.xml
(or core-site.xml if that works for you). I mistakenly mentioned
mapred-default.xml in my last mail - apologies for that. However, please
note that this does not prevent whatever is writing into the distributed
cache from creating those files when they are required. After they are
done, the property will help cleanup the files due to the limit set.
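
As a sketch, the entry in mapred-site.xml (or core-site.xml) could look like the
following. The value is in bytes, so the number below is only an assumed example
of roughly a 1 GB limit, not a recommended setting:

<property>
  <!-- assumed example: upper limit (in bytes) on the local distributed cache -->
  <name>local.cache.size</name>
  <value>1073741824</value>
</property>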

That's why I am more keen on finding what is using the files in the
Distributed cache. It may be useful if you can ask on the HBase list as
well if the APIs you are using are creating the files you mention (assuming
you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth


On Tue, Apr 16, 2013 at 11:15 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

 I did not explicitly use DistributedCache in my code, nor did I use any
 command line arguments like -libjars.

 Where can I find job.xml? I am using the HBase MapReduce API and not setting
 any job.xml.

 The key point is I want to limit the size of
 /tmp/hadoop-root/mapred/local/archive. Could you help?

 Thanks.

 Xia


 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Thursday, April 11, 2013 9:09 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?


 TableMapReduceUtil has APIs like addDependencyJars which will use
 DistributedCache. I don't think you are explicitly using that. Are you
 using any command line arguments like -libjars etc when you are launching
 the MapReduce job ? Alternatively you can check job.xml of the launched MR
 job to see if it has set properties having prefixes like mapred.cache. If
 nothing's set there, it would seem like some other process or user is
 adding jars to DistributedCache when using the cluster.


 Thanks

 hemanth


 On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

  

 Attached are some sample folders from my
 /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
 inside.

  

 My application uses a MapReduce job to purge old HBase data. I am using the
 basic HBase MapReduce API to delete rows from an HBase table. I do not specify
 Distributed cache. Maybe HBase uses it?

  

 Some code here:

  

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
scan.setTimeRange(Long.MIN_VALUE, timestamp);
// set other scan attrs
// the purge start time
Date date = new Date();
TableMapReduceUtil.initTableMapperJob(
  tableName,          // input table
  scan,               // Scan instance to control CF and attribute selection
  MapperDelete.class, // mapper class
  null,               // mapper output key
  null,               // mapper output value
  job);

job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);

  

 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 Sent: Thursday, April 11, 2013 12:29 AM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?

  

 Could you paste the contents of the directory ? Not sure whether that will
 help, but just giving it a shot.

  

 What application are you using ? Is it custom MapReduce jobs in which you
 use Distributed cache (I guess not) ? 

  

 Thanks

 Hemanth

  

 On Thu, Apr 11, 2013 at 3:34 AM, xia_y...@dell.com wrote:

 Hi Arun,

  

 I stopped my application, then restarted my hbase (which includes hadoop).
 After that I started my application. After one evening, my
 /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not
 work.

  

 Is this the right place to change the value?

  

 local.cache.size in file core-default.xml, which is in
 hadoop-core-1.0.3.jar

  

 Thanks,

  

 Jane

  

 From: Arun C Murthy [mailto:a...@hortonworks.com]
 Sent: Wednesday, April 10, 2013 2:45 PM

 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?

  

 Ensure no jobs are running (cache limit is only for non-active cache
 files), check after a little while (takes sometime for the cleaner thread
 to kick in).

  

 Arun

  

 On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com
 wrote

Re: How to configure mapreduce archive size?

2013-04-11 Thread Hemanth Yamijala
Could you paste the contents of the directory ? Not sure whether that will
help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you
use Distributed cache (I guess not) ?

Thanks
Hemanth


On Thu, Apr 11, 2013 at 3:34 AM, xia_y...@dell.com wrote:

 Hi Arun,

 I stopped my application, then restarted my hbase (which includes hadoop).
 After that I started my application. After one evening, my
 /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not
 work.

 ** **

 Is this the right place to change the value?

 ** **

 local.cache.size in file core-default.xml, which is in
 hadoop-core-1.0.3.jar

 ** **

 Thanks,

 ** **

 Jane

 ** **

 *From:* Arun C Murthy [mailto:a...@hortonworks.com]
 *Sent:* Wednesday, April 10, 2013 2:45 PM

 *To:* user@hadoop.apache.org
 *Subject:* Re: How to configure mapreduce archive size?

 ** **

 Ensure no jobs are running (cache limit is only for non-active cache
 files), check after a little while (takes sometime for the cleaner thread
 to kick in).

 ** **

 Arun

 ** **

 On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com
 wrote:



 

 Hi Hemanth,

  

 For the hadoop 1.0.3, I can only find local.cache.size in file
 core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
 mapred-default.xml.

  

 I updated the value in the default.xml file and changed the value to 50.
 This is just for my testing purpose. However, the folder
 /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It
 looks like it is not working. Could you advise if what I did is correct?

  

   <name>local.cache.size</name>

   <value>50</value>

  

 Thanks,

  

 Xia

  

 *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 *Sent:* Monday, April 08, 2013 9:09 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: How to configure mapreduce archive size?

  

 Hi,

  

 This directory is used as part of the 'DistributedCache' feature. (
 http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
 There is a configuration key local.cache.size which controls the amount
 of data stored under DistributedCache. The default limit is 10GB. However,
 the files under this cannot be deleted if they are being used. Also, some
 frameworks on Hadoop could be using DistributedCache transparently to you.
 

  

 So you could check what is being stored here and based on that lower the
 limit of the cache size if you feel that will help. The property needs to
 be set in mapred-default.xml.

  

 Thanks

 Hemanth

  

 On Mon, Apr 8, 2013 at 11:09 PM, xia_y...@dell.com wrote:

 Hi,

  

 I am using hadoop which is packaged within hbase-0.94.1. It is hadoop
 1.0.3. There are some mapreduce jobs running on my server. After some time, I
 found that my folder /tmp/hadoop-root/mapred/local/archive has reached 14G.

  

 How do I configure this and limit the size? I do not want to waste my space
 on the archive.

  

 Thanks,

  

 Xia

  

  

 ** **

 --

 Arun C. Murthy

 Hortonworks Inc.
 http://hortonworks.com/

 ** **



Re: How to configure mapreduce archive size?

2013-04-11 Thread Hemanth Yamijala
TableMapReduceUtil has APIs like addDependencyJars which will use
DistributedCache. I don't think you are explicitly using that. Are you
using any command line arguments like -libjars etc when you are launching
the MapReduce job ? Alternatively you can check job.xml of the launched MR
job to see if it has set properties having prefixes like mapred.cache. If
nothing's set there, it would seem like some other process or user is
adding jars to DistributedCache when using the cluster.
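
If the JobTracker UI is not available, a hypothetical helper like the one below
(written against the plain Hadoop Configuration API; the class and method names
are invented for illustration) can dump any mapred.cache.* properties from the
job's configuration before it is submitted:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: prints the distributed-cache related properties of a
// job's Configuration so you can see whether anything set mapred.cache.* keys.
public class CacheConfigDump {
  public static void dump(Configuration conf) {
    for (Map.Entry<String, String> entry : conf) {
      if (entry.getKey().startsWith("mapred.cache")) {
        System.out.println(entry.getKey() + " = " + entry.getValue());
      }
    }
  }
}

Calling dump(job.getConfiguration()) just before job.waitForCompletion(true) in
the purge job quoted below would show whether HBase added jar entries to the
distributed cache.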

Thanks
hemanth




On Thu, Apr 11, 2013 at 11:40 PM, xia_y...@dell.com wrote:

 Hi Hemanth,

 ** **

 Attached are some sample folders from my
 /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
 inside.

 ** **

 My application uses a MapReduce job to purge old HBase data. I am using the
 basic HBase MapReduce API to delete rows from an HBase table. I do not specify
 Distributed cache. Maybe HBase uses it?

 ** **

 Some code here:

 ** **

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
scan.setTimeRange(Long.MIN_VALUE, timestamp);
// set other scan attrs
// the purge start time
Date date = new Date();
TableMapReduceUtil.initTableMapperJob(
  tableName,          // input table
  scan,               // Scan instance to control CF and attribute selection
  MapperDelete.class, // mapper class
  null,               // mapper output key
  null,               // mapper output value
  job);

job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);

 ** **

 *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 *Sent:* Thursday, April 11, 2013 12:29 AM

 *To:* user@hadoop.apache.org
 *Subject:* Re: How to configure mapreduce archive size?

 ** **

 Could you paste the contents of the directory ? Not sure whether that will
 help, but just giving it a shot.

 ** **

 What application are you using ? Is it custom MapReduce jobs in which you
 use Distributed cache (I guess not) ? 

 ** **

 Thanks

 Hemanth

 ** **

 On Thu, Apr 11, 2013 at 3:34 AM, xia_y...@dell.com wrote:

 Hi Arun,

  

 I stopped my application, then restarted my hbase (which includes hadoop).
 After that I started my application. After one evening, my
 /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not
 work.

  

 Is this the right place to change the value?

  

 local.cache.size in file core-default.xml, which is in
 hadoop-core-1.0.3.jar

  

 Thanks,

  

 Jane

  

 *From:* Arun C Murthy [mailto:a...@hortonworks.com]
 *Sent:* Wednesday, April 10, 2013 2:45 PM


 *To:* user@hadoop.apache.org
 *Subject:* Re: How to configure mapreduce archive size?

  

 Ensure no jobs are running (cache limit is only for non-active cache
 files), check after a little while (takes sometime for the cleaner thread
 to kick in).

  

 Arun

  

 On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com
 wrote:

 ** **

 Hi Hemanth,

  

 For the hadoop 1.0.3, I can only find local.cache.size in file
 core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
 mapred-default.xml.

  

 I updated the value in the default.xml file and changed the value to 50.
 This is just for my testing purpose. However, the folder
 /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It
 looks like it is not working. Could you advise if what I did is correct?

  

   <name>local.cache.size</name>

   <value>50</value>

  

 Thanks,

  

 Xia

  

 *From:* Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
 *Sent:* Monday, April 08, 2013 9:09 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: How to configure mapreduce archive size?

  

 Hi,

  

 This directory is used as part of the 'DistributedCache' feature. (
 http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
 There is a configuration key local.cache.size which controls the amount
 of data stored under DistributedCache. The default limit is 10GB. However,
 the files under this cannot be deleted if they are being used. Also, some
 frameworks on Hadoop could be using DistributedCache transparently to you.
 

  

 So you could check what is being stored here and based on that lower the
 limit of the cache size if you feel that will help. The property needs to
 be set in mapred-default.xml

RE: How to configure mapreduce archive size?

2013-04-10 Thread Xia_Yang
Hi Hemanth,

For the hadoop 1.0.3, I can only find local.cache.size in file 
core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in 
mapred-default.xml.

I updated the value in the default.xml file and changed the value to 50. This
is just for my testing purpose. However, the folder
/tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks
like it is not working. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>50</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. 
(http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). 
There is a configuration key local.cache.size which controls the amount of 
data stored under DistributedCache. The default limit is 10GB. However, the 
files under this cannot be deleted if they are being used. Also, some 
frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit 
of the cache size if you feel that will help. The property needs to be set in 
mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, xia_y...@dell.com wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. 
There is some mapreduce job running on my server. After some time, I found that 
my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for 
archive.

Thanks,

Xia




Re: How to configure mapreduce archive size?

2013-04-10 Thread Arun C Murthy
Ensure no jobs are running (the cache limit applies only to non-active cache files),
then check after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com wrote:

 Hi Hemanth,
  
 For the hadoop 1.0.3, I can only find local.cache.size in file 
 core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in 
 mapred-default.xml.
  
 I updated the value in file default.xml and changed the value to 50. This 
 is just for my testing purpose. However, the folder 
 /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks 
 like it does not do the work. Could you advise if what I did is correct?
  
   <name>local.cache.size</name>
   <value>50</value>
  
 Thanks,
  
 Xia
  
 From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com] 
 Sent: Monday, April 08, 2013 9:09 PM
 To: user@hadoop.apache.org
 Subject: Re: How to configure mapreduce archive size?
  
 Hi,
  
 This directory is used as part of the 'DistributedCache' feature. 
 (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). 
 There is a configuration key local.cache.size which controls the amount of 
 data stored under DistributedCache. The default limit is 10GB. However, the 
 files under this cannot be deleted if they are being used. Also, some 
 frameworks on Hadoop could be using DistributedCache transparently to you.
  
 So you could check what is being stored here and based on that lower the 
 limit of the cache size if you feel that will help. The property needs to be 
 set in mapred-default.xml.
  
 Thanks
 Hemanth
  
 
 On Mon, Apr 8, 2013 at 11:09 PM, xia_y...@dell.com wrote:
 Hi,
  
 I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. 
 There is some mapreduce job running on my server. After some time, I found 
 that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
  
 How to configure this and limit the size? I do not want  to waste my space 
 for archive.
  
 Thanks,
  
 Xia
  
  

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




RE: How to configure mapreduce archive size?

2013-04-10 Thread Xia_Yang
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After
that I started my application. After one evening, my
/tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

local.cache.size in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), 
check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, xia_y...@dell.com wrote:


Hi Hemanth,

For the hadoop 1.0.3, I can only find local.cache.size in file 
core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in 
mapred-default.xml.

I updated the value in file default.xml and changed the value to 50. This 
is just for my testing purpose. However, the folder 
/tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like 
it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>50</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. 
(http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). 
There is a configuration key local.cache.size which controls the amount of 
data stored under DistributedCache. The default limit is 10GB. However, the 
files under this cannot be deleted if they are being used. Also, some 
frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit 
of the cache size if you feel that will help. The property needs to be set in 
mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, xia_y...@dell.com wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. 
There is some mapreduce job running on my server. After some time, I found that 
my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for 
archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



How to configure mapreduce archive size?

2013-04-08 Thread Xia_Yang
Hi,

I am using hadoop which is packaged within hbase-0.94.1. It is hadoop 1.0.3.
There are some mapreduce jobs running on my server. After some time, I found that
my folder /tmp/hadoop-root/mapred/local/archive has reached 14G.

How do I configure this and limit the size? I do not want to waste my space on
the archive.

Thanks,

Xia



Re: How to configure mapreduce archive size?

2013-04-08 Thread Hemanth Yamijala
Hi,

This directory is used as part of the 'DistributedCache' feature. (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
There is a configuration key local.cache.size which controls the amount
of data stored under DistributedCache. The default limit is 10GB. However,
the files under this cannot be deleted if they are being used. Also, some
frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the
limit of the cache size if you feel that will help. The property needs to
be set in mapred-default.xml.

Thanks
Hemanth


On Mon, Apr 8, 2013 at 11:09 PM, xia_y...@dell.com wrote:

 Hi,

 ** **

 I am using hadoop which is packaged within hbase-0.94.1. It is hadoop
 1.0.3. There are some mapreduce jobs running on my server. After some time, I
 found that my folder /tmp/hadoop-root/mapred/local/archive has reached 14G.

 ** **

 How do I configure this and limit the size? I do not want to waste my space
 on the archive.

 ** **

 Thanks,

 ** **

 Xia

 ** **



Re: How to configure mapreduce archive size?

2013-03-31 Thread Ted Yu
This question is more related to mapreduce.

I put user@hbase in Bcc.

Cheers


On Sun, Mar 31, 2013 at 11:15 AM, tojaneyang xia_y...@dell.com wrote:

 Hi Ted,

 Do you have any suggestions for this?

 I am using hadoop which is packaged within hbase -0.94.1. It is hadoop
 1.0.3.

 Thanks,

 Xia



 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/How-to-configure-mapreduce-archive-size-tp4040488p4041237.html
 Sent from the HBase User mailing list archive at Nabble.com.