Re: desperate question about NameNode startup sequence

2011-12-17 Thread Meng Mao
The namenode eventually came up. Here's the rest of the logging:

2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
3885 edits # 43 loaded in 0 seconds.
2011-12-17 03:06:26,731 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode,
reached end of edit log Number of transactions found 306757368
2011-12-17 03:06:26,732 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits.new of size
41011966085 edits # 306757368 loaded in 4982 seconds.
2011-12-17 03:06:47,264 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 1724849462 saved in 19 seconds.
2011-12-17 03:07:09,051 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 5373458 msecs

It took a long time to load edits.new, consistent with the speed at which it
loaded fsimage (roughly 41 GB in 4982 seconds, about 8 MB/s, versus 2.6 GB in
348 seconds, about 7.4 MB/s).

So at 3:12 or so, the namenode came back up. At that time, the fsimage got
updated on our secondary namenode:
[vmc@prod1-secondary ~]$ ls -l
/hadoop/hadoop-metadata/cache/dfs/namesecondary/current/
total 1686124
-rw-r--r-- 1 hadoop hadoop  38117 Dec 17 03:12 edits
-rw-r--r-- 1 hadoop hadoop 1724849462 Dec 17 03:12 fsimage

But that's it. No updates since then. Is that normal 2NN behavior? I don't
think we've tuned away from the defaults for fsimage and edits maintenance.
How should I diagnose?

Similarly, the primary namenode seems to continue to log changes to edits.new:
-bash-3.2$ ls -l /hadoop/hadoop-metadata/cache/dfs/name/current/
total 1728608
-rw-r--r-- 1 hadoop hadoop  38117 Dec 17 03:12 edits
-rw-r--r-- 1 hadoop hadoop   44064633 Dec 17 11:41 edits.new
-rw-r--r-- 1 hadoop hadoop 1724849462 Dec 17 03:06 fsimage
-rw-r--r-- 1 hadoop hadoop  8 Dec 17 03:07 fstime
-rw-r--r-- 1 hadoop hadoop        101 Dec 17 03:07 VERSION

Is this normal? Have I been misunderstanding normal NN operation?
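
For reference, a minimal sketch of checking the checkpoint settings the 2NN
would be working from, assuming the 0.20-era property names
fs.checkpoint.period (seconds between checkpoints, default 3600) and
fs.checkpoint.size (edits size that forces an early checkpoint, default 64 MB):

import org.apache.hadoop.conf.Configuration;

public class CheckpointSettings {
    public static void main(String[] args) {
        // Loads core-site.xml (and core-default.xml) from the classpath.
        Configuration conf = new Configuration();
        long periodSecs = conf.getLong("fs.checkpoint.period", 3600);
        long sizeBytes = conf.getLong("fs.checkpoint.size", 64L * 1024 * 1024);
        System.out.println("fs.checkpoint.period = " + periodSecs + " s");
        System.out.println("fs.checkpoint.size   = " + sizeBytes + " bytes");
    }
}

If those are still at their defaults, the 2NN should be attempting a checkpoint
at least hourly, so a single 03:12 update followed by silence would suggest the
checkpoint itself is failing rather than the schedule.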

On Sat, Dec 17, 2011 at 3:01 AM, Meng Mao meng...@gmail.com wrote:

 Maybe this is a bad sign -- the edits.new was created before the master
 node crashed, and is huge:

 -bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
 total 41G
 -rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
 -rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
 -rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
 -rw-r--r-- 1 hadoop hadoop    8 Jan 27  2011 fstime
 -rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION

 could this mean something was up with our SecondaryNameNode and rolling
 the edits file?

 On Sat, Dec 17, 2011 at 2:53 AM, Meng Mao meng...@gmail.com wrote:

 All of the worker nodes' datanode logs haven't logged anything after the
 initial startup announcement:
 STARTUP_MSG:   host = prod1-worker075/10.2.19.75
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.1+169.56
 STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
 compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
 /

 On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao meng...@gmail.com wrote:

 Our CDH2 production grid just crashed with some sort of master node
 failure.
 When I went in there, JobTracker was missing and NameNode was up.
 Trying to ls on HDFS met with no connection.

  We decided to go for a restart. This is in the namenode log right now:

 2011-12-17 01:37:35,568 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
 Initializing NameNodeMeterics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2011-12-17 01:37:35,612 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
 2011-12-17 01:37:35,613 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
 2011-12-17 01:37:35,613 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 isPermissionEnabled=true
 2011-12-17 01:37:35,620 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
 Initializing FSNamesystemMetrics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2011-12-17 01:37:35,621 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
 FSNamesystemStatusMBean
 2011-12-17 01:37:35,648 INFO
 org.apache.hadoop.hdfs.server.common.Storage: Number of files = 16978046
 2011-12-17 01:43:24,023 INFO
 org.apache.hadoop.hdfs.server.common.Storage: Number of files under
 construction = 1
 2011-12-17 01:43:24,025 INFO
 org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2589456651
 loaded in 348 seconds.
 2011-12-17 01:43:24,030 INFO

desperate question about NameNode startup sequence

2011-12-16 Thread Meng Mao
Our CDH2 production grid just crashed with some sort of master node failure.
When I went in there, JobTracker was missing and NameNode was up.
Trying to ls on HDFS met with no connection.

We decided to go for a restart. This is in the namenode log right now:

2011-12-17 01:37:35,568 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,612 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-12-17 01:37:35,613 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-12-17 01:37:35,613 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2011-12-17 01:37:35,620 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,621 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
3885 edits # 43 loaded in 0 seconds.


What's coming up in the startup sequence? We have a ton of data on there.
Is there any way to estimate startup time?


Re: desperate question about NameNode startup sequence

2011-12-16 Thread Meng Mao
All of the worker nodes' datanode logs haven't logged anything after the
initial startup announcement:
STARTUP_MSG:   host = prod1-worker075/10.2.19.75
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1+169.56
STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
/

On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao meng...@gmail.com wrote:

 Our CDH2 production grid just crashed with some sort of master node
 failure.
 When I went in there, JobTracker was missing and NameNode was up.
 Trying to ls on HDFS met with no connection.

  We decided to go for a restart. This is in the namenode log right now:

 2011-12-17 01:37:35,568 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
 Initializing NameNodeMeterics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2011-12-17 01:37:35,612 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
 2011-12-17 01:37:35,613 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
 2011-12-17 01:37:35,613 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 isPermissionEnabled=true
 2011-12-17 01:37:35,620 INFO
 org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
 Initializing FSNamesystemMetrics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2011-12-17 01:37:35,621 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
 FSNamesystemStatusMBean
 2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Number of files = 16978046
 2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Number of files under construction = 1
 2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Image file of size 2589456651 loaded in 348 seconds.
 2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
 Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
 3885 edits # 43 loaded in 0 seconds.


 What's coming up in the startup sequence? We have a ton of data on there.
 Is there any way to estimate startup time?



Re: Do failed task attempts stick around the jobcache on local disk?

2011-11-05 Thread Meng Mao
Am I being completely silly asking about this? Does anyone know?

On Wed, Nov 2, 2011 at 6:27 PM, Meng Mao meng...@gmail.com wrote:

 Is there any mechanism in place to remove failed task attempt directories
 from the TaskTracker's jobcache?

 It seems like for us, the only way to get rid of them is manually.


Re: apache-solr-3.3.0, corrupt indexes, and speculative execution

2011-10-30 Thread Meng Mao
Today we again had some leftover attempt dirs -- 2 of the 5 reduce tasks that
ran a redundant speculative attempt left their extra attempt directory behind:

part-5:
conf  data  solr_attempt_201101270134_42328_r_05_1.1

part-6:
conf  data  solr_attempt_201101270134_42328_r_06_1.1

Is it possible that the disk being really full could cause transient
filesystem glitches that aren't being thrown as exceptions?

On Sat, Oct 29, 2011 at 2:23 PM, Meng Mao meng...@gmail.com wrote:

 We've been getting up to speed on SOLR, and one of the recent problems
 we've run into is with successful jobs delivering corrupt index shards.

 This is a look at a 12-sharded index we built and then copied to local
 disk off of HDFS.

 $ ls -l part-1
 total 16
 drwxr-xr-x 2 vmc visible 4096 Oct 29 09:39 conf
 drwxr-xr-x 4 vmc visible 4096 Oct 29 09:42 data

 $ ls -l part-4/
 total 16
 drwxr-xr-x 4 vmc visible 4096 Oct 29 09:54 data
 drwxr-xr-x 2 vmc visible 4096 Oct 29 09:54
 solr_attempt_201101270134_42143_r_04_1.1


 Right away, you can see that there's apparently some lack of cleanup or
 incompleteness. Shard 04 is missing a conf directory, and has an empty
 attempt directory lying around.

 This is what a complete shard listing looks like:
 $ ls -l part-1/*/*
 -rw-r--r-- 1 vmc visible 33402 Oct 29 09:39 part-1/conf/schema.xml

 part-1/data/index:
 total 13088776
 -rw-r--r-- 1 vmc visible 6036701453 Oct 29 09:40 _1m3.fdt
 *-rw-r--r-- 1 vmc visible  246345692 Oct 29 09:40 _1m3.fdx #missing from
 shard 04*
 -rw-r--r-- 1 vmc visible        211 Oct 29 09:40 _1m3.fnm
 -rw-r--r-- 1 vmc visible 3516034769 Oct 29 09:41 _1m3.frq
 -rw-r--r-- 1 vmc visible   92379637 Oct 29 09:41 _1m3.nrm
 -rw-r--r-- 1 vmc visible  695935796 Oct 29 09:41 _1m3.prx
 -rw-r--r-- 1 vmc visible   28548963 Oct 29 09:41 _1m3.tii
 -rw-r--r-- 1 vmc visible 2773769958 Oct 29 09:42 _1m3.tis
 -rw-r--r-- 1 vmc visible        284 Oct 29 09:42 segments_2
 -rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen

 part-1/data/spellchecker:
 total 16
 -rw-r--r-- 1 vmc visible 32 Oct 29 09:42 segments_1
 -rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen


 And shard 04:
 $ ls -l part-4/*/*
 #missing conf/

 part-4/data/index:
 total 12818420
 -rw-r--r-- 1 vmc visible 6036000614 Oct 29 09:52 _1m1.fdt
 -rw-r--r-- 1 vmc visible        211 Oct 29 09:52 _1m1.fnm
 -rw-r--r-- 1 vmc visible 3515333900 Oct 29 09:53 _1m1.frq
 -rw-r--r-- 1 vmc visible   92361544 Oct 29 09:53 _1m1.nrm
 -rw-r--r-- 1 vmc visible  696258210 Oct 29 09:54 _1m1.prx
 -rw-r--r-- 1 vmc visible   28552866 Oct 29 09:54 _1m1.tii
 -rw-r--r-- 1 vmc visible 2744647680 Oct 29 09:54 _1m1.tis
 -rw-r--r-- 1 vmc visible        283 Oct 29 09:54 segments_2
 -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen

 part-4/data/spellchecker:
 total 16
 -rw-r--r-- 1 vmc visible 32 Oct 29 09:54 segments_1
 -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen


 What might cause that attempt path to be lying around at the time of
 completion? Has anyone seen anything like this? My gut says if we were able
 to disable speculative execution, we would probably see this go away. But
 that might be overreacting.


 In this job, of the 12 reduce tasks, 5 had an extra speculative attempt.
 Of those 5, 2 were cases where the speculative attempt won out over the
 first attempt. And one of them had this output inconsistency.

 Here is an excerpt from the task log for shard 04:
 2011-10-29 07:08:33,152 INFO org.apache.solr.hadoop.SolrRecordWriter:
 SolrHome:
 /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
 2011-10-29 07:08:33,889 INFO org.apache.solr.hadoop.SolrRecordWriter:
 Constructed instance information solr.home
 /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
 (/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip),
 instance dir
 /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/,
 conf dir
 /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/conf/,
 writing index to temporary directory
 /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data,
 with permdir /PROD/output/solr/solr-20111029063514-12/part-4
 2011-10-29 07:08:35,868 INFO org.apache.solr.schema.IndexSchema: Reading
 Solr Schema

 ... much later ...

 2011-10-29 07:08:37,263 WARN org.apache.solr.core.SolrCore: [core1] Solr 
 index directory 
 '/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data/index'
  doesn't exist. Creating new index...

 The log doesn't look much

apache-solr-3.3.0, corrupt indexes, and speculative execution

2011-10-29 Thread Meng Mao
We've been getting up to speed on SOLR, and one of the recent problems we've
run into is with successful jobs delivering corrupt index shards.

This is a look at a 12-sharded index we built and then copied to local disk
off of HDFS.

$ ls -l part-1
total 16
drwxr-xr-x 2 vmc visible 4096 Oct 29 09:39 conf
drwxr-xr-x 4 vmc visible 4096 Oct 29 09:42 data

$ ls -l part-4/
total 16
drwxr-xr-x 4 vmc visible 4096 Oct 29 09:54 data
drwxr-xr-x 2 vmc visible 4096 Oct 29 09:54
solr_attempt_201101270134_42143_r_04_1.1


Right away, you can see that there's apparently some lack of cleanup or
incompleteness. Shard 04 is missing a conf directory, and has an empty
attempt directory lying around.

This is what a complete shard listing looks like:
$ ls -l part-1/*/*
-rw-r--r-- 1 vmc visible 33402 Oct 29 09:39 part-1/conf/schema.xml

part-1/data/index:
total 13088776
-rw-r--r-- 1 vmc visible 6036701453 Oct 29 09:40 _1m3.fdt
*-rw-r--r-- 1 vmc visible  246345692 Oct 29 09:40 _1m3.fdx #missing from
shard 04*
-rw-r--r-- 1 vmc visible        211 Oct 29 09:40 _1m3.fnm
-rw-r--r-- 1 vmc visible 3516034769 Oct 29 09:41 _1m3.frq
-rw-r--r-- 1 vmc visible   92379637 Oct 29 09:41 _1m3.nrm
-rw-r--r-- 1 vmc visible  695935796 Oct 29 09:41 _1m3.prx
-rw-r--r-- 1 vmc visible   28548963 Oct 29 09:41 _1m3.tii
-rw-r--r-- 1 vmc visible 2773769958 Oct 29 09:42 _1m3.tis
-rw-r--r-- 1 vmc visible        284 Oct 29 09:42 segments_2
-rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen

part-1/data/spellchecker:
total 16
-rw-r--r-- 1 vmc visible 32 Oct 29 09:42 segments_1
-rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen


And shard 04:
$ ls -l part-4/*/*
#missing conf/

part-4/data/index:
total 12818420
-rw-r--r-- 1 vmc visible 6036000614 Oct 29 09:52 _1m1.fdt
-rw-r--r-- 1 vmc visible        211 Oct 29 09:52 _1m1.fnm
-rw-r--r-- 1 vmc visible 3515333900 Oct 29 09:53 _1m1.frq
-rw-r--r-- 1 vmc visible   92361544 Oct 29 09:53 _1m1.nrm
-rw-r--r-- 1 vmc visible  696258210 Oct 29 09:54 _1m1.prx
-rw-r--r-- 1 vmc visible   28552866 Oct 29 09:54 _1m1.tii
-rw-r--r-- 1 vmc visible 2744647680 Oct 29 09:54 _1m1.tis
-rw-r--r-- 1 vmc visible        283 Oct 29 09:54 segments_2
-rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen

part-4/data/spellchecker:
total 16
-rw-r--r-- 1 vmc visible 32 Oct 29 09:54 segments_1
-rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen


What might cause that attempt path to be lying around at the time of
completion? Has anyone seen anything like this? My gut says if we were able
to disable speculative execution, we would probably see this go away. But
that might be overreacting.
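
For what it's worth, a minimal sketch of turning reduce-side speculation off
with the old mapred API (the driver class name here is hypothetical):

import org.apache.hadoop.mapred.JobConf;

public class SolrIndexDriver {
    public static JobConf disableReduceSpeculation(JobConf conf) {
        // Stop the JobTracker from scheduling backup reduce attempts, which is
        // where the duplicate solr_attempt_* directories come from.
        conf.setReduceSpeculativeExecution(false);
        // Equivalent property form:
        // conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        return conf;
    }
}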


In this job, of the 12 reduce tasks, 5 had an extra speculative attempt. Of
those 5, 2 were cases where the speculative attempt won out over the first
attempt. And one of them had this output inconsistency.

Here is an excerpt from the task log for shard 04:
2011-10-29 07:08:33,152 INFO org.apache.solr.hadoop.SolrRecordWriter:
SolrHome:
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
2011-10-29 07:08:33,889 INFO org.apache.solr.hadoop.SolrRecordWriter:
Constructed instance information solr.home
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
(/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip),
instance dir
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/,
conf dir
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/conf/,
writing index to temporary directory
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data,
with permdir /PROD/output/solr/solr-20111029063514-12/part-4
2011-10-29 07:08:35,868 INFO org.apache.solr.schema.IndexSchema: Reading
Solr Schema

... much later ...

2011-10-29 07:08:37,263 WARN org.apache.solr.core.SolrCore: [core1]
Solr index directory
'/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data/index'
doesn't exist. Creating new index...

The log doesn't look much different from the successful shard 3 (where the
later attempt beat the earlier attempt). I have the full logs if anyone has
diagnostic advice.


Re: ways to expand hadoop.tmp.dir capacity?

2011-10-10 Thread Meng Mao
So the only way we can expand to multiple mapred.local.dir paths is to
configure our site.xml and restart the DataNode?

On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda 
marcosluis2...@googlemail.com wrote:

 2011/10/9 Harsh J ha...@cloudera.com

  Hello Meng,
 
  On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote:
   Currently, we've got defined:
 <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoop-metadata/cache/</value>
 </property>
  
   In our experiments with SOLR, the intermediate files are so large that
  they
   tend to blow out disk space and fail (and annoyingly leave behind their
  huge
   failed attempts). We've had issues with it in the past, but we're
 having
   real problems with SOLR if we can't comfortably get more space out of
   hadoop.tmp.dir somehow.
  
   1) It seems we never set *mapred.system.dir* to anything special, so
 it's
   defaulting to ${hadoop.tmp.dir}/mapred/system.
   Is this a problem? The docs seem to recommend against it when
  hadoop.tmp.dir
   had ${user.name} in it, which ours doesn't.
 
  The {mapred.system.dir} is a HDFS location, and you shouldn't really
  be worried about it as much.
 
   1b) The doc says mapred.system.dir is the in-HDFS path to shared
  MapReduce
    system files. To me, that means there must be a single path for
   mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
   Otherwise, one might imagine that you could specify multiple paths to
  store
   hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
   interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
   there were more mapping/lookup between mapred.system.dir and
  hadoop.tmp.dir?
 
  {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although it
  is on HDFS, and hence is confusing, but there should just be one
  mapred.system.dir, yes.
 
   Also, the config {hadoop.tmp.dir} doesn't support more than 1 path. What you
  need here is a proper {mapred.local.dir} configuration.
 
   2) IIRC, there's a -D switch for supplying config name/value pairs into
    individual jobs. Does such a switch exist? Googling for single letters
 is
   fruitless. If we had a path on our workers with more space (in our
 case,
   another hard disk), could we simply pass that path in as hadoop.tmp.dir
  for
   our SOLR jobs? Without incurring any consistency issues on future jobs
  that
   might use the SOLR output on HDFS?
 
  Only a few parameters of a job are user-configurable. Stuff like
  hadoop.tmp.dir and mapred.local.dir are not override-able by user set
  parameters as they are server side configurations (static).
 
   Given that the default value is ${hadoop.tmp.dir}/mapred/local, would
 the
   expanded capacity we're looking for be as easily accomplished as by
  defining
   mapred.local.dir to span multiple disks? Setting aside the issue of
 temp
   files so big that they could still fill a whole disk.
 
  1. You can set mapred.local.dir independent of hadoop.tmp.dir
  2. mapred.local.dir can have comma separated values in it, spanning
  multiple disks
  3. Intermediate outputs may spread across these disks but shall not
   consume more than 1 disk at a time. So if your largest configured disk is 500
  GB while the total set of them may be 2 TB, then your intermediate
  output size can't really exceed 500 GB, cause only one disk is
  consumed by one task -- the multiple disks are for better I/O
  parallelism between tasks.
 
  Know that hadoop.tmp.dir is a convenience property, for quickly
  starting up dev clusters and such. For a proper configuration, you
  need to remove dependency on it (almost nothing uses hadoop.tmp.dir on
  the server side, once the right properties are configured - ex:
  dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.)
 
  --
  Harsh J
 

  Here is an excellent explanation of how to install Apache Hadoop manually;
  Lars explains it very well.


 http://blog.lars-francke.de/2011/01/26/setting-up-a-hadoop-cluster-part-1-manual-installation/

 Regards

 --
 Marcos Luis Ortíz Valmaseda
  Linux Infrastructure Engineer
  Linux User # 418229
  http://marcosluis2186.posterous.com
  http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186



ways to expand hadoop.tmp.dir capacity?

2011-10-04 Thread Meng Mao
Currently, we've got defined:
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoop-metadata/cache/</value>
  </property>

In our experiments with SOLR, the intermediate files are so large that they
tend to blow out disk space and fail (and annoyingly leave behind their huge
failed attempts). We've had issues with it in the past, but we're having
real problems with SOLR if we can't comfortably get more space out of
hadoop.tmp.dir somehow.

1) It seems we never set *mapred.system.dir* to anything special, so it's
defaulting to ${hadoop.tmp.dir}/mapred/system.
Is this a problem? The docs seem to recommend against it when hadoop.tmp.dir
had ${user.name} in it, which ours doesn't.

1b) The doc says mapred.system.dir is the in-HDFS path to shared MapReduce
system files. To me, that means there must be a single path for
mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
Otherwise, one might imagine that you could specify multiple paths to store
hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir?

2) IIRC, there's a -D switch for supplying config name/value pairs into
individual jobs. Does such a switch exist? Googling for single letters is
fruitless. If we had a path on our workers with more space (in our case,
another hard disk), could we simply pass that path in as hadoop.tmp.dir for
our SOLR jobs? Without incurring any consistency issues on future jobs that
might use the SOLR output on HDFS?
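
The -D generic option does exist when the driver goes through
ToolRunner/GenericOptionsParser. A minimal sketch (class and property names
hypothetical) of a driver that picks "-D some.property=value" pairs up into its
Configuration; note that, as the reply above points out, server-side settings
such as hadoop.tmp.dir and mapred.local.dir are read by the daemons from their
own configs and can't be overridden per job this way:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TmpDirJobDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // Generic options such as "-D some.property=value" have already been
        // parsed into getConf() by the time run() is called.
        Configuration conf = getConf();
        System.out.println("some.property = " + conf.get("some.property"));
        // ... build and submit the actual job from conf here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Invoked as: hadoop jar job.jar TmpDirJobDriver -D some.property=value ...
        System.exit(ToolRunner.run(new Configuration(), new TmpDirJobDriver(), args));
    }
}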


Re: ways to expand hadoop.tmp.dir capacity?

2011-10-04 Thread Meng Mao
I just read this:

MapReduce performance can also be improved by distributing the temporary
data generated by MapReduce tasks across multiple disks on each machine:

  <property>
    <name>mapred.local.dir</name>
    <value>/d1/mapred/local,/d2/mapred/local,/d3/mapred/local,/d4/mapred/local</value>
    <final>true</final>
  </property>

Given that the default value is ${hadoop.tmp.dir}/mapred/local, would the
expanded capacity we're looking for be as easily accomplished as by defining
mapred.local.dir to span multiple disks? Setting aside the issue of temp
files so big that they could still fill a whole disk.

On Wed, Oct 5, 2011 at 1:32 AM, Meng Mao meng...@gmail.com wrote:

 Currently, we've got defined:
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/hadoop/hadoop-metadata/cache/</value>
   </property>

 In our experiments with SOLR, the intermediate files are so large that they
 tend to blow out disk space and fail (and annoyingly leave behind their huge
 failed attempts). We've had issues with it in the past, but we're having
 real problems with SOLR if we can't comfortably get more space out of
 hadoop.tmp.dir somehow.

 1) It seems we never set *mapred.system.dir* to anything special, so it's
 defaulting to ${hadoop.tmp.dir}/mapred/system.
 Is this a problem? The docs seem to recommend against it when
 hadoop.tmp.dir had ${user.name} in it, which ours doesn't.

 1b) The doc says mapred.system.dir is the in-HDFS path to shared MapReduce
 system files. To me, that means there must be a single path for
 mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
 Otherwise, one might imagine that you could specify multiple paths to store
 hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
 interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
 there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir?

 2) IIRC, there's a -D switch for supplying config name/value pairs into
 individual jobs. Does such a switch exist? Googling for single letters is
 fruitless. If we had a path on our workers with more space (in our case,
 another hard disk), could we simply pass that path in as hadoop.tmp.dir for
 our SOLR jobs? Without incurring any consistency issues on future jobs that
 might use the SOLR output on HDFS?







Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
Who is in charge of getting the files there for the first time? The
addCacheFile call in the mapreduce job? Or a manual setup by the
user/operator?

On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans ev...@yahoo-inc.com wrote:

 The problem is the step 4 in the breaking sequence.  Currently the
 TaskTracker never looks at the disk to know if a file is in the distributed
 cache or not.  It assumes that if it downloaded the file and did not delete
 that file itself then the file is still there in its original form.  It does
  not know that you deleted those files, or if you wrote to the files, or in any
 way altered those files.  In general you should not be modifying those
 files.  This is not only because it messes up the tracking of those files,
 but because other jobs running concurrently with your task may also be using
 those files.

 --Bobby Evans


 On 9/26/11 4:40 PM, Meng Mao meng...@gmail.com wrote:

 Let's frame the issue in another way. I'll describe a sequence of Hadoop
 operations that I think should work, and then I'll get into what we did and
 how it failed.

 Normal sequence:
 1. have files to be cached in HDFS
 2. Run Job A, which specifies those files to be put into DistributedCache
 space
 3. job runs fine
 4. Run Job A some time later. job runs fine again.

 Breaking sequence:
 1. have files to be cached in HDFS
 2. Run Job A, which specifies those files to be put into DistributedCache
 space
 3. job runs fine
 4. Manually delete cached files out of local disk on worker nodes
 5. Run Job A again, expect it to push out cache copies as needed.
 6. job fails because the cache copies didn't get distributed

 Should this second sequence have broken?

 On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao meng...@gmail.com wrote:

  Hmm, I must have really missed an important piece somewhere. This is from
  the MapRed tutorial text:
 
  DistributedCache is a facility provided by the Map/Reduce framework to
  cache files (text, archives, jars and so on) needed by applications.
 
  Applications specify the files to be cached via urls (hdfs://) in the
  JobConf. The DistributedCache* assumes that the files specified via
  hdfs:// urls are already present on the FileSystem.*
 
  *The framework will copy the necessary files to the slave node before any
  tasks for the job are executed on that node*. Its efficiency stems from
  the fact that the files are only copied once per job and the ability to
  cache archives which are un-archived on the slaves.
 
 
  After some close reading, the two bolded pieces seem to be in
 contradiction
  of each other? I'd always assumed that addCacheFile() would perform the 2nd
 bolded
  statement. If that sentence is true, then I still don't have an
 explanation
  of why our job didn't correctly push out new versions of the cache files
  upon the startup and execution of JobConfiguration. We deleted them
 before
  our job started, not during.
 
  On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans ev...@yahoo-inc.com
 wrote:
 
  Meng Mao,
 
  The way the distributed cache is currently written, it does not verify
 the
  integrity of the cache files at all after they are downloaded.  It just
  assumes that if they were downloaded once they are still there and in
 the
  proper shape.  It might be good to file a JIRA to add in some sort of
 check.
   Another thing to do is that the distributed cache also includes the
 time
   stamp of the original file, just in case you delete the file and then use
 a
  different version.  So if you want it to force a download again you can
 copy
   it, delete the original, and then move it back to what it was before.
 
  --Bobby Evans
 
  On 9/23/11 1:57 AM, Meng Mao meng...@gmail.com wrote:
 
  We use the DistributedCache class to distribute a few lookup files for
 our
  jobs. We have been aggressively deleting failed task attempts' leftover
  data
  , and our script accidentally deleted the path to our distributed cache
  files.
 
  Our task attempt leftover data was here [per node]:
  /hadoop/hadoop-metadata/cache/mapred/local/
  and our distributed cache path was:
  hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/nameNode
  We deleted this path by accident.
 
  Does this latter path look normal? I'm not that familiar with
  DistributedCache but I'm up right now investigating the issue so I
 thought
  I'd ask.
 
  After that deletion, the first 2 jobs to run (which use the
  addCacheFile
  method to distribute their files) didn't seem to push the files out to
 the
  cache path, except on one node. Is this expected behavior? Shouldn't
  addCacheFile check to see if the files are missing, and if so,
 repopulate
  them as needed?
 
  I'm trying to get a handle on whether it's safe to delete the
 distributed
  cache path when the grid is quiet and no jobs are running. That is, if
  addCacheFile is designed to be robust against the files it's caching not
  being at each job start.
 
 
 




Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
I'm not concerned about disk space usage -- the script we used that deleted
the taskTracker cache path has been fixed not to do so.

I'm curious about the exact behavior of jobs that use DistributedCache
files. Again, it seems safe from your description to delete files between
completed runs. How could the job or the taskTracker distinguish between the
files having been deleted and their not having been downloaded from a
previous run of the job? Is it state in memory that the taskTracker
maintains?


On Tue, Sep 27, 2011 at 1:44 PM, Robert Evans ev...@yahoo-inc.com wrote:

 If you are never ever going to use that file again for any map/reduce task
 in the future then yes you can delete it, but I would not recommend it.  If
 you want to reduce the amount of space that is used by the distributed cache
 there is a config parameter for that.

 local.cache.size  it is the number of bytes per drive that will be used
  for storing data in the distributed cache.   This is in 0.20 for hadoop; I am
 not sure if it has changed at all for trunk.  It is not documented as far as
 I can tell, and it defaults to 10GB.

 --Bobby Evans


 On 9/27/11 12:04 PM, Meng Mao meng...@gmail.com wrote:

 From that interpretation, it then seems like it would be safe to delete the
 files between completed runs? How could it distinguish between the files
 having been deleted and their not having been downloaded from a previous
 run?

 On Tue, Sep 27, 2011 at 12:25 PM, Robert Evans ev...@yahoo-inc.com
 wrote:

  addCacheFile sets a config value in your jobConf that indicates which
 files
  your particular job depends on.  When the TaskTracker is assigned to run
  part of your job (map task or reduce task), it will download your
 jobConf,
  read it in, and then download the files listed in the conf, if it has not
  already downloaded them from a previous run.  Then it will set up the
  directory structure for your job, possibly adding in symbolic links to
 these
  files in the working directory for your task.  After that it will launch
  your task.
 
  --Bobby Evans
 
  On 9/27/11 11:17 AM, Meng Mao meng...@gmail.com wrote:
 
  Who is in charge of getting the files there for the first time? The
  addCacheFile call in the mapreduce job? Or a manual setup by the
  user/operator?
 
  On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans ev...@yahoo-inc.com
  wrote:
 
   The problem is the step 4 in the breaking sequence.  Currently the
   TaskTracker never looks at the disk to know if a file is in the
  distributed
   cache or not.  It assumes that if it downloaded the file and did not
  delete
   that file itself then the file is still there in its original form.  It
  does
    not know that you deleted those files, or if you wrote to the files, or in
  any
   way altered those files.  In general you should not be modifying those
   files.  This is not only because it messes up the tracking of those
  files,
   but because other jobs running concurrently with your task may also be
  using
   those files.
  
   --Bobby Evans
  
  
   On 9/26/11 4:40 PM, Meng Mao meng...@gmail.com wrote:
  
   Let's frame the issue in another way. I'll describe a sequence of
 Hadoop
   operations that I think should work, and then I'll get into what we did
  and
   how it failed.
  
   Normal sequence:
   1. have files to be cached in HDFS
   2. Run Job A, which specifies those files to be put into
 DistributedCache
   space
   3. job runs fine
   4. Run Job A some time later. job runs fine again.
  
   Breaking sequence:
   1. have files to be cached in HDFS
   2. Run Job A, which specifies those files to be put into
 DistributedCache
   space
   3. job runs fine
   4. Manually delete cached files out of local disk on worker nodes
   5. Run Job A again, expect it to push out cache copies as needed.
   6. job fails because the cache copies didn't get distributed
  
   Should this second sequence have broken?
  
   On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao meng...@gmail.com wrote:
  
Hmm, I must have really missed an important piece somewhere. This is
  from
the MapRed tutorial text:
   
DistributedCache is a facility provided by the Map/Reduce framework
 to
cache files (text, archives, jars and so on) needed by applications.
   
Applications specify the files to be cached via urls (hdfs://) in the
JobConf. The DistributedCache* assumes that the files specified via
hdfs:// urls are already present on the FileSystem.*
   
*The framework will copy the necessary files to the slave node before
  any
tasks for the job are executed on that node*. Its efficiency stems
 from
the fact that the files are only copied once per job and the ability
 to
cache archives which are un-archived on the slaves.
   
   
After some close reading, the two bolded pieces seem to be in
   contradiction
 of each other? I'd always assumed that addCacheFile() would perform the 2nd
   bolded
statement. If that sentence is true, then I still don't have

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
So the proper description of how DistributedCache normally works is:

1. have files to be cached sitting around in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space. Each worker node copies the to-be-cached files from HDFS to local
disk, but more importantly, the TaskTracker acknowledges this distribution
and marks somewhere the fact that these files are cached for the first (and
only time)
3. job runs fine
4. Run Job A some time later. TaskTracker simply assumes (by looking at its
memory) that the files are still cached. The tasks on the workers, on this
second call to addCacheFile, don't actually copy the files from HDFS to
local disk again, but instead accept TaskTracker's word that they're still
there. Because the files actually still exist, the workers run fine and the
job finishes normally.

Is that a correct interpretation? If so, the caution, then, must be that if
you accidentally deleted the local disk cache file copies, you either
repopulate them (as well as their crc checksums) or you restart the
TaskTracker?



On Tue, Sep 27, 2011 at 3:03 PM, Robert Evans ev...@yahoo-inc.com wrote:

 Yes, all of the state for the task tracker is in memory.  It never looks at
 the disk to see what is there, it only maintains the state in memory.

 --bobby Evans


 On 9/27/11 1:00 PM, Meng Mao meng...@gmail.com wrote:

 I'm not concerned about disk space usage -- the script we used that deleted
 the taskTracker cache path has been fixed not to do so.

 I'm curious about the exact behavior of jobs that use DistributedCache
 files. Again, it seems safe from your description to delete files between
 completed runs. How could the job or the taskTracker distinguish between
 the
 files having been deleted and their not having been downloaded from a
 previous run of the job? Is it state in memory that the taskTracker
 maintains?


 On Tue, Sep 27, 2011 at 1:44 PM, Robert Evans ev...@yahoo-inc.com wrote:

  If you are never ever going to use that file again for any map/reduce
 task
  in the future then yes you can delete it, but I would not recommend it.
  If
  you want to reduce the amount of space that is used by the distributed
 cache
  there is a config parameter for that.
 
  local.cache.size  it is the number of bytes per drive that will be used
   for storing data in the distributed cache.   This is in 0.20 for hadoop; I
 am
  not sure if it has changed at all for trunk.  It is not documented as far
 as
  I can tell, and it defaults to 10GB.
 
  --Bobby Evans
 
 
  On 9/27/11 12:04 PM, Meng Mao meng...@gmail.com wrote:
 
  From that interpretation, it then seems like it would be safe to delete
 the
  files between completed runs? How could it distinguish between the files
  having been deleted and their not having been downloaded from a previous
  run?
 
  On Tue, Sep 27, 2011 at 12:25 PM, Robert Evans ev...@yahoo-inc.com
  wrote:
 
   addCacheFile sets a config value in your jobConf that indicates which
  files
   your particular job depends on.  When the TaskTracker is assigned to
 run
   part of your job (map task or reduce task), it will download your
  jobConf,
   read it in, and then download the files listed in the conf, if it has
 not
   already downloaded them from a previous run.  Then it will set up the
   directory structure for your job, possibly adding in symbolic links to
  these
   files in the working directory for your task.  After that it will
 launch
   your task.
  
   --Bobby Evans
  
   On 9/27/11 11:17 AM, Meng Mao meng...@gmail.com wrote:
  
   Who is in charge of getting the files there for the first time? The
   addCacheFile call in the mapreduce job? Or a manual setup by the
   user/operator?
  
   On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans ev...@yahoo-inc.com
   wrote:
  
The problem is the step 4 in the breaking sequence.  Currently the
TaskTracker never looks at the disk to know if a file is in the
   distributed
cache or not.  It assumes that if it downloaded the file and did not
   delete
that file itself then the file is still there in its original form.
  It
   does
 not know that you deleted those files, or if you wrote to the files, or
 in
   any
way altered those files.  In general you should not be modifying
 those
files.  This is not only because it messes up the tracking of those
   files,
but because other jobs running concurrently with your task may also
 be
   using
those files.
   
--Bobby Evans
   
   
On 9/26/11 4:40 PM, Meng Mao meng...@gmail.com wrote:
   
Let's frame the issue in another way. I'll describe a sequence of
  Hadoop
operations that I think should work, and then I'll get into what we
 did
   and
how it failed.
   
Normal sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into
  DistributedCache
space
3. job runs fine
4. Run Job A some time later. job runs fine again.
   
Breaking sequence

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-26 Thread Meng Mao
Let's frame the issue in another way. I'll describe a sequence of Hadoop
operations that I think should work, and then I'll get into what we did and
how it failed.

Normal sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space
3. job runs fine
4. Run Job A some time later. job runs fine again.

Breaking sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space
3. job runs fine
4. Manually delete cached files out of local disk on worker nodes
5. Run Job A again, expect it to push out cache copies as needed.
6. job fails because the cache copies didn't get distributed

Should this second sequence have broken?

On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao meng...@gmail.com wrote:

 Hmm, I must have really missed an important piece somewhere. This is from
 the MapRed tutorial text:

 DistributedCache is a facility provided by the Map/Reduce framework to
 cache files (text, archives, jars and so on) needed by applications.

 Applications specify the files to be cached via urls (hdfs://) in the
 JobConf. The DistributedCache* assumes that the files specified via
 hdfs:// urls are already present on the FileSystem.*

 *The framework will copy the necessary files to the slave node before any
 tasks for the job are executed on that node*. Its efficiency stems from
 the fact that the files are only copied once per job and the ability to
 cache archives which are un-archived on the slaves.


 After some close reading, the two bolded pieces seem to be in contradiction
  of each other? I'd always assumed that addCacheFile() would perform the 2nd bolded
 statement. If that sentence is true, then I still don't have an explanation
 of why our job didn't correctly push out new versions of the cache files
 upon the startup and execution of JobConfiguration. We deleted them before
 our job started, not during.

 On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans ev...@yahoo-inc.com wrote:

 Meng Mao,

 The way the distributed cache is currently written, it does not verify the
 integrity of the cache files at all after they are downloaded.  It just
 assumes that if they were downloaded once they are still there and in the
 proper shape.  It might be good to file a JIRA to add in some sort of check.
  Another thing to do is that the distributed cache also includes the time
  stamp of the original file, just in case you delete the file and then use a
  different version.  So if you want it to force a download again you can copy
  it, delete the original, and then move it back to what it was before.

 --Bobby Evans

 On 9/23/11 1:57 AM, Meng Mao meng...@gmail.com wrote:

 We use the DistributedCache class to distribute a few lookup files for our
 jobs. We have been aggressively deleting failed task attempts' leftover
 data
 , and our script accidentally deleted the path to our distributed cache
 files.

 Our task attempt leftover data was here [per node]:
 /hadoop/hadoop-metadata/cache/mapred/local/
 and our distributed cache path was:
 hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/nameNode
 We deleted this path by accident.

 Does this latter path look normal? I'm not that familiar with
 DistributedCache but I'm up right now investigating the issue so I thought
 I'd ask.

  After that deletion, the first 2 jobs to run (which use the
 addCacheFile
 method to distribute their files) didn't seem to push the files out to the
 cache path, except on one node. Is this expected behavior? Shouldn't
 addCacheFile check to see if the files are missing, and if so, repopulate
 them as needed?

 I'm trying to get a handle on whether it's safe to delete the distributed
 cache path when the grid is quiet and no jobs are running. That is, if
 addCacheFile is designed to be robust against the files it's caching not
 being at each job start.





operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
We use the DistributedCache class to distribute a few lookup files for our
jobs. We have been aggressively deleting failed task attempts' leftover data
, and our script accidentally deleted the path to our distributed cache
files.

Our task attempt leftover data was here [per node]:
/hadoop/hadoop-metadata/cache/mapred/local/
and our distributed cache path was:
hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/nameNode
We deleted this path by accident.

Does this latter path look normal? I'm not that familiar with
DistributedCache but I'm up right now investigating the issue so I thought
I'd ask.

After that deletion, the first 2 jobs to run (which use the addCacheFile
method to distribute their files) didn't seem to push the files out to the
cache path, except on one node. Is this expected behavior? Shouldn't
addCacheFile check to see if the files are missing, and if so, repopulate
them as needed?

I'm trying to get a handle on whether it's safe to delete the distributed
cache path when the grid is quiet and no jobs are running. That is, if
addCacheFile is designed to be robust against the files it's caching not
being at each job start.
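
For context, a minimal sketch of the addCacheFile flow being asked about, using
the 0.20 mapred API (the HDFS path is a hypothetical placeholder):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class LookupCacheExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LookupCacheExample.class);
        // Submission side: record the HDFS file in the job configuration.
        // Nothing is copied here; the file must already exist in HDFS.
        DistributedCache.addCacheFile(new URI("/user/hadoop/lookup/terms.txt"), conf);
        // ... set mapper/reducer, input/output paths, and submit ...
    }

    // Task side (e.g. in Mapper.configure): the TaskTracker downloads the file
    // to its local cache before launching the task; this resolves the local copy.
    static Path[] localCacheCopies(JobConf conf) throws IOException {
        return DistributedCache.getLocalCacheFiles(conf);
    }
}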


Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
Hmm, I must have really missed an important piece somewhere. This is from
the MapRed tutorial text:

DistributedCache is a facility provided by the Map/Reduce framework to
cache files (text, archives, jars and so on) needed by applications.

Applications specify the files to be cached via urls (hdfs://) in the
JobConf. The DistributedCache* assumes that the files specified via hdfs://
urls are already present on the FileSystem.*

*The framework will copy the necessary files to the slave node before any
tasks for the job are executed on that node*. Its efficiency stems from the
fact that the files are only copied once per job and the ability to cache
archives which are un-archived on the slaves.


After some close reading, the two bolded pieces seem to be in contradiction
of each other? I'd always assumed that addCacheFile() would perform the 2nd bolded
statement. If that sentence is true, then I still don't have an explanation
of why our job didn't correctly push out new versions of the cache files
upon the startup and execution of JobConfiguration. We deleted them before
our job started, not during.

On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans ev...@yahoo-inc.com wrote:

 Meng Mao,

 The way the distributed cache is currently written, it does not verify the
 integrity of the cache files at all after they are downloaded.  It just
 assumes that if they were downloaded once they are still there and in the
 proper shape.  It might be good to file a JIRA to add in some sort of check.
  Another thing to do is that the distributed cache also includes the time
  stamp of the original file, just in case you delete the file and then use a
  different version.  So if you want it to force a download again you can copy
  it, delete the original, and then move it back to what it was before.

 --Bobby Evans

 On 9/23/11 1:57 AM, Meng Mao meng...@gmail.com wrote:

 We use the DistributedCache class to distribute a few lookup files for our
 jobs. We have been aggressively deleting failed task attempts' leftover
 data
 , and our script accidentally deleted the path to our distributed cache
 files.

 Our task attempt leftover data was here [per node]:
 /hadoop/hadoop-metadata/cache/mapred/local/
 and our distributed cache path was:
 hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/nameNode
 We deleted this path by accident.

 Does this latter path look normal? I'm not that familiar with
 DistributedCache but I'm up right now investigating the issue so I thought
 I'd ask.

  After that deletion, the first 2 jobs to run (which use the
 addCacheFile
 method to distribute their files) didn't seem to push the files out to the
 cache path, except on one node. Is this expected behavior? Shouldn't
 addCacheFile check to see if the files are missing, and if so, repopulate
 them as needed?

 I'm trying to get a handle on whether it's safe to delete the distributed
 cache path when the grid is quiet and no jobs are running. That is, if
 addCacheFile is designed to be robust against the files it's caching not
 being at each job start.




Re: Disable Sorting?

2011-09-10 Thread Meng Mao
Is there a way to collate the possibly large number of map output files,
though?
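
As a point of reference for the suggestion in the reply quoted below, a minimal
sketch of the zero-reduce setup with the old mapred API:

import org.apache.hadoop.mapred.JobConf;

public class MapOnlySetup {
    public static JobConf configure(JobConf conf) {
        // With zero reduces the sort/shuffle phase is skipped entirely and each
        // map task writes its output directly to the job output directory.
        conf.setNumReduceTasks(0);
        return conf;
    }
}

On collating the per-map part files afterwards: hadoop dfs -getmerge can
concatenate them into a single local file, or a follow-up single-reducer
identity job can merge them on HDFS.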

On Sat, Sep 10, 2011 at 2:48 PM, Arun C Murthy a...@hortonworks.com wrote:

 Run a map-only job with #reduces set to 0.

 Arun

 On Sep 10, 2011, at 2:06 AM, john smith wrote:

  Hi,
 
  Some of the MR jobs I run doesn't need sorting of map-output in each
  partition. Is there someway I can disable it?
 
  Any help?
 
  Thanks
  jS




Re: do HDFS files starting with _ (underscore) have special properties?

2011-09-03 Thread Meng Mao
I get the opposite behavior --

[this is more or less how I listed the files in the original email]
hadoop dfs -ls /test/output/solr-20110901165238/part-0/data/index/*
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/segments_2

Whereas my globStatus doesn't capture them.

I thought we were on Cloudera's CDH3, but now I'm not sure. This is what
hadoop version reports:
$ hadoop version
Hadoop 0.20.1+169.56
Subversion  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
Compiled by root on Tue Feb  9 13:40:08 EST 2010





On Fri, Sep 2, 2011 at 11:45 PM, Harsh J ha...@cloudera.com wrote:

 Meng,

 What version of hadoop are you on? I'm able to use globStatus(Path)
 for '_' listing successfully, with a '*' glob. Although the same
  doesn't apply to what FsShell's ls utility provides (which is odd
 here!).

 Here's my test code which can validate that the listing is indeed
 done: http://pastebin.com/vCbd2wmK

 $ hadoop dfs -ls
 Found 4 items
 drwxr-xr-x   - harshchouraria supergroup  0 2011-09-03 09:09
 /user/harshchouraria/_abc
 -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
 /user/harshchouraria/_def
 drwxr-xr-x   - harshchouraria supergroup  0 2011-09-03 08:10
 /user/harshchouraria/abc
 -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
 /user/harshchouraria/def


 $ hadoop dfs -ls '*'
 -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
 /user/harshchouraria/_def
 -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
 /user/harshchouraria/def

 $ # No dir results! ^^

 $ hadoop jar myjar.jar # (My code)
 hdfs://localhost/user/harshchouraria/_abc
 hdfs://localhost/user/harshchouraria/_def
 hdfs://localhost/user/harshchouraria/abc
 hdfs://localhost/user/harshchouraria/def

 I suppose that means globStatus is fine, but the FsShell.ls(…) code
 does something more than a simple glob status, and filters away
 directory results when used with a glob.

 On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao meng...@gmail.com wrote:
  Is there a programmatic way to access these hidden files then?
 
  On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
  On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao meng...@gmail.com wrote:
 
   We have a compression utility that tries to grab all subdirs to a
  directory
   on HDFS. It makes a call like this:
   FileStatus[] subdirs = fs.globStatus(new Path(inputdir, *));
  
   and handles files vs dirs accordingly.
  
   We tried to run our utility against a dir containing a computed SOLR
  shard,
   which has files that look like this:
   -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
   /test/output/solr-20110901165238/part-0/data/index/_ox.fdt
   -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
   /test/output/solr-20110901165238/part-0/data/index/_ox.fdx
    -rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
   /test/output/solr-20110901165238/part-0/data/index/_ox.fnm
   -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
   /test/output/solr-20110901165238/part-0/data/index/_ox.frq
   -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
   /test/output/solr-20110901165238/part-0/data/index/_ox.nrm
   -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
   /test/output/solr-20110901165238/part-0/data/index/_ox.prx
   -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
   /test/output/solr-20110901165238/part-0/data/index/_ox.tii
   -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
   /test/output/solr-20110901165238/part-0/data/index/_ox.tis
   -rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
   /test/output/solr-20110901165238/part-0/data/index/segments.gen
   -rw-r--r--   2

do HDFS files starting with _ (underscore) have special properties?

2011-09-02 Thread Meng Mao
We have a compression utility that tries to grab all subdirs to a directory
on HDFS. It makes a call like this:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, *));

and handles files vs dirs accordingly.

We tried to run our utility against a dir containing a computed SOLR shard,
which has files that look like this:
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible282 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/segments_2


The globStatus call seems only able to pick up those last 2 files; the
several files that start with _ don't register.

I've skimmed the FileSystem and GlobExpander source to see if there's
anything related to this, but didn't see it. Google didn't turn up anything
about underscores. Am I misunderstanding something about the regex patterns
needed to pick these up or unaware of some filename convention in HDFS?


Re: do HDFS files starting with _ (underscore) have special properties?

2011-09-02 Thread Meng Mao
Is there a programmatic way to access these hidden files then?

On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao meng...@gmail.com wrote:

  We have a compression utility that tries to grab all subdirs to a
 directory
  on HDFS. It makes a call like this:
  FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
 
  and handles files vs dirs accordingly.
 
  We tried to run our utility against a dir containing a computed SOLR
 shard,
  which has files that look like this:
  -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
  /test/output/solr-20110901165238/part-0/data/index/_ox.fdt
  -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
  /test/output/solr-20110901165238/part-0/data/index/_ox.fdx
  -rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
  /test/output/solr-20110901165238/part-0/data/index/_ox.fnm
  -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
  /test/output/solr-20110901165238/part-0/data/index/_ox.frq
  -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
  /test/output/solr-20110901165238/part-0/data/index/_ox.nrm
  -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
  /test/output/solr-20110901165238/part-0/data/index/_ox.prx
  -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
  /test/output/solr-20110901165238/part-0/data/index/_ox.tii
  -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
  /test/output/solr-20110901165238/part-0/data/index/_ox.tis
  -rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
  /test/output/solr-20110901165238/part-0/data/index/segments.gen
  -rw-r--r--   2 hadoopuser visible282 2011-09-01 18:55
  /test/output/solr-20110901165238/part-0/data/index/segments_2
 
 
  The globStatus call seems only able to pick up those last 2 files; the
  several files that start with _ don't register.
 
  I've skimmed the FileSystem and GlobExpander source to see if there's
  anything related to this, but didn't see it. Google didn't turn up
 anything
  about underscores. Am I misunderstanding something about the regex
 patterns
  needed to pick these up or unaware of some filename convention in HDFS?
 

 Files starting with '_' are considered 'hidden', like Unix files starting
 with '.'. I did not know that for a very long time, because not everyone
 follows this convention or even knows about it.
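
A programmatic listing does see them, though: FileSystem.globStatus does not
drop underscore names on its own (the hidden-file filtering lives in callers
such as FileInputFormat's default PathFilter). A minimal sketch, my own
illustration rather than anything from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ListEverything {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // An accept-everything filter, so nothing is dropped on the client side.
    PathFilter all = new PathFilter() {
      public boolean accept(Path p) { return true; }
    };
    FileStatus[] matches = fs.globStatus(new Path(args[0], "*"), all);
    for (FileStatus st : matches) {
      System.out.println((st.isDir() ? "d " : "- ") + st.getPath());
    }
  }
}

If the compression utility passes its own PathFilter, it's worth checking
that the filter isn't the hidden-file one.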



streaming: -f with a zip file?

2010-04-15 Thread Meng Mao
We are trying to pass into a streaming program a zip file. An invocation
looks like this:

/usr/local/hadoop/bin/hadoop jar
/usr/local/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar ...
 -file 'myfile.zip'
Within the program, it seems like a ZipFile invocation can't locate the zip
file.

Is there anything that might be problematic about passing a zip file with
-file?
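
For what it's worth, -file ships the archive into each task's working
directory under its basename, so the usual approach is to open it relative
to the current directory rather than by the submit-time path. A hypothetical
sketch of that assumption (ZipProbe and the file name are made up here):

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipProbe {
  public static void main(String[] args) throws Exception {
    // -file 'myfile.zip' should leave a copy (or symlink) named myfile.zip
    // in the streaming task's current working directory.
    ZipFile zip = new ZipFile("myfile.zip");
    Enumeration<? extends ZipEntry> entries = zip.entries();
    while (entries.hasMoreElements()) {
      System.err.println("found entry: " + entries.nextElement().getName());
    }
    zip.close();
  }
}

If the contents need to be unpacked on the nodes, streaming's -cacheArchive
option (an HDFS path with a #linkname fragment) may be a better fit than -file.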


Re: duplicate tasks getting started/killed

2010-02-10 Thread Meng Mao
Right, so have you ever seen your non-idempotent DEFINE command have an
incorrect result? That would essentially point to duplicate attempts
behaving badly.

To your second question -- I think spec exec assumes that not all machines
run at the same speed. If a machine is free (not used for some other
attempt), then Hadoop might schedule an attempt right away on it. It's
possible, depending on the granularity of the work or issues with the
original attempt, that this later attempt finishes first, and thus becomes
the committing attempt.
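
If the DEFINE work truly can't tolerate a duplicate attempt, speculative
execution can also be switched off per job. A rough sketch with the old
mapred API, assuming the 0.20-era property names (for Pig, the same two
properties can usually be set in the script or via -D):

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculation {
  public static void disable(JobConf conf) {
    // Turn off both map- and reduce-side speculative attempts for this job.
    // Equivalent properties: mapred.map.tasks.speculative.execution and
    // mapred.reduce.tasks.speculative.execution, both set to false.
    conf.setMapSpeculativeExecution(false);
    conf.setReduceSpeculativeExecution(false);
  }
}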


On Wed, Feb 10, 2010 at 12:10 PM, prasenjit mukherjee 
pmukher...@quattrowireless.com wrote:

 Correctness of the results actually depends on my DEFINE command. If
 the commands are idempotent (which is not the case for me) then I
 believe it won't have any effect on the results; otherwise it will
 indeed make the results incorrect. For example, if my command fetches
 some data and appends it to a MySQL db, then duplication is undesirable.

 I have a question though, why in the first place the second attempt
 was kicked off just seconds after the first one. I mean what is the
 rationale behind kicking off a second attempt immediately afterwards ?
 Baffling...

 -Prasen

 On Wed, Feb 10, 2010 at 10:32 PM, Meng Mao meng...@gmail.com wrote:
  That cleanup action looks promising in terms of preventing duplication.
 What
  I'd meant was, could you ever find an instance where the results of your
  DEFINE statement were made incorrect by multiple attempts?
 
  On Wed, Feb 10, 2010 at 5:05 AM, prasenjit mukherjee 
  pmukher...@quattrowireless.com wrote:
 
  Below is the log :
 
  attempt_201002090552_0009_m_01_0
 /default-rack/ip-10-242-142-193.ec2.internal
 SUCCEEDED
 100.00%
 9-Feb-2010 07:04:37
 9-Feb-2010 07:07:00 (2mins, 23sec)
 
   attempt_201002090552_0009_m_01_1
 Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
 Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
 KILLED
 100.00%
 9-Feb-2010 07:05:34
 9-Feb-2010 07:07:10 (1mins, 36sec)
 
  So, here is the time-line for both the  attempts :
  attempt_1's start_time=07:04:37 , end_time=07:07:00
  attempt_2's  start_time=07:05:34,
 end_time=07:07:10
 
  -Thanks,
  Prasen
 
  On Wed, Feb 10, 2010 at 1:15 PM, Meng Mao meng...@gmail.com wrote:
   Can you confirm that duplication is happening in the case that one
  attempt
   gets underway but killed before the other's completion?
   I believe by default (though I'm not sure for Pig), each attempt's
 output
  is
   first isolated to a path keyed to its attempt id, and only committed
 when
   one and only one attempt is complete.
  
   On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee 
   pmukher...@quattrowireless.com wrote:
  
    Any thoughts on this problem? I am using a DEFINE command (in PIG),
    and hence the actions are not idempotent. Because of this, duplicate
    execution does have an effect on my results. Is there any way to
    overcome that?
 
 



Re: duplicate tasks getting started/killed

2010-02-09 Thread Meng Mao
Can you confirm that duplication is happening in the case that one attempt
gets underway but killed before the other's completion?
I believe by default (though I'm not sure for Pig), each attempt's output is
first isolated to a path keyed to its attempt id, and only committed when
one and only one attempt is complete.
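
For side-effect files opened by hand, the same isolation can be had by
writing under the task's work output path, which the committer promotes
only for the winning attempt. A rough sketch against the old mapred API
(the helper name is mine):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SideFiles {
  // Create a side-effect file inside the attempt's temporary work directory;
  // the output committer moves it to the real output dir only if this attempt wins.
  public static FSDataOutputStream createSideFile(JobConf conf, String name)
      throws IOException {
    Path workDir = FileOutputFormat.getWorkOutputPath(conf);
    Path sideFile = new Path(workDir, name);
    return sideFile.getFileSystem(conf).create(sideFile);
  }
}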

On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee 
pmukher...@quattrowireless.com wrote:

 Any thoughts on this problem? I am using a DEFINE command (in PIG),
  and hence the actions are not idempotent. Because of this, duplicate
  execution does have an effect on my results. Is there any way to
  overcome that?

 On Tue, Feb 9, 2010 at 9:26 PM, prasenjit mukherjee
 pmukher...@quattrowireless.com wrote:
  But the second attempt got killed even before the first one was
  completed. How can we explain that?
 
  On Tue, Feb 9, 2010 at 7:38 PM, Eric Sammer e...@lifeless.net wrote:
  Prasen:
 
  This is most likely speculative execution. Hadoop fires up multiple
  attempts for the same task and lets them race to see which finishes
  first and then kills the others. This is meant to speed things along.
 
  Speculative execution is on by default, but can be disabled. See the
  configuration reference for mapred-*.xml.
 
  On 2/9/10 9:03 AM, prasenjit mukherjee wrote:
  Sometimes for the same task I see that a duplicate task gets run on a
  different machine and gets killed later. Not always but sometimes. Any
  reason why duplicate tasks get run. I thought tasks are duplicated
  only if  either the first attempt exits( exceptions etc ) or  exceeds
  mapred.task.timeout. In this case none of them happens. As can be seen
  from timestamp, the second attempt starts even though the first
  attempt is still running ( only for 1 minute ).
 
  Any explanation ?
 
  attempt_201002090552_0009_m_01_0
  /default-rack/ip-10-242-142-193.ec2.internal
  SUCCEEDED
  100.00%
  9-Feb-2010 07:04:37
  9-Feb-2010 07:07:00 (2mins, 23sec)
 
  attempt_201002090552_0009_m_01_1
  Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
  Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
  KILLED
  100.00%
  9-Feb-2010 07:05:34
  9-Feb-2010 07:07:10 (1mins, 36sec)
 
   -Prasen
 
 
 
  --
  Eric Sammer
  e...@lifeless.net
  http://esammer.blogspot.com
 
 



Re: has anyone ported hadoop.lib.aggregate?

2010-02-08 Thread Meng Mao
I'm not familiar with the current roadmap for 0.20, but is there any plan to
backport the new mapreduce.lib.aggregate library into 0.20.x?

I suppose our team could attempt to use the patch ourselves, but we'd be
much more comfortable going with a standard release if at all possible.

On Sun, Feb 7, 2010 at 10:41 PM, Amareshwari Sri Ramadasu 
amar...@yahoo-inc.com wrote:

 Org.apache.hadoop.mapred.lib.aggregate has been ported to new api in branch
 0.21.
 See http://issues.apache.org/jira/browse/MAPREDUCE-358

 Thanks
 Amareshwari

 On 2/7/10 5:34 AM, Meng Mao meng...@gmail.com wrote:

 From what I can tell, while the ValueAggregator stuff should be useable,
 the
 ValueAggregatorJob and ValueAggregatorJobBase classes still use the old
 Mapper and Reducer signatures, and basically aren't compatible with the new
 mapreduce.* API. Is that correct?

 Has anyone out there done a port? We've been dragging our feet very hard
 about getting away from use of deprecated API for our classes that take
 advantage of the aggregate lib. It would be a huge boost if there was any
 stuff we could borrow to port over.




Re: Is it possible to write each key-value pair emitted by the reducer to a different output file

2010-02-06 Thread Meng Mao
It's possible to write your own class to better encapsulate the writing of
side-effect files, but as people have said, you can run into unanticipated
issues if the number of files you try to write at once becomes high.
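
On 0.20, MultipleOutputs handles the common case of steering a reducer's
records into extra named outputs without hand-managed files; one file per
key-value pair is still risky for the reasons above. A rough sketch of the
old-API usage (the named output "extra" is a placeholder of mine, and it
must be registered at job setup):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class SplitReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  // Call once at job setup so the "extra" named output exists.
  public static void register(JobConf conf) {
    MultipleOutputs.addNamedOutput(conf, "extra",
        TextOutputFormat.class, Text.class, Text.class);
  }

  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @SuppressWarnings("unchecked")
  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    while (values.hasNext()) {
      Text value = values.next();
      output.collect(key, value);                               // normal part-* output
      mos.getCollector("extra", reporter).collect(key, value);  // "extra" named output
    }
  }

  public void close() throws IOException {
    mos.close();
  }
}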

On Sat, Feb 6, 2010 at 3:47 AM, Udaya Lakshmi udaya...@gmail.com wrote:

 Hi Amareshwari,

   But this feature is not available in Hadoop 0.18.3. Is there any
  workaround for this version?
 Thanks,
 Udaya.

 On Fri, Feb 5, 2010 at 10:49 AM, Amareshwari Sri Ramadasu 
 amar...@yahoo-inc.com wrote:

  See MultipleOutputs at
 
 http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
 
  -Amareshwari
 
  On 2/5/10 10:41 AM, Udaya Lakshmi udaya...@gmail.com wrote:
 
  Hi,
   I was wondering if it is possible to write each key-value pair produced
 by
  the reduce function to a different file. How could I open a new file in
 the
  reduce function of the reducer? I know it's possible in the configure
  function, but that would write all of that reducer's output to a single file.
  Thanks,
  Udaya.
 
 



has anyone ported hadoop.lib.aggregate?

2010-02-06 Thread Meng Mao
From what I can tell, while the ValueAggregator stuff should be useable, the
ValueAggregatorJob and ValueAggregatorJobBase classes still use the old
Mapper and Reducer signatures, and basically aren't compatible with the new
mapreduce.* API. Is that correct?

Has anyone out there done a port? We've been dragging our feet very hard
about getting away from use of deprecated API for our classes that take
advantage of the aggregate lib. It would be a huge boost if there was any
stuff we could borrow to port over.


Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-05 Thread Meng Mao
ack, after looking at the logs again, there are definitely xcievers errors.
It's set to 256!
I had thought I had cleared this as a possible cause, but I guess I was wrong.
Gonna retest right away.
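For the record, the usual remedy (assuming the 0.20/CDH-era property name,
complete with its historical misspelling) is to raise the datanode
transceiver cap in hdfs-site.xml on every datanode and restart them, e.g.:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>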
Thanks!

On Fri, Feb 5, 2010 at 11:05 AM, Todd Lipcon t...@cloudera.com wrote:

 Yes, you're likely to see an error in the DN log. Do you see anything
 about max number of xceivers?

 -Todd

 On Thu, Feb 4, 2010 at 11:42 PM, Meng Mao meng...@gmail.com wrote:
  not sure what else I could be checking to see where the problem lies.
 Should
  I be looking in the datanode logs? I looked briefly in there and didn't
 see
  anything from around the time exceptions started getting reported.
  lsof during the job execution? Number of open threads?
 
  I'm at a loss here.
 
  On Thu, Feb 4, 2010 at 2:52 PM, Meng Mao meng...@gmail.com wrote:
 
  I wrote a hadoop job that checks for ulimits across the nodes, and every
  node is reporting:
  core file size  (blocks, -c) 0
  data seg size   (kbytes, -d) unlimited
  scheduling priority (-e) 0
  file size   (blocks, -f) unlimited
  pending signals (-i) 139264
  max locked memory   (kbytes, -l) 32
  max memory size (kbytes, -m) unlimited
  open files  (-n) 65536
   pipe size (512 bytes, -p) 8
  POSIX message queues (bytes, -q) 819200
  real-time priority  (-r) 0
  stack size  (kbytes, -s) 10240
  cpu time   (seconds, -t) unlimited
  max user processes  (-u) 139264
  virtual memory  (kbytes, -v) unlimited
  file locks  (-x) unlimited
 
 
  Is anything in there telling about file number limits? From what I
  understand, a high open files limit like 65536 should be enough. I
 estimate
  only a couple thousand part-files on HDFS being written to at once, and
  around 200 on the filesystem per node.
 
  On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao meng...@gmail.com wrote:
 
  also, which is the ulimit that's important, the one for the user who is
  running the job, or the hadoop user that owns the Hadoop processes?
 
 
  On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote:
 
  I've been trying to run a fairly small input file (300MB) on Cloudera
  Hadoop 0.20.1. The job I'm using probably writes to on the order of
 over
  1000 part-files at once, across the whole grid. The grid has 33 nodes
 in it.
  I get the following exception in the run logs:
 
  10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
  10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
  attempt_201001261532_1137_r_13_0, Status : FAILED
  java.io.EOFException
  at java.io.DataInputStream.readByte(DataInputStream.java:250)
  at
  org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
  at
  org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
  at org.apache.hadoop.io.Text.readString(Text.java:400)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
 
  lots of EOFExceptions
 
  10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
  attempt_201001261532_1137_r_19_0, Status : FAILED
  java.io.IOException: Bad connect ack with firstBadLink
 10.2.19.1:50010
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
   at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
  at
 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
 
  10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
  10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
  10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
  10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
  10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
 
  From searching around, it seems like the most common cause of BadLink
 and
  EOFExceptions is when the nodes don't have enough file descriptors
 set. But
  across all the grid machines, the file-max has been set to 1573039.
  Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.
 
  Where else should I be looking for what's causing this?
 
 
 
 
 



Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-04 Thread Meng Mao
I wrote a hadoop job that checks for ulimits across the nodes, and every
node is reporting:
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 139264
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Is anything in there telling about file number limits? From what I
understand, a high open files limit like 65536 should be enough. I estimate
only a couple thousand part-files on HDFS being written to at once, and
around 200 on the filesystem per node.
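
For anyone hitting this later, a minimal sketch of that kind of check
(hypothetical code, not the exact job used here): a map-only job that shells
out to ulimit -a and tags each line with the hostname. It only reports on
nodes that happen to receive a map task, so give it an input with enough splits.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UlimitCheck {
  public static class UlimitMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      // Run "ulimit -a" in a shell on whichever node this task landed on.
      // (This runs once per input line; fine for a tiny probe input.)
      Process p = Runtime.getRuntime().exec(new String[] {"bash", "-c", "ulimit -a"});
      BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
      Text host = new Text(InetAddress.getLocalHost().getHostName());
      String line;
      while ((line = r.readLine()) != null) {
        out.collect(host, new Text(line));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(UlimitCheck.class);
    conf.setJobName("ulimit-check");
    conf.setMapperClass(UlimitMapper.class);
    conf.setNumReduceTasks(0);              // map-only: one output file per task
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // any small multi-split input
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}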

On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao meng...@gmail.com wrote:

 also, which is the ulimit that's important, the one for the user who is
 running the job, or the hadoop user that owns the Hadoop processes?


 On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote:

 I've been trying to run a fairly small input file (300MB) on Cloudera
 Hadoop 0.20.1. The job I'm using probably writes to on the order of over
 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
 I get the following exception in the run logs:

 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_13_0, Status : FAILED
 java.io.EOFException
 at java.io.DataInputStream.readByte(DataInputStream.java:250)
 at
 org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
 at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
 at org.apache.hadoop.io.Text.readString(Text.java:400)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 lots of EOFExceptions

 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_19_0, Status : FAILED
 java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%

 From searching around, it seems like the most common cause of BadLink and
 EOFExceptions is when the nodes don't have enough file descriptors set. But
 across all the grid machines, the file-max has been set to 1573039.
 Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.

 Where else should I be looking for what's causing this?





Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-04 Thread Meng Mao
not sure what else I could be checking to see where the problem lies. Should
I be looking in the datanode logs? I looked briefly in there and didn't see
anything from around the time exceptions started getting reported.
lsof during the job execution? Number of open threads?

I'm at a loss here.

On Thu, Feb 4, 2010 at 2:52 PM, Meng Mao meng...@gmail.com wrote:

 I wrote a hadoop job that checks for ulimits across the nodes, and every
 node is reporting:
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 139264
 max locked memory   (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files  (-n) 65536
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 139264
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited


 Is anything in there telling about file number limits? From what I
 understand, a high open files limit like 65536 should be enough. I estimate
 only a couple thousand part-files on HDFS being written to at once, and
 around 200 on the filesystem per node.

 On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao meng...@gmail.com wrote:

 also, which is the ulimit that's important, the one for the user who is
 running the job, or the hadoop user that owns the Hadoop processes?


 On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote:

 I've been trying to run a fairly small input file (300MB) on Cloudera
 Hadoop 0.20.1. The job I'm using probably writes to on the order of over
 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
 I get the following exception in the run logs:

 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_13_0, Status : FAILED
 java.io.EOFException
 at java.io.DataInputStream.readByte(DataInputStream.java:250)
 at
 org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
 at
 org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
 at org.apache.hadoop.io.Text.readString(Text.java:400)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 lots of EOFExceptions

 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_19_0, Status : FAILED
 java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%

 From searching around, it seems like the most common cause of BadLink and
 EOFExceptions is when the nodes don't have enough file descriptors set. But
 across all the grid machines, the file-max has been set to 1573039.
 Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.

 Where else should I be looking for what's causing this?






Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-03 Thread Meng Mao
also, which is the ulimit that's important, the one for the user who is
running the job, or the hadoop user that owns the Hadoop processes?

On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao meng...@gmail.com wrote:

 I've been trying to run a fairly small input file (300MB) on Cloudera
 Hadoop 0.20.1. The job I'm using probably writes to on the order of over
 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
 I get the following exception in the run logs:

 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_13_0, Status : FAILED
 java.io.EOFException
 at java.io.DataInputStream.readByte(DataInputStream.java:250)
 at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
 at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
 at org.apache.hadoop.io.Text.readString(Text.java:400)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 lots of EOFExceptions

 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
 attempt_201001261532_1137_r_19_0, Status : FAILED
 java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
  at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%

 From searching around, it seems like the most common cause of BadLink and
 EOFExceptions is when the nodes don't have enough file descriptors set. But
 across all the grid machines, the file-max has been set to 1573039.
 Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.

 Where else should I be looking for what's causing this?



EOFException and BadLink, but file descriptors number is ok?

2010-02-02 Thread Meng Mao
I've been trying to run a fairly small input file (300MB) on Cloudera Hadoop
0.20.1. The job I'm using probably writes to on the order of over 1000
part-files at once, across the whole grid. The grid has 33 nodes in it. I
get the following exception in the run logs:

10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
attempt_201001261532_1137_r_13_0, Status : FAILED
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.io.Text.readString(Text.java:400)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

lots of EOFExceptions

10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
attempt_201001261532_1137_r_19_0, Status : FAILED
java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
 at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%

From searching around, it seems like the most common cause of BadLink and
EOFExceptions is when the nodes don't have enough file descriptors set. But
across all the grid machines, the file-max has been set to 1573039.
Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.

Where else should I be looking for what's causing this?


Re: Repeated attempts to kill old job?

2010-02-02 Thread Meng Mao
We restarted the grid and that did kill the repeated KillJobAction attempts.
I forgot to look around with hadoop dfsadmin, though.

On Mon, Feb 1, 2010 at 11:29 PM, Rekha Joshi rekha...@yahoo-inc.com wrote:

 I would say restart the cluster, but I suspect that would not help either;
 instead, try checking your running process list (e.g. a perl/shell script or
 an ETL pipeline job) to analyze/kill.
 Also wondering if any hadoop dfsadmin commands can supersede this
 scenario...

 Cheers,
 /R


 On 2/2/10 2:50 AM, Meng Mao meng...@gmail.com wrote:

 On our worker nodes, I see repeated requests for KillJobActions for the
 same
 old job:
 2010-01-31 00:00:01,024 INFO org.apache.hadoop.mapred.TaskTracker: Received
 'KillJobAction' for job: job_201001261532_0690
 2010-01-31 00:00:01,064 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
 job job_201001261532_0690 being deleted.

 This request and response is repeated almost 30k times over the course of
 the day. Other nodes have the same behavior, except with different job ids.
 The jobs presumably all ran in the past, to completion or got killed
 manually. We use the grid fairly actively and that job is several hundred
 increments old.

 Has anyone seen this before? Is there a way to stop it?




Repeated attempts to kill old job?

2010-02-01 Thread Meng Mao
On our worker nodes, I see repeated requests for KillJobActions for the same
old job:
2010-01-31 00:00:01,024 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_201001261532_0690
2010-01-31 00:00:01,064 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
job job_201001261532_0690 being deleted.

This request and response is repeated almost 30k times over the course of
the day. Other nodes have the same behavior, except with different job ids.
The jobs presumably all ran in the past, to completion or got killed
manually. We use the grid fairly actively and that job is several hundred
increments old.

Has anyone seen this before? Is there a way to stop it?