Hadoop streaming - Subprocess failed

2012-08-29 Thread Periya.Data
Hi,
I am running a map-reduce job in Python and I get this error message. I
do not understand what it means. Output is not written to HDFS. I am using
CDH3u3. Any suggestion is appreciated.

MapAttempt TASK_TYPE=MAP TASKID=task_201208232245_2812_m_00
TASK_ATTEMPT_ID=attempt_201208232245_2812_m_00_0
TASK_STATUS=FAILED  ERROR=java.lang.RuntimeException:
PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)


RE: Metrics ..

2012-08-29 Thread Wong, David (DMITS)
Here's a snippet of tasktracker metrics using Metrics2. (I think there were
more gaps in the pre-Metrics2 versions.)
Note that you'll need to have hadoop-env.sh and hadoop-metrics2.properties
set up on all the nodes you want reports from.

1345570905436 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net, 
loginSuccess_num_ops=0, loginSuccess_avg_time=0.0, loginFailure_num_ops=0, 
loginFailure_avg_time=0.0
1345570905436 jvm.metrics: context=jvm, processName=TaskTracker, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, memNonHeapUsedM=11.540627, 
memNonHeapCommittedM=18.25, memHeapUsedM=12.972412, memHeapCommittedM=61.375, 
gcCount=1, gcTimeMillis=6, threadsNew=0, threadsRunnable=9, threadsBlocked=0, 
threadsWaiting=9, threadsTimedWaiting=1, threadsTerminated=0, logFatal=0, 
logError=0, logWarn=0, logInfo=1
1345570905436 mapred.tasktracker: context=mapred, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0, 
mapTaskSlots=2, reduceTaskSlots=2, tasks_completed=0, tasks_failed_timeout=0, 
tasks_failed_ping=0
1345570905436 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net
1345570905436 rpc.rpc: context=rpc, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net, rpcAuthenticationSuccesses=0, 
rpcAuthenticationFailures=0, rpcAuthorizationSuccesses=0, 
rpcAuthorizationFailures=0, ReceivedBytes=0, SentBytes=0, 
RpcQueueTime_num_ops=0, RpcQueueTime_avg_time=0.0, RpcProcessingTime_num_ops=0, 
RpcProcessingTime_avg_time=0.0, NumOpenConnections=0, callQueueLen=0
1345570905436 metricssystem.MetricsSystem: context=metricssystem, 
hostName=sqws31.caclab.cac.cpqcorp.net, num_sources=5, num_sinks=1, 
sink.file.latency_num_ops=0, sink.file.latency_avg_time=0.0, 
sink.file.dropped=0, sink.file.qsize=0, snapshot_num_ops=5, 
snapshot_avg_time=0.2, snapshot_stdev_time=0.447213595499958, 
snapshot_imin_time=0.0, snapshot_imax_time=1.0, snapshot_min_time=0.0, 
snapshot_max_time=1.0, publish_num_ops=0, publish_avg_time=0.0, 
publish_stdev_time=0.0, publish_imin_time=3.4028234663852886E38, 
publish_imax_time=1.401298464324817E-45, 
publish_min_time=3.4028234663852886E38, publish_max_time=1.401298464324817E-45, 
dropped_pub_all=0
1345570915435 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net
1345570915435 jvm.metrics: context=jvm, processName=TaskTracker, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, memNonHeapUsedM=11.549316, 
memNonHeapCommittedM=18.25, memHeapUsedM=13.136337, memHeapCommittedM=61.375, 
gcCount=1, gcTimeMillis=6, threadsNew=0, threadsRunnable=9, threadsBlocked=0, 
threadsWaiting=9, threadsTimedWaiting=1, threadsTerminated=0, logFatal=0, 
logError=0, logWarn=0, logInfo=1
1345570915435 mapred.tasktracker: context=mapred, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0, 
mapTaskSlots=2, reduceTaskSlots=2
1345570915435 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net
1345570915435 rpc.rpc: context=rpc, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net
1345570915435 metricssystem.MetricsSystem: context=metricssystem, 
hostName=sqws31.caclab.cac.cpqcorp.net, num_sources=5, num_sinks=1, 
sink.file.latency_num_ops=1, sink.file.latency_avg_time=4.0, 
snapshot_num_ops=11, snapshot_avg_time=0.16669, 
snapshot_stdev_time=0.408248290463863, snapshot_imin_time=0.0, 
snapshot_imax_time=1.0, snapshot_min_time=0.0, snapshot_max_time=1.0, 
publish_num_ops=1, publish_avg_time=0.0, publish_stdev_time=0.0, 
publish_imin_time=0.0, publish_imax_time=1.401298464324817E-45, 
publish_min_time=0.0, publish_max_time=1.401298464324817E-45, dropped_pub_all=0
1345570925435 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net
1345570925435 jvm.metrics: context=jvm, processName=TaskTracker, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, memNonHeapUsedM=13.002403, 
memNonHeapCommittedM=18.25, memHeapUsedM=11.503555, memHeapCommittedM=61.375, 
gcCount=2, gcTimeMillis=12, threadsNew=0, threadsRunnable=9, threadsBlocked=0, 
threadsWaiting=13, threadsTimedWaiting=7, threadsTerminated=0, logFatal=0, 
logError=0, logWarn=0, logInfo=3
1345570925435 mapred.tasktracker: context=mapred, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0, 
mapTaskSlots=2, reduceTaskSlots=2
1345570925435 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net
1345570925435 rpc.rpc: context=rpc, port=33997, 
hostName=sqws31.caclab.cac.cpqcorp.net
1345570925436 mapred.shuffleOutput: context=mapred, sessionId=, 
hostName=sqws31.caclab.cac.cpqcorp.net, shuffle_handler_busy_percent=0.0, 
shuffle_output_bytes=0, shuffle_failed_outputs=0, shuffle_success_outputs=0, 
shuffle_exceptions_caught=0
1345570925436 metricssystem.MetricsSystem: context=metricssystem, 
hostName=sqws31.caclab.cac.cpqcorp.net, num_sources=6, num_sinks=1, 
sink.file.latency_num_ops=2, 

Re: Metrics ..

2012-08-29 Thread Mark Olimpiati
Hi David,

   I enabled the jvm.class in hadoop-metrics.properties; your output seems
to be from something else (dfs.class or mapred.class), which reports the
Hadoop daemons' performance. For example, your output shows
processName=TaskTracker, which is not what I'm looking for.

  How can I report JVM statistics for the individual JVMs (maps/reducers)?

Thank you,
Mark

On Wed, Aug 29, 2012 at 1:28 PM, Wong, David (DMITS) dav...@hp.com wrote:

 Here's a snippet of tasktracker metrics using Metrics2.  (I think there
 were (more) gaps in the pre-metrics2 versions.)
 Note that you'll need to have hadoop-env.sh and hadoop-metrics2.properties
 setup on all the nodes you want reports from.

 1345570905436 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net,
 loginSuccess_num_ops=0, loginSuccess_avg_time=0.0, loginFailure_num_ops=0,
 loginFailure_avg_time=0.0
 1345570905436 jvm.metrics: context=jvm, processName=TaskTracker,
 sessionId=, hostName=sqws31.caclab.cac.cpqcorp.net,
 memNonHeapUsedM=11.540627, memNonHeapCommittedM=18.25,
 memHeapUsedM=12.972412, memHeapCommittedM=61.375, gcCount=1,
 gcTimeMillis=6, threadsNew=0, threadsRunnable=9, threadsBlocked=0,
 threadsWaiting=9, threadsTimedWaiting=1, threadsTerminated=0, logFatal=0,
 logError=0, logWarn=0, logInfo=1
 1345570905436 mapred.tasktracker: context=mapred, sessionId=, hostName=
 sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0,
 mapTaskSlots=2, reduceTaskSlots=2, tasks_completed=0,
 tasks_failed_timeout=0, tasks_failed_ping=0
 1345570905436 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997,
 hostName=sqws31.caclab.cac.cpqcorp.net
 1345570905436 rpc.rpc: context=rpc, port=33997, hostName=
 sqws31.caclab.cac.cpqcorp.net, rpcAuthenticationSuccesses=0,
 rpcAuthenticationFailures=0, rpcAuthorizationSuccesses=0,
 rpcAuthorizationFailures=0, ReceivedBytes=0, SentBytes=0,
 RpcQueueTime_num_ops=0, RpcQueueTime_avg_time=0.0,
 RpcProcessingTime_num_ops=0, RpcProcessingTime_avg_time=0.0,
 NumOpenConnections=0, callQueueLen=0
 1345570905436 metricssystem.MetricsSystem: context=metricssystem, hostName=
 sqws31.caclab.cac.cpqcorp.net, num_sources=5, num_sinks=1,
 sink.file.latency_num_ops=0, sink.file.latency_avg_time=0.0,
 sink.file.dropped=0, sink.file.qsize=0, snapshot_num_ops=5,
 snapshot_avg_time=0.2, snapshot_stdev_time=0.447213595499958,
 snapshot_imin_time=0.0, snapshot_imax_time=1.0, snapshot_min_time=0.0,
 snapshot_max_time=1.0, publish_num_ops=0, publish_avg_time=0.0,
 publish_stdev_time=0.0, publish_imin_time=3.4028234663852886E38,
 publish_imax_time=1.401298464324817E-45,
 publish_min_time=3.4028234663852886E38,
 publish_max_time=1.401298464324817E-45, dropped_pub_all=0
 1345570915435 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net
 1345570915435 jvm.metrics: context=jvm, processName=TaskTracker,
 sessionId=, hostName=sqws31.caclab.cac.cpqcorp.net,
 memNonHeapUsedM=11.549316, memNonHeapCommittedM=18.25,
 memHeapUsedM=13.136337, memHeapCommittedM=61.375, gcCount=1,
 gcTimeMillis=6, threadsNew=0, threadsRunnable=9, threadsBlocked=0,
 threadsWaiting=9, threadsTimedWaiting=1, threadsTerminated=0, logFatal=0,
 logError=0, logWarn=0, logInfo=1
 1345570915435 mapred.tasktracker: context=mapred, sessionId=, hostName=
 sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0,
 mapTaskSlots=2, reduceTaskSlots=2
 1345570915435 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997,
 hostName=sqws31.caclab.cac.cpqcorp.net
 1345570915435 rpc.rpc: context=rpc, port=33997, hostName=
 sqws31.caclab.cac.cpqcorp.net
 1345570915435 metricssystem.MetricsSystem: context=metricssystem, hostName=
 sqws31.caclab.cac.cpqcorp.net, num_sources=5, num_sinks=1,
 sink.file.latency_num_ops=1, sink.file.latency_avg_time=4.0,
 snapshot_num_ops=11, snapshot_avg_time=0.16669,
 snapshot_stdev_time=0.408248290463863, snapshot_imin_time=0.0,
 snapshot_imax_time=1.0, snapshot_min_time=0.0, snapshot_max_time=1.0,
 publish_num_ops=1, publish_avg_time=0.0, publish_stdev_time=0.0,
 publish_imin_time=0.0, publish_imax_time=1.401298464324817E-45,
 publish_min_time=0.0, publish_max_time=1.401298464324817E-45,
 dropped_pub_all=0
 1345570925435 ugi.ugi: context=ugi, hostName=sqws31.caclab.cac.cpqcorp.net
 1345570925435 jvm.metrics: context=jvm, processName=TaskTracker,
 sessionId=, hostName=sqws31.caclab.cac.cpqcorp.net,
 memNonHeapUsedM=13.002403, memNonHeapCommittedM=18.25,
 memHeapUsedM=11.503555, memHeapCommittedM=61.375, gcCount=2,
 gcTimeMillis=12, threadsNew=0, threadsRunnable=9, threadsBlocked=0,
 threadsWaiting=13, threadsTimedWaiting=7, threadsTerminated=0, logFatal=0,
 logError=0, logWarn=0, logInfo=3
 1345570925435 mapred.tasktracker: context=mapred, sessionId=, hostName=
 sqws31.caclab.cac.cpqcorp.net, maps_running=0, reduces_running=0,
 mapTaskSlots=2, reduceTaskSlots=2
 1345570925435 rpcdetailed.rpcdetailed: context=rpcdetailed, port=33997,
 hostName=sqws31.caclab.cac.cpqcorp.net
 1345570925435 rpc.rpc: context=rpc, port=33997, 

no output written to HDFS

2012-08-29 Thread Periya.Data
Hi All,
   My Hadoop streaming job (in Python) runs to completion (both map and
reduce says 100% complete). But, when I look at the output directory in
HDFS, the part files are empty. I do not know what might be causing this
behavior. I understand that the percentages represent the records that have
been read in (not processed).

The following are some of the logs. The detailed logs from Cloudera Manager
says that there were no Map Outputs...which is interesting. Any suggestions?


12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop
job  -Dmapred.job.tracker=x.yyy.com:8021 -kill job_201208232245_3182
12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
http://xx..com:60030/jobdetails.jsp?jobid=job_201208232245_3182
12/08/30 03:27:15 INFO streaming.StreamJob:  map 0%  reduce 0%
12/08/30 03:27:20 INFO streaming.StreamJob:  map 33%  reduce 0%
12/08/30 03:27:23 INFO streaming.StreamJob:  map 67%  reduce 0%
12/08/30 03:27:29 INFO streaming.StreamJob:  map 100%  reduce 0%
12/08/30 03:27:33 INFO streaming.StreamJob:  map 100%  reduce 100%
12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
job_201208232245_3182
12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
Thu Aug 30 03:27:24 GMT 2012
*** END
bash-3.2$
bash-3.2$ hadoop fs -ls /user/ghu/
Found 5 items
-rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/_SUCCESS
drwxrwxrwx   - ghu hadoop  0 2012-08-30 03:27 /user/GHU/_logs
-rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-0
-rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-1
-rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-2
bash-3.2$



Metadata:
  Status: Succeeded;  Type: MapReduce;  Id: job_201208232245_3182;  Name: CaidMatch
  User: srisrini;  Mapper class: PipeMapper;  Reducer class: (blank)
  Scheduler pool name: default
  Job input directory: hdfs://x.yyy.txt, hdfs://..com/user/GHUcaidlist.txt
  Job output directory: hdfs://..com/user/GHU/

Timing:
  Duration: 20.977s;  Submit time: Wed, 29 Aug 2012 08:27 PM
  Start time: Wed, 29 Aug 2012 08:27 PM;  Finish time: Wed, 29 Aug 2012 08:27 PM

Progress and Scheduling:
  Map Progress: 100.0%;  Reduce Progress: 100.0%
  Launched maps: 4;  Data-local maps: 3;  Rack-local maps: 1;  Other local maps: (blank)
  Desired maps: 3;  Launched reducers: (blank);  Desired reducers: 0
  Fairscheduler running tasks / minimum share / demand: (blank)

Current Resource Usage:
  Current User CPUs: 0;  Current System CPUs: 0;  Resident memory: 0 B
  Running maps: 0;  Running reducers: 0

Aggregate Resource Usage and Counters:
  User CPU: 0s;  System CPU: 0s;  Map Slot Time: 12.135s;  Reduce slot time: 0s
  Cumulative disk reads: (blank);  Cumulative disk writes: 155.0 KiB
  Cumulative HDFS reads: 3.6 KiB;  Cumulative HDFS writes: (blank)
  Map input bytes: 2.5 KiB;  Map input records: 45;  Map output records: 0
  Reducer input groups / input records / output records / shuffle bytes: (blank)
  Spilled records: (blank)


Re: no output written to HDFS

2012-08-29 Thread Bertrand Dechoux
Do you observe the same thing when running without Hadoop? (cat, map, sort
and then reduce)

Could you provide the counters of your job? You should be able to get them
using the job tracker interface.

The most probable answer, without more information, is that your
reducer does not output any key/value pairs.

Regards

Bertrand



On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data periya.d...@gmail.com wrote:

 Hi All,
My Hadoop streaming job (in Python) runs to completion (both map and
 reduce says 100% complete). But, when I look at the output directory in
 HDFS, the part files are empty. I do not know what might be causing this
 behavior. I understand that the percentages represent the records that have
 been read in (not processed).

 The following are some of the logs. The detailed logs from Cloudera Manager
 says that there were no Map Outputs...which is interesting. Any
 suggestions?


 12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
 12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop
 job  -Dmapred.job.tracker=x.yyy.com:8021 -kill job_201208232245_3182
 12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
 http://xx..com:60030/jobdetails.jsp?jobid=job_201208232245_3182
 12/08/30 03:27:15 INFO streaming.StreamJob:  map 0%  reduce 0%
 12/08/30 03:27:20 INFO streaming.StreamJob:  map 33%  reduce 0%
 12/08/30 03:27:23 INFO streaming.StreamJob:  map 67%  reduce 0%
 12/08/30 03:27:29 INFO streaming.StreamJob:  map 100%  reduce 0%
 12/08/30 03:27:33 INFO streaming.StreamJob:  map 100%  reduce 100%
 12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
 job_201208232245_3182
 12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
 Thu Aug 30 03:27:24 GMT 2012
 *** END
 bash-3.2$
 bash-3.2$ hadoop fs -ls /user/ghu/
 Found 5 items
 -rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/_SUCCESS
 drwxrwxrwx   - ghu hadoop  0 2012-08-30 03:27 /user/GHU/_logs
 -rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-0
 -rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-1
 -rw-r--r--   3 ghu hadoop  0 2012-08-30 03:27 /user/GHU/part-2
 bash-3.2$

 


 Metadata Status Succeeded  Type MapReduce  Id job_201208232245_3182
 Name CaidMatch
  User srisrini  Mapper class PipeMapper  Reducer class
  Scheduler pool name default  Job input directory
 hdfs://x.yyy.txt,hdfs://..com/user/GHUcaidlist.txt  Job output
 directory hdfs://..com/user/GHU/  Timing
 Duration 20.977s  Submit time Wed, 29 Aug 2012 08:27 PM  Start time Wed, 29
 Aug 2012 08:27 PM  Finish time Wed, 29 Aug 2012 08:27 PM






  Progress and Scheduling Map Progress
 100.0%
  Reduce Progress
 100.0%
  Launched maps 4  Data-local maps 3  Rack-local maps 1  Other local maps
  Desired maps 3  Launched reducers
  Desired reducers 0  Fairscheduler running tasks
  Fairscheduler minimum share
  Fairscheduler demand
  Current Resource Usage Current User CPUs 0  Current System CPUs 0
  Resident
 memory 0 B  Running maps 0  Running reducers 0  Aggregate Resource Usage
 and Counters User CPU 0s  System CPU 0s  Map Slot Time 12.135s  Reduce slot
 time 0s  Cumulative disk reads
  Cumulative disk writes 155.0 KiB  Cumulative HDFS reads 3.6 KiB
  Cumulative
 HDFS writes
  Map input bytes 2.5 KiB  Map input records 45  Map output records 0
  Reducer
 input groups
  Reducer input records
  Reducer output records
  Reducer shuffle bytes
  Spilled records




-- 
Bertrand Dechoux


Minimum Input Split Size for a map job

2012-08-29 Thread cat mys
Hello,

I'm currently developing a MapReduce application, and I want my map tasks to take
more data than a single line, as the default configuration does (e.g. a 64 KB
input split size). How can I change the input split size for the map job? Are
there any configuration files that I have to edit, or a class field that needs
to be changed? I tried to change the appropriate field in the FileInputFormat
class, but nothing changed.

Thanks,
C.
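
A minimal sketch of the usual knobs for this, assuming the new (mapreduce) API;
the class name and the 128 MB figure are placeholders. Note that with the default
TextInputFormat a map task already receives a whole split of many lines (one line
per record), and the effective split size is roughly
max(minimum split size, min(maximum split size, HDFS block size)):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Sketch: request larger input splits by raising the minimum split size.
public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "larger-splits");   // placeholder job name

    // New-API setting (per job):
    FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB, example value

    // Property equivalent for the old API / config files:
    // conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);

    // ... set mapper, reducer, input/output paths, then submit as usual.
  }
}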
  

Re: example usage of s3 file system

2012-08-29 Thread Håvard Wahl Kongsgård
see also

http://wiki.apache.org/hadoop/AmazonS3

On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
chris_j_coll...@yahoo.com wrote:
 Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my 
 tinkering I am not having a great deal of success.  I am particularly 
 interested in the ability to mimic a directory structure (since s3 native 
 doesnt do it).

 Can anyone point me to some good example usage of Hadoop FileSystem with s3?

 I created a few directories using transit and AWS S3 console for test.  Doing 
 a liststatus of the bucket returns a FileStatus object of the directory 
 created but if I try to do a liststatus of that path I am getting a 404:

 org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: 
 Request Error. HEAD '/' on Host 

 Probably not the best list to look for help, any clues appreciated.

 C



-- 
Håvard Wahl Kongsgård
Faculty of Medicine 
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/
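
A minimal sketch of going through the Hadoop FileSystem API against S3, assuming
the s3n:// scheme described on that wiki page; the bucket name, prefix, and
credential values below are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical listing of a "directory" in an S3 bucket via the Hadoop FileSystem API.
public class S3ListExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");      // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");  // placeholder

    FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);  // placeholder bucket
    for (FileStatus status : fs.listStatus(new Path("s3n://my-bucket/some/prefix/"))) {
      System.out.println(status.getPath() + (status.isDir() ? " (dir)" : ""));
    }
  }
}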


Custom InputFormat error

2012-08-29 Thread Chen He
Hi guys

I met an interesting problem when implementing my own custom InputFormat,
which extends FileInputFormat. (I rewrote the RecordReader class but not
the InputSplit class.)

My RecordReader treats the following format as one basic record (it extends
LineRecordReader and returns a record once it reaches #Trailer# and the
record contains #Header#; I have a single input file composed of many of
these basic records):

#Header#
...(many lines; may be 0 lines or 1000 lines, it varies)
#Trailer#

Everything works fine if the number of basic records in the file is an
integer multiple of the number of mappers. For example, I use 2 mappers and
there are two basic records in my input file, or I use 3 mappers and there
are 6 basic records in the input file.

However, if I use 4 mappers and there are 3 basic records in the input file
(not an integer multiple), the final output is incorrect. The Map Input
Bytes job counter is also less than the input file size. How can I fix
this? Do I need to rewrite the InputSplit?

Any reply will be appreciated!

Regards!

Chen


Re: example usage of s3 file system

2012-08-29 Thread Chris Collins
Thanks Haavard, I am aware of that page but I am not sure why you are pointing
me to it. This really looks like a bug where Jets3tNativeFileSystemStore is
parsing a response from jets3t: it's looking for ResponseCode=404 but actually
getting ResponseCode: 404. I don't see how it ever worked, looking back through
the versions of release code.

I took a copy of the s3native package and made a fix, and it seems to get around
the issue.

I have reported it to the issues email alias; I will see if there is actually
any interest in this problem.

Cheers

C
On Aug 29, 2012, at 12:11 AM, Håvard Wahl Kongsgård 
haavard.kongsga...@gmail.com wrote:

 see also
 
 http://wiki.apache.org/hadoop/AmazonS3
 
 On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
 chris_j_coll...@yahoo.com wrote:
 Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my 
 tinkering I am not having a great deal of success.  I am particularly 
 interested in the ability to mimic a directory structure (since s3 native 
 doesnt do it).
 
 Can anyone point me to some good example usage of Hadoop FileSystem with s3?
 
 I created a few directories using transit and AWS S3 console for test.  
 Doing a liststatus of the bucket returns a FileStatus object of the 
 directory created but if I try to do a liststatus of that path I am getting 
 a 404:
 
 org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: 
 Request Error. HEAD '/' on Host 
 
 Probably not the best list to look for help, any clues appreciated.
 
 C
 
 
 
 -- 
 Håvard Wahl Kongsgård
 Faculty of Medicine 
 Department of Mathematical Sciences
 NTNU
 
 http://havard.security-review.net/



Re: HBase and MapReduce data locality

2012-08-29 Thread N Keywal
Inline. Just a set of "you're right" answers :-).
It's documented here:
http://hbase.apache.org/book.html#regions.arch.locality

On Wed, Aug 29, 2012 at 8:06 AM, Robert Dyer rd...@iastate.edu wrote:

 Ok but does that imply that only 1 of your compute nodes is promised
 to have all of the data for any given row?  The blocks will replicate,
 but they don't necessarily all replicate to the same nodes right?


Right.


 So if I have say 2 column families (cf1, cf2) and there is 2 physical
 files on the HDFS for those (per region) then those files are created
 on one datanode (dn1) which will have all blocks local to that node.


Yes. Nit: datanodes don't see files, only blocks. But the logic remains
the same.


 Once it replicates those blocks 2 more times by default, isn't it
 possible the blocks for cf1 will go to dn2, dn3 while the blocks for
 cf2 goes to dn4, dn5?


Yes, it's possible (and even likely).


Re: Custom InputFormat error

2012-08-29 Thread Harsh J
Hi Chen,

Does your record reader and mapper handle the case where one map split
may not exactly get the whole record? Your case is not very different
from the newlines logic presented here:
http://wiki.apache.org/hadoop/HadoopMapReduce

On Wed, Aug 29, 2012 at 11:13 AM, Chen He airb...@gmail.com wrote:
 Hi guys

 I met a interesting problem when I implement my own custom InputFormat which
 extends the FileInputFormat.(I rewrite the RecordReader class but not the
 InputSplit class)

 My recordreader will take following format as a basic record: (my
 recordreader extends the LineRecordReader. It returns a record if it meets
 #Trailer# and contains #Header#. I only have one input file that is composed
 of many of following basic record)

 #Header#
 .(many lines, may be 0 lines or 1000 lines, it varies)
 #Trailer#

 Everything works fine if above basic input unit in a file is integer times
 of mapper. For example, I use 2 mappers and there are two basic records in
 my input file. Or I use 3 mappers and there are 6 basic units in the input
 file.

 However, if I use 4 mappers but there are 3 basic units in the input
 file(not integer times). The final output is incorrect. The Map Input
 Bytes in the job counter is also less than the input file size. How can I
 fix it? Do I need to rewrite the inputSplit?

 Any reply will be appreciated!

 Regards!

 Chen



-- 
Harsh J


Re: MRBench Maps strange behaviour

2012-08-29 Thread Bejoy KS
Hi Gaurav

You can get the information on the num of map tasks in the job from the JT web 
UI itself.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Gaurav Dasgupta gdsay...@gmail.com
Date: Wed, 29 Aug 2012 13:14:11 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how can I calculate or ensure from the counters what should
be the exact number of Maps?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala yhema...@gmail.comwrote:

 Hi,

 The number of maps specified to any map reduce program (including
 those part of MRBench) is generally only a hint, and the actual number
 of maps will be influenced in typical cases by the amount of data
 being processed. You can take a look at this wiki link to understand
 more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

 In the examples below, since the data you've generated is different,
 the number of mappers are different. To be able to judge your
 benchmark results, you'd need to benchmark against the same data (or
 at least same type of type - i.e. size and type).

 The number of maps printed at the end is straight from the input
 specified and doesn't reflect what the job actually ran with. The
 information from the counters is the right one.

 Thanks
 Hemanth

 On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta gdsay...@gmail.com
 wrote:
  Hi All,
 
  I executed the MRBench program from hadoop-test.jar in my 12 node
 CDH3
  cluster. After executing, I had some strange observations regarding the
  number of Maps it ran.
 
  First I ran the command:
  hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps
 200
  -reduces 200 -inputLines 1024 -inputType random
  And I could see that the actual number of Maps it ran was 201 (for all
 the 3
  runs) instead of 200 (Though the end report displays the launched to be
  200). Here is the console report:
 
 
  12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
 job_201208230144_0035
 
  12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
 
  12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
 
  12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200
 
  12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=617209
 
  12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all
 reduces
  waiting after reserving slots (ms)=0
 
  12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all maps
  waiting after reserving slots (ms)=0
 
  12/08/28 04:34:35 INFO mapred.JobClient: Rack-local map tasks=137
 
  12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201
 
  12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64
 
  12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1756882
 
 
 
  Again, I ran the MRBench for just 10 Maps and 10 Reduces:
 
  hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
 -reduces 10
 
 
 
  This time the actual number of Maps were only 2 and again the end report
  displays Maps Lauched to be 10. The console output:
 
 
 
  12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
 job_201208230144_0040
  12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
  12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
  12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
  12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6648
  12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all
 reduces
  waiting after reserving slots (ms)=0
  12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all maps
  waiting after reserving slots (ms)=0
  12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
  12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
  12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=163257
  12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
  12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_READ=407
  12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_READ=258
  12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1072596
  12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3
  12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
  12/08/28 05:05:35 INFO mapred.JobClient: Map input records=1
  12/08/28 05:05:35 INFO mapred.JobClient: Reduce shuffle bytes=647
  12/08/28 05:05:35 INFO mapred.JobClient: Spilled Records=2
  12/08/28 05:05:35 INFO mapred.JobClient: Map output bytes=5
  12/08/28 05:05:35 INFO mapred.JobClient: CPU time spent (ms)=17070
  12/08/28 05:05:35 INFO mapred.JobClient: Total committed heap usage
  (bytes)=6218842112
  12/08/28 05:05:35 INFO mapred.JobClient: Map input bytes=2
  12/08/28 05:05:35 INFO mapred.JobClient: Combine input records=0
  12/08/28 05:05:35 INFO 

Re: Job does not run with EOFException

2012-08-29 Thread Caetano Sauer
I am able to browse the web UI and telnet/netcat the tasktracker host and
port, so the connection is being established. Is there any way I can
confirm whether it is really some kind of version conflict? The EOF when
doing readInt() seems like a protocol incompatibility.

By the way, the tasktracker is killed every time this happens, and I am left
with some kind of JVM dump in an hs_err_*.log file. The tasktracker logs
show nothing.

Some facts that may help find the problem:
1) I am not running as a dedicated hadoop user, as is usually suggested in
tutorials.
2) There is an older version of Hadoop installed which I am absolutely sure is
not running, and even so, it is configured on different ports.

Thank you for your help and regards,
Caetano Sauer

On Wed, Aug 29, 2012 at 10:08 AM, Hemanth Yamijala yhema...@gmail.comwrote:

 Are you able to browse the web UI for the jobtracker. If not
 configured separately, it should be at hostname:50030 ? It would also
 help if you can telnet to the jobtracker server port and see if it is
 able to connect.

 Thanks
 hemanth

 On Tue, Aug 28, 2012 at 7:23 PM, Caetano Sauer caetanosa...@gmail.com
 wrote:
  The host on top of the stack trace contains the host and port I defined
 on
  mapred.job.tracker in mapred-site.xml
 
  Other than that, I don't know how to verify what you asked me. Any tips?
 
 
  On Tue, Aug 28, 2012 at 3:47 PM, Harsh J ha...@cloudera.com wrote:
 
  Are you sure you're reaching the right port for your JobTrcker?
 
  On Tue, Aug 28, 2012 at 7:15 PM, Caetano Sauer caetanosa...@gmail.com
  wrote:
   Hello,
  
   I am getting the following error when trying to execute a hadoop job
 on
   a
   5-node cluster:
  
   Caused by: java.io.IOException: Call to *** failed on local exception:
   java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
   at org.apache.hadoop.ipc.Client.call(Client.java:1071)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
   at org.apache.hadoop.mapred.$Proxy2.submitJob(Unknown Source)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:921)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at
  
  
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
   at
  
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
   ... 9 more
   Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at
  
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)
  
   (My jobtracker host was substituted by ***)
  
   After 3 hours of searching, everything points to an incompatibility
   between
   the hadoop versions of the client and the server, but this is not the
   case,
   since I can run the job on a pseudo-distributed setup on a different
   machine. Both are running the exact same version (same svn revision
 and
   source checksum).
  
   Does anyone have a solution or a suggestion on how to find more debug
   information?
  
   Thank you in advance,
   Caetano Sauer
 
 
 
  --
  Harsh J
 
 



Re: MRBench Maps strange behaviour

2012-08-29 Thread praveenesh kumar
Then the question arises: how is MRBench using the parameters?
According to the mail he sent, he is running MRBench with the following
parameter:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is expecting MRBench to launch 10 mappers and 10 reducers.
But he is getting different results, which are visible in the counters,
and we can use all our map and input-split logic to justify the counter
outputs.

The question here is: how can we use MRBench, and what does it provide?
How can we control it to run with different parameters to do some
benchmarking? Can someone explain how to use MRBench and what it exactly
does?

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala yhema...@gmail.comwrote:

 Assume you are asking about what is the exact number of maps launched.
 If yes, then the output of the MRBench run is printing the counter
 Launched map tasks. That is the exact value of maps launched.

 Thanks
 Hemanth

 On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta gdsay...@gmail.com
 wrote:
  Hi Hemanth,
 
  Thanks for the reply.
  Can you tell me how can I calculate or ensure from the counters what
 should
  be the exact number of Maps?
  Thanks,
  Gaurav Dasgupta
  On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala yhema...@gmail.com
  wrote:
 
  Hi,
 
  The number of maps specified to any map reduce program (including
  those part of MRBench) is generally only a hint, and the actual number
  of maps will be influenced in typical cases by the amount of data
  being processed. You can take a look at this wiki link to understand
  more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
 
  In the examples below, since the data you've generated is different,
  the number of mappers are different. To be able to judge your
  benchmark results, you'd need to benchmark against the same data (or
  at least same type of type - i.e. size and type).
 
  The number of maps printed at the end is straight from the input
  specified and doesn't reflect what the job actually ran with. The
  information from the counters is the right one.
 
  Thanks
  Hemanth
 
  On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta gdsay...@gmail.com
  wrote:
   Hi All,
  
   I executed the MRBench program from hadoop-test.jar in my 12 node
   CDH3
   cluster. After executing, I had some strange observations regarding
 the
   number of Maps it ran.
  
   First I ran the command:
   hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3
 -maps
   200
   -reduces 200 -inputLines 1024 -inputType random
   And I could see that the actual number of Maps it ran was 201 (for all
   the 3
   runs) instead of 200 (Though the end report displays the launched to
 be
   200). Here is the console report:
  
  
   12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
   job_201208230144_0035
  
   12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
  
   12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
  
   12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200
  
   12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=617209
  
   12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all
   reduces
   waiting after reserving slots (ms)=0
  
   12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all
   maps
   waiting after reserving slots (ms)=0
  
   12/08/28 04:34:35 INFO mapred.JobClient: Rack-local map tasks=137
  
   12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201
  
   12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64
  
   12/08/28 04:34:35 INFO mapred.JobClient:
   SLOTS_MILLIS_REDUCES=1756882
  
  
  
   Again, I ran the MRBench for just 10 Maps and 10 Reduces:
  
   hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
   -reduces 10
  
  
  
   This time the actual number of Maps were only 2 and again the end
 report
   displays Maps Lauched to be 10. The console output:
  
  
  
   12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
   job_201208230144_0040
   12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
   12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
   12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
   12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6648
   12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all
   reduces
   waiting after reserving slots (ms)=0
   12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all
   maps
   waiting after reserving slots (ms)=0
   12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
   12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
   12/08/28 05:05:35 INFO mapred.JobClient:
 SLOTS_MILLIS_REDUCES=163257
   12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
   12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_READ=407
   12/08/28 05:05:35 INFO mapred.JobClient: 

RE: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

2012-08-29 Thread Tony Burton
Or, is it possible to request that the functionality provided by 
MultipleTextOutputFormat be supported by the new Hadoop API?

Thanks,

Tony

-Original Message-
From: Tony Burton [mailto:tbur...@sportingindex.com] 
Sent: 28 August 2012 14:37
To: user@hadoop.apache.org
Subject: RE: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

Hi Harsh

Thanks for the reply - my understanding is that with MultipleOutputs I can 
write differently named files into the same target directory. With 
MultipleTextOutputFormat I was able to override the target directory name to 
perform the segmentation, by overriding generateFileNameForKeyValue().

Does the 1.0.3 MultipleOutputs give me the ability to alter the target 
directory name as well as the file name?

Thanks,

Tony
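
A minimal sketch of how this might look with the new-API MultipleOutputs
(org.apache.hadoop.mapreduce.lib.output.MultipleOutputs), assuming it is
available in your build; the reducer name is illustrative. The third argument
to write() is a base output path relative to the job output directory, and
slashes in it create subdirectories, which gives per-key directory segmentation
similar to generateFileNameForKeyValue():

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer that segments output into per-key subdirectories.
public class SegmentingReducer extends Reducer<Text, Text, Text, Text> {

  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // Base output path is resolved under the job output directory; the "/"
      // creates a subdirectory per key, e.g. <outdir>/<key>/part-r-00000.
      mos.write(key, value, key.toString() + "/part");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}

If your version also has LazyOutputFormat, setting it as the job's output format
avoids creating the empty default part-* files alongside these.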



-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: 28 August 2012 13:44
To: user@hadoop.apache.org
Subject: Re: hadoop 1.0.3 equivalent of MultipleTextOutputFormat

The Multiple*OutputFormat have been deprecated in favor of the generic
MultipleOutputs API. Would using that instead work for you?

On Tue, Aug 28, 2012 at 6:05 PM, Tony Burton tbur...@sportingindex.com wrote:
 Hi,

 I've seen that org.apache.hadoop.mapred.lib.MultipleTextOutputFormat is good 
 for writing results into (for example) different directories created on the 
 fly. However, now I'm implementing a MapReduce job using Hadoop 1.0.3, I see 
 that the new API no longer supports MultipleTextOutputFormat. Is there an 
 equivalent that I can use, or will it be supported in a future release?

 Thanks,

 Tony






-- 
Harsh J

Re: Controlling on which node a reducer will be executed

2012-08-29 Thread Eduard Skaley
Thank you for your reply, Harsh J.
I really need this feature, because it would speed up my use case a lot.
Could you give me a hint where to look in the sources to get a good
starting point for an implementation?
Which classes are involved?

 Hi Eduard,

 This isn't impossible, just unavailable at the moment. See
 https://issues.apache.org/jira/browse/MAPREDUCE-199 if you want to
 take a shot at implementing this. HBase would love to have this, I
 think.

 On Mon, Aug 27, 2012 at 10:41 PM, Eduard Skaley e.v.ska...@gmail.com wrote:
 Hi,

 i have a question concerning the execution of reducers.
 To use effectively the data locality of blocks in my use case i want to
 control on which node a reducer will be executed.

 In my scenario i have a chain of map-reduce jobs where each job will be
 executed by exactly N reducers.
 I want to achieve that for each job reducer 1 is executed on node 1,
 reducer 2 on node 2 and so on.

 Is it possible with a customization of the BlockPlacementPolicy in MRv2 ?
 Or is there some other way, maybe some indirect ? Or is it impossible ?

 Thanks for your help.
 Eduard






Metrics ..

2012-08-29 Thread Mark Olimpiati
Hi,

  I enabled the metrics.properties to use FileContext, in which jvm
metrics values are written to a file as follows:

jvm.metrics: hostName= localhost, processName=MAP, sessionId=, gcCount=10,
gcTimeMillis=130, logError=0, logFatal=0, logInfo=21, logWarn=0,
memHeapCommittedM=180.1211, memHeapUsedM=102.630875,
memNonHeapCommittedM=23.191406, memNonHeapUsedM=11.828621,
threadsBlocked=0, threadsNew=0, threadsRunnable=2, threadsTerminated=0,
threadsTimedWaiting=3, threadsWaiting=2


Questions:
   - Is this line for a single map JVM, since processName=MAP? If so, why
doesn't it show the job id in sessionId?
   - Even though I ran map and reduce tasks, I only got processName=MAP or
SHUFFLE, and nothing for reducers. Why?

Thank you,
Mark


unsubscribe

2012-08-29 Thread Jay





 From: Dan Yi d...@mediosystems.com
To: user@hadoop.apache.org user@hadoop.apache.org 
Sent: Wednesday, August 29, 2012 12:57 PM
Subject: unsubscribe
 



unsubscribe

2012-08-29 Thread Fahd Albinali





unsubscribe

2012-08-29 Thread Ahmed Nagy
unsubscribe


Re: Custom InputFormat error

2012-08-29 Thread Chen He
Hi Harsh

Thank you for your reply. Do you mean I need to change the FileSplit to
avoid the errors I mentioned?

Regards!

Chen

On Wed, Aug 29, 2012 at 2:46 AM, Harsh J ha...@cloudera.com wrote:

 Hi Chen,

 Does your record reader and mapper handle the case where one map split
 may not exactly get the whole record? Your case is not very different
 from the newlines logic presented here:
 http://wiki.apache.org/hadoop/HadoopMapReduce

 On Wed, Aug 29, 2012 at 11:13 AM, Chen He airb...@gmail.com wrote:
  Hi guys
 
  I met a interesting problem when I implement my own custom InputFormat
 which
  extends the FileInputFormat.(I rewrite the RecordReader class but not the
  InputSplit class)
 
  My recordreader will take following format as a basic record: (my
  recordreader extends the LineRecordReader. It returns a record if it
 meets
  #Trailer# and contains #Header#. I only have one input file that is
 composed
  of many of following basic record)
 
  #Header#
  .(many lines, may be 0 lines or 1000 lines, it varies)
  #Trailer#
 
  Everything works fine if above basic input unit in a file is integer
 times
  of mapper. For example, I use 2 mappers and there are two basic records
 in
  my input file. Or I use 3 mappers and there are 6 basic units in the
 input
  file.
 
  However, if I use 4 mappers but there are 3 basic units in the input
  file(not integer times). The final output is incorrect. The Map Input
  Bytes in the job counter is also less than the input file size. How can
 I
  fix it? Do I need to rewrite the inputSplit?
 
  Any reply will be appreciated!
 
  Regards!
 
  Chen



 --
 Harsh J



Re: Delays in worker node jobs

2012-08-29 Thread Vinod Kumar Vavilapalli

Do you know if you have enough job load on the system? One way to look at this
is to look for running map/reduce tasks in the JT UI at the same time you are
looking at the node's CPU usage.

Collecting Hadoop metrics via a metrics collection system, say Ganglia, will let
you match up the timestamps of idleness on the nodes with the job load at that
point in time.

HTH,
+vinod

On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

 Running 1.0.2, in this case on Linux.
 
 I was watching the processes / loads on one TaskTracker instance and
 noticed that it completed it's first 8 map tasks and reported 8 free
 slots (the max for this system). It then waited doing nothing for more
 than 30 seconds before the next batch of work came in and started running.
 
 Likewise it also has relatively long periods with all 8 cores running at
 or near idle. There are no jobs failing or obvious errors in the
 TaskTracker log.
 
 What could be causing this?
 
 Should I increase the number of map jobs to greater than number of cores
 to try and keep it busier?
 
 -Terry



Re: Delays in worker node jobs

2012-08-29 Thread Terry Healy
Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
 Do you know if you have enough job-load on the system? One way to look at 
 this is to look for running map/reduce tasks on the JT UI at the same time 
 you are looking at the node's cpu usage.

 Collecting hadoop metrics via a metrics collection system say ganglia will 
 let you match up the timestamps of idleness on the nodes with the job-load at 
 that point of time.

 HTH,
 +vinod

 On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:

 Running 1.0.2, in this case on Linux.

 I was watching the processes / loads on one TaskTracker instance and
 noticed that it completed it's first 8 map tasks and reported 8 free
 slots (the max for this system). It then waited doing nothing for more
 than 30 seconds before the next batch of work came in and started running.

 Likewise it also has relatively long periods with all 8 cores running at
 or near idle. There are no jobs failing or obvious errors in the
 TaskTracker log.

 What could be causing this?

 Should I increase the number of map jobs to greater than number of cores
 to try and keep it busier?

 -Terry

-- 
Terry Healy / the...@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973





Re: Custom InputFormat error

2012-08-29 Thread Harsh J
No, what I mean is that your RecordReader should be able to handle the
case where it starts from the middle of a record and hence cannot read
any record (i.e. it should return false, or similar, right up front).
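
A minimal sketch of a record reader along these lines, against the new mapreduce
API; the class name and the exact boundary policy are assumptions rather than
code from this thread. Each split only starts records whose #Header# line begins
inside it, and reads past the split end to finish the last record, so records
spanning a boundary are neither lost nor duplicated:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

// Hypothetical reader for #Header# ... #Trailer# records that may cross split boundaries.
public class HeaderTrailerRecordReader extends RecordReader<LongWritable, Text> {

  private FSDataInputStream stream;
  private LineReader in;
  private long start, end, pos;
  private final LongWritable key = new LongWritable();
  private final Text value = new Text();

  @Override
  public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
    FileSplit split = (FileSplit) genericSplit;
    Configuration conf = context.getConfiguration();
    start = split.getStart();
    end = start + split.getLength();
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    stream = fs.open(file);
    stream.seek(start);
    in = new LineReader(stream, conf);
    pos = start;

    // If this split begins mid-record, skip ahead to the first #Header# line that
    // starts inside the split; the previous split finishes the record we skipped.
    if (start != 0) {
      Text line = new Text();
      while (pos < end) {
        long lineStart = pos;
        int read = in.readLine(line);
        if (read == 0) break;                      // end of file
        pos += read;
        if (line.toString().equals("#Header#")) {
          stream.seek(lineStart);                  // rewind so nextKeyValue() re-reads it
          in = new LineReader(stream, conf);
          pos = lineStart;
          break;
        }
      }
    }
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    // Only begin a record whose #Header# starts before the split end; once begun,
    // keep reading past the end until #Trailer# so the record stays whole.
    if (pos >= end) {
      return false;
    }
    key.set(pos);
    Text line = new Text();
    StringBuilder record = new StringBuilder();
    boolean inRecord = false;
    while (true) {
      int read = in.readLine(line);
      if (read == 0) {
        return false;                              // end of file without a complete record
      }
      pos += read;
      String s = line.toString();
      if (s.equals("#Header#")) {
        inRecord = true;
      }
      if (inRecord) {
        record.append(s).append('\n');
        if (s.equals("#Trailer#")) {
          value.set(record.toString());
          return true;
        }
      } else if (pos >= end) {
        return false;                              // no header begins inside this split
      }
    }
  }

  @Override public LongWritable getCurrentKey() { return key; }
  @Override public Text getCurrentValue() { return value; }

  @Override
  public float getProgress() {
    return end == start ? 1.0f : Math.min(1.0f, (pos - start) / (float) (end - start));
  }

  @Override
  public void close() throws IOException {
    if (stream != null) {
      stream.close();
    }
  }
}

The matching InputFormat would return this reader from createRecordReader() and
can keep the default FileInputFormat splits unchanged.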

On Wed, Aug 29, 2012 at 1:27 PM, Chen He airb...@gmail.com wrote:
 Hi Harsh

 Thank you for your reply. Do you mean I need to change the FileSplit to
 avoid those errors I mentioned happen?

 Regards!

 Chen

 On Wed, Aug 29, 2012 at 2:46 AM, Harsh J ha...@cloudera.com wrote:

 Hi Chen,

 Does your record reader and mapper handle the case where one map split
 may not exactly get the whole record? Your case is not very different
 from the newlines logic presented here:
 http://wiki.apache.org/hadoop/HadoopMapReduce

 On Wed, Aug 29, 2012 at 11:13 AM, Chen He airb...@gmail.com wrote:
  Hi guys
 
  I met a interesting problem when I implement my own custom InputFormat
  which
  extends the FileInputFormat.(I rewrite the RecordReader class but not
  the
  InputSplit class)
 
  My recordreader will take following format as a basic record: (my
  recordreader extends the LineRecordReader. It returns a record if it
  meets
  #Trailer# and contains #Header#. I only have one input file that is
  composed
  of many of following basic record)
 
  #Header#
  .(many lines, may be 0 lines or 1000 lines, it varies)
  #Trailer#
 
  Everything works fine if above basic input unit in a file is integer
  times
  of mapper. For example, I use 2 mappers and there are two basic records
  in
  my input file. Or I use 3 mappers and there are 6 basic units in the
  input
  file.
 
  However, if I use 4 mappers but there are 3 basic units in the input
  file(not integer times). The final output is incorrect. The Map Input
  Bytes in the job counter is also less than the input file size. How can
  I
  fix it? Do I need to rewrite the inputSplit?
 
  Any reply will be appreciated!
 
  Regards!
 
  Chen



 --
 Harsh J





-- 
Harsh J


RE: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread sathyavageeswaran
I have tried every trick to get myself unsubscribed. Yesterday I got a mail
saying you can't unsubscribe once subscribed.

-Original Message-
From: Andy Isaacson [mailto:a...@cloudera.com] 
Sent: 30 August 2012 03:25
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: How to unsubscribe (was Re: unsubscribe)

Hi folks,

Replying to this thread is not going to get you unsubscribed and will just
annoy everyone else who's subscribed. To unsubscribe please send an email to
user-unsubscr...@hadoop.apache.org from your subscribed address.

For more info please visit
http://hadoop.apache.org/common/mailing_lists.html#Users

Thanks,
-andy

On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:

 
 From: Dan Yi d...@mediosystems.com
 To: user@hadoop.apache.org user@hadoop.apache.org
 Sent: Wednesday, August 29, 2012 12:57 PM
 Subject: unsubscribe









Re: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread Ted Dunning
That was a stupid joke.  It wasn't real advice.

Have you sent email to the specific email address listed?

On Thu, Aug 30, 2012 at 12:35 AM, sathyavageeswaran sat...@morisonmenon.com
 wrote:

 I have tried every trick to get self unsubscribed. Yesterday I got a mail
 saying you can't unsubscribe once subscribed.

 -Original Message-
 From: Andy Isaacson [mailto:a...@cloudera.com]
 Sent: 30 August 2012 03:25
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: How to unsubscribe (was Re: unsubscribe)

 Hi folks,

 Replying to this thread is not going to get you unsubscribed and will just
 annoy everyone else who's subscribed. To unsubscribe please send an email
 to
 user-unsubscr...@hadoop.apache.org from your subscribed address.

 For more info please visit
 http://hadoop.apache.org/common/mailing_lists.html#Users

 Thanks,
 -andy

 On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:
 
  
  From: Dan Yi d...@mediosystems.com
  To: user@hadoop.apache.org user@hadoop.apache.org
  Sent: Wednesday, August 29, 2012 12:57 PM
  Subject: unsubscribe
 
 
 
 
 
 




RE: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread sathyavageeswaran
Of course I have sent emails to all permutations and combinations of the emails
listed, with the appropriate subject matter.

 

From: Ted Dunning [mailto:tdunn...@maprtech.com] 
Sent: 30 August 2012 10:12
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: Re: How to unsubscribe (was Re: unsubscribe)

 

That was a stupid joke.  It wasn't real advice.

 

Have you sent email to the specific email address listed?

On Thu, Aug 30, 2012 at 12:35 AM, sathyavageeswaran
sat...@morisonmenon.com wrote:

I have tried every trick to get self unsubscribed. Yesterday I got a mail
saying you can't unsubscribe once subscribed.

-Original Message-
From: Andy Isaacson [mailto:a...@cloudera.com]
Sent: 30 August 2012 03:25
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: How to unsubscribe (was Re: unsubscribe)

Hi folks,

Replying to this thread is not going to get you unsubscribed and will just
annoy everyone else who's subscribed. To unsubscribe please send an email to
user-unsubscr...@hadoop.apache.org from your subscribed address.

For more info please visit
http://hadoop.apache.org/common/mailing_lists.html#Users

Thanks,
-andy

On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:

 
 From: Dan Yi d...@mediosystems.com
 To: user@hadoop.apache.org user@hadoop.apache.org
 Sent: Wednesday, August 29, 2012 12:57 PM
 Subject: unsubscribe







 




Re: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread Ted Dunning
Can you say which addresses you sent emails to?

The merging of mailing lists may have left you subscribed to a different
group than you expected.  Thus, your assumptions may not match what is
required.  If you provide a specific list, somebody might be able to help
you.

Also, I was asking about the *exact* email address listed in the one
message before mine, not anything you have typed.  Sorry not to take your
"all permutations and combinations" on trust, but any generation of
permutations is based on assumptions and you don't state your assumptions.
By their nature, we often don't even know our own assumptions, so I would
ask the same question even if you said what you thought your assumptions
were.

The reason that I ask this in this slightly abrasive way is that there may
have been a bug in the mailing list merge.  The only way to distinguish
that from a bug in your permutation generation is to inspect the list of
email addresses you used.

It may also be that your email from the mailing list has been forwarded so
that the return address when you send email to the bot isn't one that is
subscribed.

On Thu, Aug 30, 2012 at 12:49 AM, sathyavageeswaran sat...@morisonmenon.com
 wrote:

 Of course I have sent emails to all permutations and combinations of the
 addresses listed, with appropriate subject matter. 


 From: Ted Dunning [mailto:tdunn...@maprtech.com]
 Sent: 30 August 2012 10:12
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: Re: How to unsubscribe (was Re: unsubscribe)


 That was a stupid joke.  It wasn't real advice.


 Have you sent email to the specific email address listed?

 On Thu, Aug 30, 2012 at 12:35 AM, sathyavageeswaran 
 sat...@morisonmenon.com wrote:

 I have tried every trick to get myself unsubscribed. Yesterday I got a mail
 saying you can't unsubscribe once subscribed.

 -Original Message-
 From: Andy Isaacson [mailto:a...@cloudera.com]
 Sent: 30 August 2012 03:25
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: How to unsubscribe (was Re: unsubscribe)

 Hi folks,

 Replying to this thread is not going to get you unsubscribed and will just
 annoy everyone else who's subscribed. To unsubscribe please send an email
 to
 user-unsubscr...@hadoop.apache.org from your subscribed address.

 For more info please visit
 http://hadoop.apache.org/common/mailing_lists.html#Users

 Thanks,
 -andy

 On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:
 
  
  From: Dan Yi d...@mediosystems.com
  To: user@hadoop.apache.org user@hadoop.apache.org
  Sent: Wednesday, August 29, 2012 12:57 PM
  Subject: unsubscribe
 
 
 
 
 
 



RE: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread sathyavageeswaran
I sent to user-unsubscr...@hadoop.apache.org in the beginning. 

 

Following that, I mailed the links that the automatic reply sends, but in
vain.

 

Later I thought of all possible permutations that can be formed from the 15
letters and one other character in the email address user-unsubscribe. While
working out the permutations and combinations, to ensure that I don't miss
any, I even sent to those permutations of email IDs that have no literal
meaning in the English language. 

 

In the address the letter 'u' is repeated 3 times, 'r' 2 times, 's' 3 times,
'e' 2 times, 'b' 2 times, and the rest once each. So I used the classical
method of solving such combinations, using the methods given in Higher
Algebra by Hall and Knight (first published in 1897), to arrive at all
possible permutations using the formula 

 

16! / (3! 2! 3! 2! 2!) 

 

Then, using mail merge, I sent email to all the email IDs generated. 
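
(For what it's worth, that count can be reproduced with a short Python
sketch; the string and the printed value below are illustrative, assuming
the 16-character local part user-unsubscribe.)

    from math import factorial
    from collections import Counter

    local_part = "user-unsubscribe"      # 16 characters: 15 letters plus one hyphen
    counts = Counter(local_part)         # u:3, s:3, e:2, r:2, b:2, the rest once each

    # Multinomial coefficient: 16! divided by the factorial of each repeated count.
    arrangements = factorial(len(local_part))
    for n in counts.values():
        arrangements //= factorial(n)

    print(arrangements)                  # 72,648,576,000 distinct orderings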

 

But no success so far. 

 

I also tried some keywords to block mails but that led to more chaos.

 

A few important email IDs (my boss's, wife's, etc.) were also blocked,
leading to consequences that have the potential to affect me. 

 

Hence I have decided to live with mails from u...@hadoop.org and enjoy!!! till
all users of Hadoop start blocking my email ID and the organizers of Hadoop
also block me, especially in their mailing list. 

 

 

 

user-unsubscr...@hadoop.apache.org

 

From: Ted Dunning [mailto:tdunn...@maprtech.com] 
Sent: 30 August 2012 10:28
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: Re: How to unsubscribe (was Re: unsubscribe)

 

Can you say which addresses you sent emails to?

 

The merging of mailing lists may have left you subscribed to a different
group than you expected.  Thus, your assumptions may not match what is
required.  If you provide a specific list, somebody might be able to help
you.

 

Also, I was asking about the *exact* email address listed in the one message
before mine, not anything you have typed.  Sorry not to take your "all
permutations and combinations" on trust, but any generation of permutations
is based on assumptions and you don't state your assumptions.  By their
nature, we often don't even know our own assumptions, so I would ask the
same question even if you said what you thought your assumptions were.

 

The reason that I ask this in this slightly abrasive way is that there may
have been a bug in the mailing list merge.  The only way to distinguish that
from a bug in your permutation generation is to inspect the list of email
addresses you used.

 

It may also be that your email from the mailing list has been forwarded so
that the return address when you send email to the bot isn't one that is
subscribed.

On Thu, Aug 30, 2012 at 12:49 AM, sathyavageeswaran
sat...@morisonmenon.com wrote:

Of course I have sent emails to all permutations and combinations of the
addresses listed, with appropriate subject matter. 

 

From: Ted Dunning [mailto:tdunn...@maprtech.com] 
Sent: 30 August 2012 10:12
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: Re: How to unsubscribe (was Re: unsubscribe)

 

That was a stupid joke.  It wasn't real advice.

 

Have you sent email to the specific email address listed?

On Thu, Aug 30, 2012 at 12:35 AM, sathyavageeswaran
sat...@morisonmenon.com wrote:

I have tried every trick to get myself unsubscribed. Yesterday I got a mail
saying you can't unsubscribe once subscribed.

-Original Message-
From: Andy Isaacson [mailto:a...@cloudera.com]
Sent: 30 August 2012 03:25
To: user@hadoop.apache.org
Cc: Dan Yi; Jay
Subject: How to unsubscribe (was Re: unsubscribe)

Hi folks,

Replying to this thread is not going to get you unsubscribed and will just
annoy everyone else who's subscribed. To unsubscribe please send an email to
user-unsubscr...@hadoop.apache.org from your subscribed address.

For more info please visit
http://hadoop.apache.org/common/mailing_lists.html#Users

Thanks,
-andy

On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:

 
 From: Dan Yi d...@mediosystems.com
 To: user@hadoop.apache.org user@hadoop.apache.org
 Sent: Wednesday, August 29, 2012 12:57 PM
 Subject: unsubscribe







 


 




Re: How to unsubscribe (was Re: unsubscribe)

2012-08-29 Thread Ted Dunning
Nicely done.

This seems to indicate that there might be a bug in the mailing list
management configuration.  Not sure how that would happen with ezmlm, but
it now seems more plausible.

One last question, though: is the email address that you sent the
unsubscribe requests from the same one that the mailing list sends to?

Also, you should only need to reply to the confirmation email.  But was the
email you got back from the bot an unsubscribe-confirmation email?

On Thu, Aug 30, 2012 at 1:15 AM, sathyavageeswaran
sat...@morisonmenon.comwrote:

 I sent to user-unsubscr...@hadoop.apache.org in the beginning. 


 Following that, I mailed the links that the automatic reply sends, but in
 vain.


 Later I thought of all possible permutations that can be formed from the
 15 letters and one other character in the email address user-unsubscribe.
 While working out the permutations and combinations, to ensure that I don't
 miss any, I even sent to those permutations of email IDs that have no
 literal meaning in the English language. 


 In the address the letter 'u' is repeated 3 times, 'r' 2 times, 's' 3
 times, 'e' 2 times, 'b' 2 times, and the rest once each. So I used the
 classical method of solving such combinations, using the methods given in
 Higher Algebra by Hall and Knight (first published in 1897), to arrive at
 all possible permutations using the formula 


 16! / (3! 2! 3! 2! 2!) 


 Then, using mail merge, I sent email to all the email IDs generated. 


 But no success so far. 


 I also tried some keywords to block mails but that led to more chaos.


 A few important email IDs (my boss's, wife's, etc.) were also blocked,
 leading to consequences that have the potential to affect me. 


 Hence I have decided to live with mails from u...@hadoop.org and enjoy!!!
 till all users of Hadoop start blocking my email ID and the organizers of
 Hadoop also block me, especially in their mailing list. 


 user-unsubscr...@hadoop.apache.org


 From: Ted Dunning [mailto:tdunn...@maprtech.com]
 Sent: 30 August 2012 10:28
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: Re: How to unsubscribe (was Re: unsubscribe)


 Can you say which addresses you sent emails to?


 The merging of mailing lists may have left you subscribed to a different
 group than you expected.  Thus, your assumptions may not match what is
 required.  If you provide a specific list, somebody might be able to help
 you.


 Also, I was asking about the *exact* email address listed in the one
 message before mine, not anything you have typed.  Sorry not to take your
 "all permutations and combinations" on trust, but any generation of
 permutations is based on assumptions and you don't state your assumptions.
 By their nature, we often don't even know our own assumptions, so I would
 ask the same question even if you said what you thought your assumptions
 were.


 The reason that I ask this in this slightly abrasive way is that there may
 have been a bug in the mailing list merge.  The only way to distinguish
 that from a bug in your permutation generation is to inspect the list of
 email addresses you used.


 It may also be that your email from the mailing list has been forwarded so
 that the return address when you send email to the bot isn't one that is
 subscribed.

 On Thu, Aug 30, 2012 at 12:49 AM, sathyavageeswaran 
 sat...@morisonmenon.com wrote:

 Of course I have sent emails to all permutations and combinations of the
 addresses listed, with appropriate subject matter. 

  

 From: Ted Dunning [mailto:tdunn...@maprtech.com]
 Sent: 30 August 2012 10:12
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: Re: How to unsubscribe (was Re: unsubscribe)

  

 That was a stupid joke.  It wasn't real advice.

  

 Have you sent email to the specific email address listed?

 On Thu, Aug 30, 2012 at 12:35 AM, sathyavageeswaran 
 sat...@morisonmenon.com wrote:

 I have tried every trick to get myself unsubscribed. Yesterday I got a mail
 saying you can't unsubscribe once subscribed.

 -Original Message-
 From: Andy Isaacson [mailto:a...@cloudera.com]
 Sent: 30 August 2012 03:25
 To: user@hadoop.apache.org
 Cc: Dan Yi; Jay
 Subject: How to unsubscribe (was Re: unsubscribe)

 Hi folks,

 Replying to this thread is not going to get you unsubscribed and will just
 annoy everyone else who's subscribed. To unsubscribe please send an email
 to
 user-unsubscr...@hadoop.apache.org from your subscribed address.

 For more info please visit
 http://hadoop.apache.org/common/mailing_lists.html#Users

 Thanks,
 -andy

 On Wed, Aug 29, 2012 at 1:51 PM, Jay su1...@yahoo.com wrote:
 
  
  From: Dan Yi d...@mediosystems.com
  To: user@hadoop.apache.org