Re: reading input for a map function from 2 different files?

2008-11-12 Thread Milind Bhandarkar
Since you need to pass only one number (the average) to all mappers, you can
pass it through the JobConf with a config variable that you define yourself,
say my.average.
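A minimal sketch of what that could look like with the old
org.apache.hadoop.mapred API (the property name my.average and the class and
field names below are purely illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    // In the driver of the second (standard-deviation) job, after reading
    // the average produced by the first job:
    //   JobConf conf = new JobConf(StdDevJob.class);
    //   conf.set("my.average", Double.toString(average));

    public class StdDevMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, DoubleWritable> {

      private double average;

      public void configure(JobConf conf) {
        // Every map task reads the same value back from the job configuration.
        average = Double.parseDouble(conf.get("my.average", "0"));
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, DoubleWritable> out,
                      Reporter reporter) throws IOException {
        double x = Double.parseDouble(value.toString().trim());
        double diff = x - average;
        // Emit squared deviations under a single key; one reducer can then
        // average them and take the square root to get the standard deviation.
        out.collect(new Text("sqdev"), new DoubleWritable(diff * diff));
      }
    }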

- milind


On 11/11/08 8:25 PM, some speed [EMAIL PROTECTED] wrote:

 Thanks for the response. What I am trying to do is find the average
 and then the standard deviation for a very large set (say a million) of
 numbers. The result would be used in further calculations.
 I have got the average from the first MapReduce job in the chain. Now I need
 to read this average as well as the set of numbers to calculate the standard
 deviation. So one file would have the input set, and the other resultant
 file would have just the average.
 Please do tell me if there is a better way of doing things than what I
 am doing. Any input/suggestion is appreciated. :)
 
 
 
 On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat [EMAIL PROTECTED] wrote:
 
 Amar Kamat wrote:
 
 some speed wrote:
 
 I was wondering if it was possible to read the input for a map function
 from 2 different files:
  1st file --- user-input file from a particular location (path)
 
 Is the input/user file sorted? If yes then you can use map-side join for
 performance reasons. See org.apache.hadoop.mapred.join for more details.
 
 2nd file --- a resultant file (with just one key,value pair) from a
 previous MapReduce job. (I am implementing a chain of MapReduce jobs.)
 
 Can you explain the contents of the 2nd file in more detail?
 
 
 Now, for every key,value pair in the user-input file, I would like to
 use
 the same key,value pair from the 2nd file for some calculations.
 
 Can you explain this in more detail? Can you give some abstracted example
 of what file1 and file2 look like and what operation/processing you want to
 do?
 
 
 
 I guess you might need to do some kind of join on the 2 files. Look at
 contrib/data_join for more details.
 Amar
 
 Is it possible for me to do so? Can someone guide me in the right
 direction
 please?
 
 
 Thanks!
 
 
 
 
 
 


-- 
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Hadoop Streaming - running a jar file

2008-11-12 Thread Milind Bhandarkar
You should specify A.jar on the bin/hadoop command line with -file A.jar,
so that streaming knows to copy that file to the tasktracker nodes.
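For instance, something like the following should work (input and output
paths elided as in the original command; this assumes A.jar is present in the
local directory you launch the job from):

    $HADOOP_HOME/bin/hadoop jar streaming.jar -input .. -output .. \
        -mapper 'java -jar A.jar' -file A.jar -reducer NONE

The -file option ships A.jar into each task's working directory, so the
mapper command 'java -jar A.jar' can find it there.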

- milind


On 11/11/08 10:50 AM, Amit_Gupta [EMAIL PROTECTED] wrote:

 
 
 Hi
 
 I have a jar file which takes input from stdin and writes something on
 stdout, i.e., when I run
 
 java -jar A.jar < input
 
 It prints the required output.
 
 However, when I run it as a mapper in hadoop streaming using the command
 
 $HADOOP_HOME/bin/hadoop jar streaming.jar -input .. -output ...  -mapper
 'java -jar A.jar'  -reducer NONE
 
 I get a broken pipe exception.
 
 
 the error message is
 
 additionalConfSpec_:null
 null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
 packageJobJar:
 [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/hadoop-unjar45410/]
 [] /tmp/streamjob45411.jar tmpDir=null
 08/11/11 23:20:14 INFO mapred.FileInputFormat: Total input paths to process
 : 1
 08/11/11 23:20:14 INFO streaming.StreamJob: getLocalDirs():
 [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/mapred]
 08/11/11 23:20:14 INFO streaming.StreamJob: Running job:
 job_20081724_0014
 08/11/11 23:20:14 INFO streaming.StreamJob: To kill this job, run:
 08/11/11 23:20:14 INFO streaming.StreamJob:
 /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job
 -Dmapred.job.tracker=10.105.41.25:54311 -kill job_20081724_0014
 08/11/11 23:20:15 INFO streaming.StreamJob: Tracking URL:
 http://sayali:50030/jobdetails.jsp?jobid=job_20081724_0014
 08/11/11 23:20:16 INFO streaming.StreamJob:  map 0%  reduce 0%
 08/11/11 23:21:00 INFO streaming.StreamJob:  map 100%  reduce 100%
 08/11/11 23:21:00 INFO streaming.StreamJob: To kill this job, run:
 08/11/11 23:21:00 INFO streaming.StreamJob:
 /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job
 -Dmapred.job.tracker=10.105.41.25:54311 -kill job_20081724_0014
 08/11/11 23:21:00 INFO streaming.StreamJob: Tracking URL:
 http://sayali:50030/jobdetails.jsp?jobid=job_20081724_0014
 08/11/11 23:21:00 ERROR streaming.StreamJob: Job not Successful!
 08/11/11 23:21:00 INFO streaming.StreamJob: killJob...
 Streaming Job Failed!
 
 Could someone please help me with any ideas or pointers?
 
 regards
 Amit
 
 
 --
 View this message in context:
 http://www.nabble.com/Hadoop-Streamingrunning-a-jar-file-tp20445877p20445877.html
 Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
 


-- 
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Milind Bhandarkar
Tom,

Please consider adding it to: http://wiki.apache.org/hadoop/HadoopArticles

Thanks,

- milind


On 11/2/08 5:57 PM, Tom Wheeler [EMAIL PROTECTED] wrote:

 The article I've written about Hadoop has just been published:
 
http://www.ociweb.com/jnb/jnbNov2008.html
 
 I'd like to thank Mafish Liu and Amit Kumar Saha again for reviewing
 my draft and offering suggestions that helped me improve it.  I hope
 the article is compelling, clear, and technically accurate.  However,
 if you notice anything in need of correction, please contact me
 offlist and I will address it ASAP.
 
 Tom Wheeler
 
 On Thu, Oct 23, 2008 at 5:31 PM, Tom Wheeler [EMAIL PROTECTED] wrote:
 Each month the developers at my company write a short article about a
 Java technology we find exciting. I've just finished one about Hadoop
 for November and am seeking a volunteer knowledgeable about Hadoop to
 look it over to help ensure it's both clear and technically accurate.
 
 If you're interested in helping me, please contact me offlist and I
 will send you the draft.  Meanwhile, you can get a feel for the length
 and general style of the articles from our archives:
 
   http://www.ociweb.com/articles/publications/jnb.html
 
 Thanks in advance,
 
 Tom Wheeler
 


-- 
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Hadoop Camp next month

2008-10-21 Thread Milind Bhandarkar
Hi,

Just received an email from the ApacheCon US organizers saying that they are
giving a 50% discount for the Hadoop Training
(http://us.apachecon.com/c/acus2008/sessions/93):

We just created a discount code for you to give people 50% off of the
cost of the training. The code is Hadoop50. The discount would
apply only to the Training.


Hope to see you there !

- Milind


On 10/2/08 9:03 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

 Hi all,
I'd like to remind everyone that the Hadoop Camp at ApacheCon US is
 coming up in New Orleans next month. http://tinyurl.com/hadoop-camp
 
 It will be the largest gathering of Hadoop developers outside of
 California. We'll have:
 
 Core: Doug Cutting, Dhruba Borthakur, Arun Murthy, Owen O'Malley,
 Sameer Paranjpye,
 Sanjay Radia, Tom White
 Zookeeper: Ben Reed
 
 There will also be a training session on Practical Problem Solving
 with Hadoop by Milind Bhandarkar on Monday.
 
 So if you'd like to meet the developers or find out more about Hadoop,
 come join us!
 
 -- Owen


-- 
Milind Bhandarkar
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Backup Tasks in Hadoop MapReduce.

2008-04-16 Thread Milind Bhandarkar
Yes. In Hadoop, you can enable backup tasks (speculative execution) by setting
mapred.speculative.execution to true.
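A minimal sketch of a driver that turns this on (the class name and the path
handling are illustrative; on 0.16-era Hadoop the input/output paths are set
directly on the JobConf):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class SpeculativeDemo {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SpeculativeDemo.class);
        // Enable backup (speculative) execution for straggling tasks.
        conf.setBoolean("mapred.speculative.execution", true);
        conf.setInputPath(new Path(args[0]));   // deprecated in later releases
        conf.setOutputPath(new Path(args[1]));
        JobClient.runJob(conf);                 // identity map/reduce by default
      }
    }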

- milind


On 4/16/08 8:07 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote:

 
 
 Hello,
 
 I am curious to know whether Hadoop MapReduce has the Backup Tasks
 feature described in the seminal paper MapReduce: Simplified Data
 Processing on Large Clusters (Dean and Ghemawat).
 
 Any implementation details would be extremely valuable.
 
 Thanks.
 
 With Regards.
 Chaman Singh Verma
 Poona,India
 
 -
 Chaman Singh Verma
 Poona, India
 --
 View this message in context:
 http://www.nabble.com/Backup-Tasks-in-Hadoop-MapReduce.-tp16722539p16722539.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.
 

- Milind
-- 
Milind Bhandarkar, Chief Spammer, Grid Team
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])



Re: Aborting Map Function

2008-04-16 Thread Milind Bhandarkar
If you want to kill the whole job (I assume that's what you mean by
aborting all map tasks) from a mapper, you can use:

new JobClient(jobConf).getJob(jobConf.get("mapred.job.id")).killJob();
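
A slightly fuller, hypothetical sketch of how that call could be wired into a
mapper (old org.apache.hadoop.mapred API; the abort condition is a made-up
placeholder):

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class AbortingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

      private JobConf conf;

      public void configure(JobConf conf) {
        this.conf = conf;   // keep the JobConf so we can look up our own job id
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> out,
                      Reporter reporter) throws IOException {
        if (shouldAbort(value)) {
          // Kill the whole job, including every other running task.
          new JobClient(conf).getJob(conf.get("mapred.job.id")).killJob();
          return;
        }
        out.collect(key, value);
      }

      // Placeholder for the application-specific abort condition.
      private boolean shouldAbort(Text value) {
        return value.toString().contains("ABORT");
      }
    }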

- milind


On 4/16/08 10:25 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

 On Apr 16, 2008, at 8:28 AM, Chaman Singh Verma wrote:
 
 I am developing an application with MapReduce, and whenever some
 MapTask condition is met, I would like to broadcast to all other
 MapTasks to abort their work. I am not quite sure whether such
 broadcasting functionality currently exists in Hadoop MapReduce.
 Could someone give me some hints?
 
 This is pretty atypical behavior, but you could have each map look
 for the existence of an HDFS file every minute or so. When the
 condition is true, create the file and your maps will exit within the
 next minute. Except on very large clusters, that wouldn't be too
 expensive...
 
 -- Owen
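
A hypothetical sketch of that flag-file approach (the flag path, the polling
interval, and the condition below are all made up for illustration): each map
polls HDFS at most once a minute and stops emitting once the flag exists, and
the task that meets the condition creates the flag.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class FlagCheckingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

      private static final Path STOP_FLAG = new Path("/tmp/my-job-stop-flag");

      private FileSystem fs;
      private long lastCheck = 0;
      private boolean stop = false;

      public void configure(JobConf conf) {
        try {
          fs = FileSystem.get(conf);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> out,
                      Reporter reporter) throws IOException {
        long now = System.currentTimeMillis();
        if (!stop && now - lastCheck > 60 * 1000) {  // poll at most once a minute
          stop = fs.exists(STOP_FLAG);
          lastCheck = now;
        }
        if (stop) {
          return;                                    // skip remaining records
        }
        if (conditionMet(value)) {
          fs.create(STOP_FLAG).close();              // tell the other maps to stop
          stop = true;
          return;
        }
        out.collect(key, value);
      }

      private boolean conditionMet(Text value) {     // placeholder condition
        return value.toString().contains("DONE");
      }
    }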

- Milind
-- 
Milind Bhandarkar, Chief Spammer, Grid Team
Y!IM: GridSolutions
408-349-2136 
([EMAIL PROTECTED])