Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks query MongoDB at localhost. (This is what I did recently with a multi-GiB dataset.)

--
Kind regards,
Niels Basjes
(Sent from mobile)

On 30 Dec. 2012 20:01, "Jonathan Bishop" wrote:

> E. Store them in HBase...
>
> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
>> If it is a small number, A seems the best way to me.
>>
>> On Friday, December 28, 2012, Kshiva Kps wrote:
>>
>>> Which one is correct?
>>>
>>> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>>>
>>> A. As key-value pairs in the JobConf object.
>>>
>>> B. As a custom input key-value pair passed to each mapper or reducer.
>>>
>>> C. Using a plain text file via the DistributedCache, which each mapper or reducer reads.
>>>
>>> D. Through a static variable in the MapReduce driver class (i.e., the class that submits the MapReduce job).
>>>
>>> Answer: B
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
Nagarjuna,

Can you explain in more detail: what is the cost of using HBase as configuration storage for MR jobs, say if there are many of them?

Jon

On Sun, Dec 30, 2012 at 11:02 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:

> Only if you have few mappers and reducers.
>
> On Monday, December 31, 2012, Jonathan Bishop wrote:
>
>> E. Store them in HBase...
>>
>> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>
>> If it is a small number, A seems the best way to me.
>>
>> On Friday, December 28, 2012, Kshiva Kps wrote:
>>
>> Which one is correct?
>>
>> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>>
>> A. As key-value pairs in the JobConf object.
>>
>> B. As a custom input key-value pair passed to each mapper or reducer.
>>
>> C.
>
> --
> Sent from iPhone
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
Only if you have few mappers and reducers.

On Monday, December 31, 2012, Jonathan Bishop wrote:

> E. Store them in HBase...
>
> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
> If it is a small number, A seems the best way to me.
>
> On Friday, December 28, 2012, Kshiva Kps wrote:
>
> Which one is correct?
>
> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>
> A. As key-value pairs in the JobConf object.
>
> B. As a custom input key-value pair passed to each mapper or reducer.
>
> C.

--
Sent from iPhone
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
E. Store them in HBase...

On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:

> If it is a small number, A seems the best way to me.
>
> On Friday, December 28, 2012, Kshiva Kps wrote:
>
>> Which one is correct?
>>
>> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>>
>> A. As key-value pairs in the JobConf object.
>>
>> B. As a custom input key-value pair passed to each mapper or reducer.
>>
>> C. Using a plain text file via the DistributedCache, which each mapper or reducer reads.
>>
>> D. Through a static variable in the MapReduce driver class (i.e., the class that submits the MapReduce job).
>>
>> Answer: B
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
Ed,

There are some who are of the opinion that these certifications are worthless. I tend to disagree; however, I don't think they are the best way to demonstrate one's abilities. IMHO they should provide a baseline.

We have seen these types of questions on the list and in the forums. They appear to be taken from a certain vendor's prior certification tests and accumulated over time. The sad thing is that when we respond to newbie questions we need to ask ourselves if the question is real, or if they are asking it because it's a certification question.

I'd also be careful in expressing your opinion... I wonder how long before a certain someone expresses their displeasure at your comment. ;-)

Just saying! :-)

On Dec 28, 2012, at 7:20 PM, Edward Capriolo wrote:

> Yes. Another big data, data scientist, NoOps, DevOps, cloud computing specialist is born. Thank goodness we have multiple choice tests to identify the best coders and administrators.
>
> On Friday, December 28, 2012, Michel Segel wrote:
>
>> Sounds like someone is cheating on a test...
>>
>> Sent from a remote device. Please excuse any typos...
>> Mike Segel
>>
>> On Dec 28, 2012, at 3:10 PM, Ted Dunning wrote:
>>
>>> Answer B sounds pathologically bad to me. A or C are the only viable options. Neither B nor D works. B fails because it would be extremely hard to get the right records to the right components, and because it pollutes the data input with configuration data. D fails because statics don't work in parallel programs.
>>>
>>> On Fri, Dec 28, 2012 at 12:17 AM, Kshiva Kps wrote:
>>>
>>>> Which one is correct?
>>>>
>>>> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>>>>
>>>> A. As key-value pairs in the JobConf object.
>>>>
>>>> B. As a custom input key-value pair passed to each mapper or reducer.
Re: Hadoop harddrive space usage
Perfect, thanks. It's what I was looking for.

I have a few nodes, all with 2TB drives, but one with 2x1TB, which means that in the end, for Hadoop, it's almost the same thing.

JM

On 2012/12/28, Robert Molina wrote:

> Hi Jean,
> Hadoop will not factor in the number of disks or directories, but rather mainly the allocated free space. Hadoop will do its best to spread the data evenly amongst the nodes. For instance, let's say you had 3 datanodes (replication factor 1), each with 10GB allocated, but one of the nodes split its 10GB into two directories. Now if we try to store a file that takes up 3 blocks, Hadoop will still place 1 block on each node.
>
> Hope that helps.
>
> Regards,
> Robert
>
> On Fri, Dec 28, 2012 at 9:12 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
>> Hi,
>>
>> Quick question regarding hard drive space usage.
>>
>> Hadoop will distribute the data evenly on the cluster, so all the nodes are going to receive almost the same quantity of data to store.
>>
>> Now, if on one node I have 2 directories configured, is Hadoop going to assign twice the quantity to this node? Or is each directory going to receive half the load?
>>
>> Thanks,
>>
>> JM
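Robert's point can be sketched as a toy model: balance blocks across nodes by total free space, and round-robin across a node's configured directories. This is a hypothetical simplification, not the actual HDFS placement policy (real HDFS also weighs rack topology, the writer's location, and node load), and all class and node names below are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model of a datanode: one entry of free bytes per configured
// data directory (dfs.data.dir). Directories only round-robin
// internally, so 2x1TB behaves much like a single 2TB drive.
class DataNodeSketch {
    final String name;
    final List<Long> dirFree; // free bytes per configured directory
    private int nextDir = 0;

    DataNodeSketch(String name, long... dirFreeBytes) {
        this.name = name;
        this.dirFree = new ArrayList<>();
        for (long b : dirFreeBytes) dirFree.add(b);
    }

    long totalFree() {
        long sum = 0;
        for (long b : dirFree) sum += b;
        return sum;
    }

    // Store one block in the next directory in round-robin order.
    void store(long blockSize) {
        int d = nextDir++ % dirFree.size();
        dirFree.set(d, dirFree.get(d) - blockSize);
    }
}

class PlacementSketch {
    // Greedy stand-in for the balancer: pick the node with the most
    // total free space; the directory split is invisible at this level.
    static DataNodeSketch place(List<DataNodeSketch> nodes, long blockSize) {
        DataNodeSketch best = Collections.max(nodes,
                Comparator.comparingLong(DataNodeSketch::totalFree));
        best.store(blockSize);
        return best;
    }

    public static void main(String[] args) {
        long TB = 1L << 40;
        List<DataNodeSketch> nodes = Arrays.asList(
                new DataNodeSketch("node1", 2 * TB),  // one 2TB drive
                new DataNodeSketch("node2", TB, TB)); // two 1TB drives

        // Write three 128MB blocks; placement alternates between the
        // nodes because only total free space per node matters.
        for (int i = 0; i < 3; i++) {
            System.out.println(PlacementSketch.place(nodes, 128L << 20).name);
        }
    }
}
```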
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer
If it is a small number, A seems the best way to me.

On Friday, December 28, 2012, Kshiva Kps wrote:

> Which one is correct?
>
> What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
>
> A. As key-value pairs in the JobConf object.
>
> B. As a custom input key-value pair passed to each mapper or reducer.
>
> C. Using a plain text file via the DistributedCache, which each mapper or reducer reads.
>
> D. Through a static variable in the MapReduce driver class (i.e., the class that submits the MapReduce job).
>
> Answer: B
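Option A works because the driver's key-value pairs are serialized into the job configuration and shipped to every task JVM, where the mapper or reducer reads them back during setup. The sketch below models that flow with a plain `HashMap` as a stand-in for Hadoop's real `org.apache.hadoop.conf.Configuration`; the class names, the `filter.stop.word` parameter, and the filtering mapper are all illustrative, not part of the Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.hadoop.conf.Configuration:
// a bag of string key-value pairs that the framework serializes
// with the job and hands to every task.
class JobConfSketch {
    private final Map<String, String> params = new HashMap<>();

    void set(String key, String value) { params.put(key, value); }

    String get(String key, String defaultValue) {
        return params.getOrDefault(key, defaultValue);
    }
}

// A task reads its parameters back once during its setup phase,
// mirroring Mapper.setup(Context) in the real API.
class WordFilterMapper {
    private String stopWord;

    void setup(JobConfSketch conf) {
        stopWord = conf.get("filter.stop.word", "");
    }

    boolean accepts(String word) {
        return !word.equals(stopWord);
    }
}

class OptionADemo {
    public static void main(String[] args) {
        // Driver side: attach the small parameter to the job configuration.
        JobConfSketch conf = new JobConfSketch();
        conf.set("filter.stop.word", "the");

        // Task side: the framework delivers the configuration to each task.
        WordFilterMapper mapper = new WordFilterMapper();
        mapper.setup(conf);

        System.out.println(mapper.accepts("the"));    // false
        System.out.println(mapper.accepts("hadoop")); // true
    }
}
```

With the real API the same shape appears as `conf.set(...)` in the driver and `context.getConfiguration().get(...)` in `Mapper.setup`; option C (DistributedCache) is the usual fallback once the payload outgrows a handful of strings.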