Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Niels Basjes
F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks
query MongoDB at localhost.

(This is what I did recently with a multi-GiB dataset.)

-- 
Kind regards,
Niels Basjes
(Sent from mobile)
On 30 Dec 2012 20:01, "Jonathan Bishop" wrote:

> E. Store them in HBase...
>
>
> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
>
>> If it is a small number, A seems the best way to me.
>>
>> On Friday, December 28, 2012, Kshiva Kps wrote:
>>
>>> Which one is correct?
>>>
>>> What is the preferred way to pass a small number of configuration
>>> parameters to a mapper or reducer?
>>>
>>> A.  As key-value pairs in the JobConf object.
>>>
>>> B.  As a custom input key-value pair passed to each mapper or
>>> reducer.
>>>
>>> C.  Using a plain text file via the DistributedCache, which each
>>> mapper or reducer reads.
>>>
>>> D.  Through a static variable in the MapReduce driver class (i.e.,
>>> the class that submits the MapReduce job).
>>>
>>> Answer: B
>>
>


Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Jonathan Bishop
Nagarjuna,

Can you explain in more detail: what is the cost of using HBase as
configuration storage for MR jobs, say, if there are many of them?

Jon


On Sun, Dec 30, 2012 at 11:02 AM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlap...@gmail.com> wrote:

> Only if you have few mappers and reducers.
>
>
> On Monday, December 31, 2012, Jonathan Bishop wrote:
>
>> E. Store them in HBase...
>>
>>
>> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <
>> yhema...@thoughtworks.com> wrote:
>>
>> If it is a small number, A seems the best way to me.
>>
>> On Friday, December 28, 2012, Kshiva Kps wrote:
>>
>> Which one is correct?
>>
>> What is the preferred way to pass a small number of configuration
>> parameters to a mapper or reducer?
>>
>> A.  As key-value pairs in the JobConf object.
>>
>> B.  As a custom input key-value pair passed to each mapper or reducer.
>>
>> C.
>>
>
> --
> Sent from iPhone
>


Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread nagarjuna kanamarlapudi
Only if you have few mappers and reducers.

On Monday, December 31, 2012, Jonathan Bishop wrote:

> E. Store them in HBase...
>
>
> On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
>
> If it is a small number, A seems the best way to me.
>
> On Friday, December 28, 2012, Kshiva Kps wrote:
>
> Which one is correct?
>
> What is the preferred way to pass a small number of configuration
> parameters to a mapper or reducer?
>
> A.  As key-value pairs in the JobConf object.
>
> B.  As a custom input key-value pair passed to each mapper or reducer.
>
> C.
>

-- 
Sent from iPhone


Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Jonathan Bishop
E. Store them in HBase...


On Sun, Dec 30, 2012 at 12:24 AM, Hemanth Yamijala <
yhema...@thoughtworks.com> wrote:

> If it is a small number, A seems the best way to me.
>
> On Friday, December 28, 2012, Kshiva Kps wrote:
>
>> Which one is correct?
>>
>> What is the preferred way to pass a small number of configuration
>> parameters to a mapper or reducer?
>>
>> A.  As key-value pairs in the JobConf object.
>>
>> B.  As a custom input key-value pair passed to each mapper or reducer.
>>
>> C.  Using a plain text file via the DistributedCache, which each
>> mapper or reducer reads.
>>
>> D.  Through a static variable in the MapReduce driver class (i.e., the
>> class that submits the MapReduce job).
>>
>> Answer: B
>


Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Michael Segel
Ed, 

Some people are of the opinion that these certifications are worthless.
I tend to disagree; however, I don't think that they are the best way to
demonstrate one's abilities.

IMHO they should provide a baseline.

We have seen these types of questions on the list and in the forums. They
appear to be taken from a certain vendor's prior certification tests and
accumulated over time.

The sad thing is that when we respond to newbie questions, we need to ask
ourselves whether the question is real or whether it is being asked because
it's a certification question.

I'd also be careful in expressing your opinion... I wonder how long before a 
certain someone expresses their displeasure in your comment. ;-) 

Just saying! 

:-)

On Dec 28, 2012, at 7:20 PM, Edward Capriolo  wrote:

> Yes. Another big data, data scientist, NoOps, DevOps, cloud computing
> specialist is born. Thank goodness we have multiple-choice tests to identify
> the best coders and administrators.
> 
> On Friday, December 28, 2012, Michel Segel  wrote:
> > Sounds like someone is cheating on a test...
> >
> > Sent from a remote device. Please excuse any typos...
> > Mike Segel
> > On Dec 28, 2012, at 3:10 PM, Ted Dunning  wrote:
> >
> > Answer B sounds pathologically bad to me.
> > A or C are the only viable options.
> > Neither B nor D works. B fails because it would be extremely hard to get
> > the right records to the right components and because it pollutes the data
> > input with configuration data. D fails because statics don't work in
> > parallel programs.
> >
> > On Fri, Dec 28, 2012 at 12:17 AM, Kshiva Kps  wrote:
> >
> > Which one is correct?
> >
> > What is the preferred way to pass a small number of configuration
> > parameters to a mapper or reducer?
> >
> > A.  As key-value pairs in the JobConf object.
> >
> > B.  As a custom input key-value pair passed to each mapper or reducer.
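Ted's option C can be sketched without a cluster. With the Hadoop 1.x API the driver would register the file with `DistributedCache.addCacheFile(uri, conf)` and each task would find its local copy through `DistributedCache.getLocalCacheFiles(conf)` in `setup()`. In the sketch below, a temporary file stands in for that task-local copy; the `key=value` file format and the `example.*` keys are assumptions, not part of the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

// Stdlib stand-in for option C (not Hadoop code): parameters live in a small
// plain-text side file that each task reads once. The temp file plays the role
// of the task's local copy of a DistributedCache file; key=value is assumed.
class SideFileSketch {

    // Driver side: write the parameters to a plain-text file.
    static Path writeParams(String... lines) {
        try {
            Path file = Files.createTempFile("params", ".properties");
            Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task side: parse the local copy once, as a mapper would in setup().
    static Properties readParams(Path localCopy) {
        Properties params = new Properties();
        try (Reader in = Files.newBufferedReader(localCopy, StandardCharsets.UTF_8)) {
            params.load(in); // lines starting with '#' are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return params;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeParams("# task parameters", "example.threshold=42", "example.mode=strict");
        Properties params = readParams(file);
        System.out.println(params.getProperty("example.threshold")); // 42
        System.out.println(params.getProperty("example.mode"));      // strict
        Files.deleteIfExists(file);
    }
}
```

Unlike option B, the parameters never mix with the input records, and unlike option D, each task reads them from its own local copy rather than relying on state in the driver's JVM.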



Re: Hadoop harddrive space usage

2012-12-30 Thread Jean-Marc Spaggiari
Perfect, thanks. It's what I was looking for.

I have a few nodes, all with 2 TB drives, except one with 2x1 TB, which means
that in the end, for Hadoop, it's almost the same thing.

JM

2012/12/28, Robert Molina :
> Hi Jean,
> Hadoop does not factor in the number of disks or directories, but mainly
> the allocated free space. Hadoop will do its best to spread the data
> evenly across the nodes. For instance, let's say you had 3 datanodes
> (replication factor 1), each with 10GB allocated, but one of the
> nodes split its 10GB across two directories. Now if we try to store a file
> that takes up 3 blocks, Hadoop will just place 1 block on each node.
>
> Hope that helps.
>
> Regards,
> Robert
>
> On Fri, Dec 28, 2012 at 9:12 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi,
>>
>> Quick question regarding hard drive space usage.
>>
>> Hadoop will distribute the data evenly across the cluster, so all the
>> nodes are going to receive almost the same quantity of data to store.
>>
>> Now, if on one node I have 2 directories configured, is Hadoop going
>> to assign twice the quantity to this node? Or is each directory going
>> to receive half the load?
>>
>> Thanks,
>>
>> JM
>>
>
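Robert's example, and JM's original question about two directories on one node, can be illustrated with a toy model. It is not HDFS's placement code: it assumes each block goes to the node with the most free space (the real policy also weighs racks, the writer's location and randomness), and that inside a node blocks rotate round-robin over the configured data directories, which approximates the DataNode's default round-robin volume choice. The node names and sizes are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not HDFS code. Assumptions: each block (replication factor 1) goes to
// the node with the most free space, ignoring racks and randomness; within a node,
// blocks rotate round-robin over that node's configured data directories.
class PlacementSketch {

    static final class Node {
        final String name;
        final long[] dirFreeMb; // free space per configured data directory, in MB
        final int[] dirBlocks;  // number of blocks stored in each directory
        int nextDir = 0;        // round-robin cursor over the directories

        Node(String name, long... dirFreeMb) {
            this.name = name;
            this.dirFreeMb = dirFreeMb.clone();
            this.dirBlocks = new int[dirFreeMb.length];
        }

        // Node-level free space: only the total matters, not how many directories hold it.
        long freeMb() {
            long total = 0;
            for (long f : dirFreeMb) total += f;
            return total;
        }

        int blocks() {
            int total = 0;
            for (int b : dirBlocks) total += b;
            return total;
        }

        // Put one block in the next directory, round-robin, skipping full ones.
        boolean store(long blockMb) {
            for (int tries = 0; tries < dirFreeMb.length; tries++) {
                int d = nextDir;
                nextDir = (nextDir + 1) % dirFreeMb.length;
                if (dirFreeMb[d] >= blockMb) {
                    dirFreeMb[d] -= blockMb;
                    dirBlocks[d]++;
                    return true;
                }
            }
            return false;
        }
    }

    // Place numBlocks blocks, each on the node with the most free space (first node wins ties).
    static void placeFile(List<Node> nodes, int numBlocks, long blockMb) {
        for (int i = 0; i < numBlocks; i++) {
            Node best = null;
            for (Node n : nodes) {
                if (best == null || n.freeMb() > best.freeMb()) best = n;
            }
            if (best == null || !best.store(blockMb)) {
                throw new IllegalStateException("no room for block " + i);
            }
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("dn1", 10_240));       // 10 GB in one directory
        nodes.add(new Node("dn2", 10_240));       // 10 GB in one directory
        nodes.add(new Node("dn3", 5_120, 5_120)); // the same 10 GB split over two directories
        placeFile(nodes, 3, 64);                  // a 3-block file with 64 MB blocks
        for (Node n : nodes) {
            System.out.println(n.name + ": " + n.blocks() + " block(s)");
        }
        // dn1: 1 block(s), dn2: 1 block(s), dn3: 1 block(s)
    }
}
```

In this model, placing 27 more blocks leaves every node with 10 blocks and dn3's two directories with 5 each: the extra directory changes how a node spreads its share internally, not how large that share is.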


Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Hemanth Yamijala
If it is a small number, A seems the best way to me.

On Friday, December 28, 2012, Kshiva Kps wrote:

> Which one is correct?
>
> What is the preferred way to pass a small number of configuration
> parameters to a mapper or reducer?
>
> A.  As key-value pairs in the JobConf object.
>
> B.  As a custom input key-value pair passed to each mapper or reducer.
>
> C.  Using a plain text file via the DistributedCache, which each mapper
> or reducer reads.
>
> D.  Through a static variable in the MapReduce driver class (i.e., the
> class that submits the MapReduce job).
>
> Answer: B
>
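To make Hemanth's option A concrete: with the Hadoop API the driver would call `conf.set(...)` on its `Configuration` (or `JobConf`) before submitting, and each task would read the value back once, e.g. via `context.getConfiguration().get(...)` in `setup()` (or in `configure(JobConf)` with the old API). The sketch below models that flow with `java.util.Properties` only, so it runs without Hadoop on the classpath; the XML round trip stands in for the job configuration the framework ships to every task, and the `example.*` keys are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stdlib stand-in for option A (not Hadoop code): Properties plays the role of
// the job configuration, and its XML form plays the role of the job.xml that the
// framework ships to every task. The "example.*" keys are hypothetical.
class JobConfSketch {

    // Driver side: record a few small parameters and serialize the configuration.
    static byte[] driverSubmit(int threshold, String mode) {
        Properties conf = new Properties();
        conf.setProperty("example.threshold", Integer.toString(threshold));
        conf.setProperty("example.mode", mode);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.storeToXML(out, "job configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Task side: each mapper or reducer loads its own copy once, as it would in setup().
    static Properties taskSetup(byte[] shippedConf) {
        Properties conf = new Properties();
        try {
            conf.loadFromXML(new ByteArrayInputStream(shippedConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    // Typed read with a default, similar in spirit to Configuration.getInt(name, defaultValue).
    static int getInt(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties taskConf = taskSetup(driverSubmit(42, "strict"));
        System.out.println(getInt(taskConf, "example.threshold", 10)); // 42
        System.out.println(taskConf.getProperty("example.mode"));      // strict
        System.out.println(getInt(taskConf, "example.missing", 10));   // 10
    }
}
```

Reading the values once per task, rather than once per record, keeps the cost independent of input size, which is why this approach suits a small number of parameters.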