Re: Assignment of data splits to mappers

2013-06-13 Thread Bertrand Dechoux
The first question can be split (no pun intended) into two topics because there are actually two distinct steps. First, the InputFormat partitions the data source into InputSplits; its implementation determines the exact logic. Then the scheduler is responsible for ordering where/when the InputS
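To make the two steps concrete, here is a minimal Python sketch. It is only an illustration and does not mirror Hadoop's actual InputFormat or scheduler code. The names `Split`, `compute_splits`, and `pick_split` are hypothetical, the split size is assumed to equal the block size, and the host lists are made up.

```python
from dataclasses import dataclass


@dataclass
class Split:
    start: int     # byte offset of the split within the file
    length: int    # number of bytes covered by the split
    hosts: tuple   # hosts assumed to hold a replica of the underlying block


def compute_splits(file_length, split_size, block_hosts):
    """Step 1 (InputFormat role): cut the file into fixed-size splits.

    block_hosts[i] lists the hosts holding the i-th block; this sketch
    assumes one split per block.
    """
    splits = []
    offset = 0
    while offset < file_length:
        length = min(split_size, file_length - offset)
        hosts = block_hosts[offset // split_size]
        splits.append(Split(offset, length, tuple(hosts)))
        offset += length
    return splits


def pick_split(pending, free_host):
    """Step 2 (scheduler role): prefer a split whose data is local to the
    host that has a free slot; otherwise hand out any pending split."""
    for split in pending:
        if free_host in split.hosts:
            pending.remove(split)
            return split
    return pending.pop(0) if pending else None


splits = compute_splits(250, 100, [["a", "b"], ["b", "c"], ["c", "a"]])
pending = list(splits)
local = pick_split(pending, "c")    # "c" holds the block at offset 100
remote = pick_split(pending, "z")   # "z" holds nothing, takes the first pending split
```

The point of the separation is that the split computation knows nothing about which node will run the task; locality only enters in the second step, as a preference rather than a guarantee.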

Re: Assignment of data splits to mappers

2013-06-13 Thread Harsh J
Hey John, I don't see the similarity. If you take the case of a normal record file, such as a text file, you read data from the next block. That is, n-1 blocks are "opened" twice, but not read entirely in both attempts. In the link you refer to, a specific block will always be read by all readers
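The behaviour described for a plain text file can be sketched as follows. This is a simplified illustration in Python, not Hadoop's record reader: `read_split` is a hypothetical helper, records are assumed to be newline-terminated, and the file is held in memory as `bytes`. A record belongs to the split in which its first byte lies, so a reader may continue a little way into the next block to finish its last record, and the reader of that next block skips the partial record at its start.

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Return the records owned by the split [start, end)."""
    pos = start
    # If the split begins in the middle of a record, that record belongs to
    # the previous split: skip ahead to the next record boundary.
    if start > 0 and data[start - 1:start] != b"\n":
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # Read every record that starts before the split end, even when its
    # last bytes lie beyond `end` (that is, in the next block).
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
block_size = 10
per_split = [
    read_split(data, start, min(start + block_size, len(data)))
    for start in range(0, len(data), block_size)
]
```

With these inputs the first reader finishes `bravo` by reading two bytes past its own block, and the second reader skips those same bytes. Each block after the first is therefore touched by two readers, but only a small part of it is read twice.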

RE: Assignment of data splits to mappers

2013-06-14 Thread John Lilley
under most file formats, records *will* span blocks. But if it were simple to prevent them from spanning blocks, would that be of benefit? john

Re: Assignment of data splits to mappers

2013-06-18 Thread Bertrand Dechoux
> mple to prevent them from spanning blocks, would that be of benefit? john

RE: Assignment of data splits to mappers

2013-07-01 Thread John Lilley
s a connection. Cheers, John From: Bertrand Dechoux [mailto:decho...@gmail.com] Sent: Tuesday, June 18, 2013 3:54 PM To: user@hadoop.apache.org Subject: Re: Assignment of data splits to mappers 1) The tradeoff is between reducing the overhead of distributed computing and reducing the cost of failu