Re: 1 file per record
Suppose I use TextInputFormat, make isSplitable() return false, and have 5 input files. What happens to numSplits then? Will it be set to 0?

S.Chandravadana

--
View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p19794194.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: 1 file per record
Hi all, I have a doubt. If we don't specify numSplits in getSplits(), what default number of splits is used?

Best Regards
S.Chandravadana

--
View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p19775580.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: 1 file per record
On Oct 2, 2008, at 1:50 AM, chandravadana wrote:
If we don't specify numSplits in getSplits(), then what is the default number of splits taken?

getSplits() is either library or user code, so it depends on which class you are using as your InputFormat. The FileInputFormats (TextInputFormat and SequenceFileInputFormat) basically divide input files by blocks, unless the requested number of mappers is really high.

-- Owen
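To make the "divide input files by blocks" point concrete, here is a minimal sketch of the split-size arithmetic. It assumes the old-API FileInputFormat works as we understand it: a goal size of totalSize / numSplits, capped by the block size and floored by a minimum split size. The class and method names are hypothetical, and the 64 MB block and 1 GB file are illustrative numbers only.

```java
/**
 * Hypothetical model of split sizing. Assumption: mirrors the old-API
 * FileInputFormat rule max(minSize, min(goalSize, blockSize)), where
 * goalSize = totalSize / numSplits. This is not Hadoop's actual class.
 */
public class SplitSizeModel {

    // Bytes per split if the numSplits hint were honoured exactly.
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / Math.max(1, numSplits);
    }

    // The block size caps the split size, so splits normally follow blocks;
    // only a large numSplits pushes goalSize below blockSize.
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;   // assumed block size
        long totalSize = 1024 * mb; // one hypothetical 1 GB input file
        long minSize = 1;

        for (int numSplits : new int[] {1, 2, 16, 64}) {
            long size = splitSize(goalSize(totalSize, numSplits), minSize, blockSize);
            System.out.println("numSplits=" + numSplits + " -> splitSize=" + size / mb + " MB");
        }
        // Prints 64 MB for numSplits 1, 2 and 16, and 16 MB for numSplits 64.
    }
}
```

Only when the requested number of mappers is high enough (here 64) does the split size drop below one block.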
Re: 1 file per record
Hi,

I'm writing an application that computes over the entire contents of a file. For that reason I don't want my file to be split; the whole file should go to one map task. I've been able to override isSplitable() to do this, and the file is no longer being split.

Next, I store the input values in an array (in the map function) and then proceed with my computation. When I displayed that array, I found that only the last line of the file was shown. Does this mean the data is read line by line by the line reader rather than continuously? If so, what should I do in order to read the complete contents of the file?

Thank you
Chandravadana S

--
View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p19685269.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
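With TextInputFormat, each call to map() receives a single line, so the file is indeed delivered line by line. One possible reason for seeing only the last line, assuming the array keeps references to the incoming value, is that the old-API framework creates one key object and one value object and refills them for every record. The plain-Java sketch below models that behaviour, with a reused StringBuilder standing in for Text; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the object-reuse pitfall. Assumption: like the
 * old-API map loop, ONE mutable value object is refilled for every record.
 * StringBuilder stands in for org.apache.hadoop.io.Text.
 */
public class ReuseDemo {

    // Stores references: every entry points at the same object,
    // which ends up holding the last line read.
    static List<CharSequence> storeReferences(String[] lines) {
        List<CharSequence> stored = new ArrayList<>();
        StringBuilder value = new StringBuilder(); // reused for every record
        for (String line : lines) {
            value.setLength(0);
            value.append(line);   // the reader refills the same object
            stored.add(value);    // map() keeps a reference (buggy)
        }
        return stored;
    }

    // Stores copies: each entry is an independent String.
    static List<CharSequence> storeCopies(String[] lines) {
        List<CharSequence> stored = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        for (String line : lines) {
            value.setLength(0);
            value.append(line);
            stored.add(value.toString()); // map() keeps a copy
        }
        return stored;
    }

    public static void main(String[] args) {
        String[] lines = {"line 1", "line 2", "line 3"};
        System.out.println(storeReferences(lines)); // [line 3, line 3, line 3]
        System.out.println(storeCopies(lines));     // [line 1, line 2, line 3]
    }
}
```

In a real mapper, the corresponding fix would be to store value.toString() or new Text(value) instead of value itself. This still yields one line per map() call; getting the whole file as one value needs a different RecordReader.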
Re: 1 file per record
Hi,

By making isSplitable return false, we prevent the files from being split, and we can confirm that from the number of map tasks. But how do we check whether the records themselves are correct?

Chandravadana S

--
View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p19667750.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
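A simplified model of split generation shows why the number of map tasks is a useful check: with isSplitable() returning false, each input file becomes exactly one split, whatever numSplits is. This is a sketch under stated assumptions. It mirrors the old-API FileInputFormat loop as we understand it (including a 10% slop allowance on the last split), takes file lengths instead of real paths, and ignores block locations; all names are hypothetical.

```java
/**
 * Hypothetical model of per-file split generation. Assumption: mirrors
 * the old-API FileInputFormat.getSplits() loop as we understand it.
 */
public class SplitCountModel {
    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static int countSplits(long[] fileLengths, boolean splitable, int numSplits,
                           long blockSize, long minSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        // numSplits is only a hint: it feeds the goal size and nothing else.
        long goalSize = totalSize / Math.max(1, numSplits);
        long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));

        int splits = 0;
        for (long len : fileLengths) {
            if (splitable && len != 0) {
                long remaining = len;
                while ((double) remaining / splitSize > SPLIT_SLOP) {
                    splits++;               // one full-size split
                    remaining -= splitSize;
                }
                if (remaining != 0) {
                    splits++;               // the tail of the file
                }
            } else {
                splits++;                   // not splittable: one split for the whole file
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {200 * mb, 200 * mb, 200 * mb, 200 * mb, 200 * mb}; // five 200 MB files
        long blockSize = 64 * mb;

        System.out.println(countSplits(files, true, 2, blockSize, 1));  // 20
        System.out.println(countSplits(files, false, 2, blockSize, 1)); // 5
        System.out.println(countSplits(files, false, 0, blockSize, 1)); // 5
    }
}
```

In this model, five non-splittable files give five splits (and so five map tasks) even with numSplits set to 0, because numSplits only enters the goal-size calculation.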
Re: 1 file per record
Yes, you can use MultiFileInputFormat. You can extend MultiFileInputFormat to return a RecordReader that reads one record for each file in the MultiFileSplit.

Enis

chandra wrote:
Hi, by making isSplitable return false, we can send one file with n records to one mapper. Is there any way to make one complete file a single record?
Thanks in advance
Chandravadana S

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful.
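The reader Enis describes can be modelled in plain Java: given the files of a split, each call to next() returns one whole file as a single record and returns false once the files run out. This is a hypothetical sketch, not Hadoop's API. The class name is invented, StringBuilder stands in for Hadoop's key/value types, and a list of local paths stands in for the MultiFileSplit. A real version would implement org.apache.hadoop.mapred.RecordReader and read each file through its FileSystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of a "one whole file per record" reader.
 * The path list stands in for the files of a MultiFileSplit.
 */
public class WholeFileReaderModel {
    private final List<Path> paths; // files assigned to this split
    private int index = 0;          // position of the next file to read

    public WholeFileReaderModel(List<Path> paths) {
        this.paths = paths;
    }

    /** Fills key with the file's path and value with its entire contents; false when done. */
    public boolean next(StringBuilder key, StringBuilder value) throws IOException {
        if (index >= paths.size()) {
            return false;
        }
        Path path = paths.get(index++);
        key.setLength(0);
        key.append(path);
        value.setLength(0);
        value.append(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
        return true;
    }

    /** Fraction of files already returned. */
    public float getProgress() {
        return paths.isEmpty() ? 1.0f : (float) index / paths.size();
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("record", ".txt");
        Path b = Files.createTempFile("record", ".txt");
        Files.write(a, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "another file\n".getBytes(StandardCharsets.UTF_8));

        WholeFileReaderModel reader = new WholeFileReaderModel(Arrays.asList(a, b));
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int records = 0;
        while (reader.next(key, value)) {
            records++;
            // Each value holds every line of its file, not just one line.
            System.out.println(key + ": " + value.toString().split("\n").length + " line(s)");
        }
        System.out.println(records + " records"); // prints "2 records"

        Files.delete(a);
        Files.delete(b);
    }
}
```

Because each file is held in memory as one value, this approach suits files that comfortably fit in a task's heap.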
Re: 1 file per record
Thanks. Is there any built-in record reader that performs this function?

--
View this message in context: http://www.nabble.com/1-file-per-record-tp19644985p19646442.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: 1 file per record
Nope, not right now. But this has come up before. Perhaps you will contribute one?