[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

BitsOfInfo (JIRA) Fri, 13 Nov 2009 18:02:18 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777802#action_12777802
 ]


BitsOfInfo commented on MAPREDUCE-1176:
---------------------------------------

I followed the instructions listed @ 
http://wiki.apache.org/hadoop/HowToContribute  

"Finally, patches should be attached to an issue report in Jira via the Attach 
File link on the issue's Jira. When you believe that your patch is ready to be 
committed, select the Submit Patch link on the issue's Jira. "

So are you saying to delete the 2 *.java files and only upload the .patch?

The *.patch file does contain a unit test in it so I am not sure why the 
comment above reported no tests were included. I ran this patch file against a 
clean trunk copy locally on my test machine and also verified it was ok through 
the "test-patch" task on the contribute how-to page.

I'll remove the 4 space indents, when looking through other sources I found the 
2 spaces and tons of wrapping unreadable.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1176
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.1, 0.20.2
>         Environment: Any
>            Reporter: BitsOfInfo
>            Priority: Minor
>         Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Provided are two classes first is the FixedLengthInputFormat and its 
> corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that 
> InputSplits do not contain any partial records since with fixed records there 
> is no way to determine where a record begins if that were to occur. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then adjusts the returned split size by doing the 
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed length input format classes, does not support compressed 
> files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

Reply via email to