[jira] [Commented] (MAPREDUCE-1176) FixedLengthInputFormat and FixedLengthRecordReader

2013-11-12 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820470#comment-13820470
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

Glad to see this finally making it in. Thanks to those who picked it up and 
helped push it through.

 FixedLengthInputFormat and FixedLengthRecordReader
 --

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.2.0
 Environment: Any
Reporter: BitsOfInfo
Assignee: Mariappan Asokan
 Fix For: 2.3.0

 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch, mapreduce-1176_v1.patch, 
 mapreduce-1176_v2.patch, mapreduce-1176_v3.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Provided are two classes: the first is FixedLengthInputFormat, and the second 
 is its corresponding FixedLengthRecordReader. When creating a job that 
 specifies this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set, as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
 This input format overrides computeSplitSize() to ensure that InputSplits do 
 not contain any partial records, since with fixed-length records there is no 
 way to determine where a record begins if that were to occur. Each InputSplit 
 passed to the FixedLengthRecordReader will start at the beginning of a 
 record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method and then adjusts the returned split size as follows: 
 (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
 This suite of fixed-length input format classes does not support compressed 
 files. 
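
For anyone wanting to try this out, below is a minimal sketch of a driver that
wires the input format up the way the description prescribes. It is a sketch
only: the 100-byte record length is an illustrative value, and the
LongWritable/BytesWritable key/value types are those of the version that was
eventually committed for 2.3.0.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FixedWidthDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Every record is exactly 100 bytes (illustrative value); without this
    // property the input format has no way to locate record boundaries.
    conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 100);

    Job job = Job.getInstance(conf, "fixed-width example");
    job.setJarByClass(FixedWidthDriver.class);
    job.setInputFormatClass(FixedLengthInputFormat.class);
    // Identity map-only job: each record passes through as (offset, bytes).
    job.setMapperClass(Mapper.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(BytesWritable.class);
    job.setNumReduceTasks(0);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

To see the split-size adjustment concretely: with a 100-byte record length and
a computed split size of 134217728 bytes (128 MB), the adjusted split is
Math.floor(134217728 / 100) * 100 = 134217700 bytes, so no split ever ends
mid-record.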



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-08 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733518#comment-13733518
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

Thanks anyway; the implementation I originally threw together worked great 
for our needs. If this is an optimization to how it computes the split size, 
then go ahead.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Assignee: Mariappan Asokan
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731149#comment-13731149
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

Asokan: Sure, go ahead and make whatever changes are necessary, as I have no 
time to work on this anymore; I would still like to see this put into the 
project, as I had a use for it when I created it and I'm sure others do as well.

BTW: I never had my original question from a few years ago about the design 
answered; maybe I was missing something.

bq. Hmm, ok, do you have a suggestion on how I detect where one record begins 
and one record ends when records are not identifiable by any sort of consistent 
start character or end character boundary but just flow together? I could 
see the RecordReader detecting that it only read < RECORD LENGTH bytes and 
hitting the end of the split and discarding it. But I am not sure how it would 
detect the start of a record, with a split that has partial data at the start 
of it. Especially if there is no consistent boundary/char marker that 
identifies the start of a record.



 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-03-04 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

re-triggering hudson on the latest patch; could not see output from the last run


 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.2, 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-03-04 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

re-triggering hudson on the latest patch; could not see output from the last run


 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.2, 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-02-03 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

triggering hudson

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-02-03 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

triggering hudson

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-02-01 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: MAPREDUCE-1176-v4.patch

latest patch file, containing changes mentioned above

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-02-01 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

triggering hudson for v4 patch file

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-10 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

trying to trigger hudson to check v3

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-10 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

trying to trigger hudson to check v3

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-09 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

triggering hudson

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-09 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

triggering hudson

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-01 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: MAPREDUCE-1176-v3.patch

- Got rid of tabs, cleaned up formatting
- Added static setters for config
- throw exception if record length property is not set
- handle EOF -1 in loop
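
A possible shape for the static config helpers mentioned in the list above, as 
a sketch only (the method bodies are hypothetical; the actual v3 patch may 
differ):

// Inside FixedLengthInputFormat (Configuration is
// org.apache.hadoop.conf.Configuration); hypothetical shape of the helpers.
public static void setRecordLength(Configuration conf, int recordLength) {
  conf.setInt(FIXED_RECORD_LENGTH, recordLength);
}

public static int getRecordLength(Configuration conf) throws IOException {
  int recordLength = conf.getInt(FIXED_RECORD_LENGTH, 0);
  // Per the change list above: fail fast if the property was never set.
  if (recordLength <= 0) {
    throw new IOException(
        "The record length property must be set to a value greater than zero");
  }
  return recordLength;
}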

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch






[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-19 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779998#action_12779998
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

What is the next step for this issue? Does anything else need to be submitted?

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch






[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-19 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780367#action_12780367
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

bq. Why can't you just keep defaultSize and recordLength as longs?

Because findbugs threw warnings if they were not cast; secondly, the code 
works as expected. Please just shoot over how you want that calculation 
re-written and I can certainly change it. 

- In isSplitable, you catch the exception generated by getRecordLength and 
turn off splitting.
 If there is no record length specified, doesn't that mean the input format 
 won't work at all?

Nope, it would still work, as I have yet to see an original raw data file 
containing fixed-width records that for some reason does not contain 
complete records. But that's fine; we can just exit out here to let the user 
know they need to configure that property. If there is a better place to check 
for the existence of that property, please let me know.

- FixedLengthRecordReader: This record reader does not support compressed 
files. Is this true?

Correct, as stated in the docs. The reason is that when I wrote this I was not 
dealing with compressed files. Secondly, if an input file were compressed, I 
was not sure of the procedure to properly compute the splits against it, and 
the byte lengths of the records would be different in compressed form vs. once 
passed to the RecordReader. 

- Throughout, you've still got 4-space indentation in the method bodies. 
Indentation should be by 2.

Does anyone know of an automated tool that will fix this? It's driving me nuts 
going line by line and hitting delete 2x. When I look at this in eclipse I am 
not seeing 4 spaces.

- In FixedLengthRecordReader, you hard code a 64KB buffer. Why's this? You 
should let the filesystem use its default.

Sure, I can get rid of that.

- In your read loop, you're not accounting for the case of read returning 0 
or -1, which I believe
 can happen at EOF, right? Consider using o.a.h.io.IOUtils.readFully() to 
 replace this loop.

Ditto, I can change to that.
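
For reference, a minimal sketch of what that replacement might look like (a 
sketch only, not the actual patch; the method and parameter names are 
illustrative):

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils;

/** Reads exactly one fixed-length record from the stream. */
static byte[] readRecord(InputStream in, int recordLength) throws IOException {
  byte[] record = new byte[recordLength];
  // readFully() loops over short reads internally and throws an IOException
  // on premature EOF, which covers the 0 / -1 return cases noted above.
  IOUtils.readFully(in, record, 0, recordLength);
  return record;
}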

bq. As a general note, I'm not sure I agree with the design here. Rather than 
forcing the split to lie on record boundaries,

Ok, that's fine; I just wanted to contribute what I wrote, which is working 
for my case. 

bq. open the record reader, skip forward to the next record boundary 

Hmm, ok, do you have a suggestion on how I detect where one record begins and 
one record ends when records are not identifiable by any sort of consistent 
start character or end character boundary but just flow together? I could see 
the RecordReader detecting that it only read < RECORD LENGTH bytes and hitting 
the end of the split and discarding it. But I am not sure how it would detect 
the start of a record, with a split that has partial data at the start of it. 
Especially if there is no consistent boundary/char marker that identifies the 
start of a record.
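
For what it's worth, with truly fixed-length records a reader handed an 
arbitrary split would not need a boundary marker at all: every record starts 
at a multiple of the record length from the start of the file, so the next 
record boundary can be computed from the absolute split offset alone. A 
hypothetical sketch (not what the attached patches do):

// splitStart is the absolute byte offset of this split within the file.
// The first complete record in the split begins at the next multiple of
// recordLength at or after splitStart (ceiling division).
long firstRecordStart = ((splitStart + recordLength - 1) / recordLength) * recordLength;
// e.g. splitStart = 250, recordLength = 100  ->  firstRecordStart = 300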






 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: MAPREDUCE-1176-v1.patch

Attached is a patch file which adds FixedLengthInputFormat, 
FixedLengthRecordReader and a unit test for the new input format. This patch 
was made against the trunk as of 11/13.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthRecordReader.java

Re-attached source to match the same version as in the patch file

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, 
 FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthInputFormat.java

Re-attached the FixedLengthInputFormat source file, to contain the same 
version as in the patch file

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, 
 FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthRecordReader.java)

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, 
 FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Affects Version/s: 0.20.2
 Release Note: 
Addition of FixedLengthInputFormat and FixedLengthRecordReader in the 
org.apache.hadoop.mapreduce.lib.input package. These two classes can be used 
when you need to read data from files containing fixed-length (fixed-width) 
records. Such files have no CR/LF (or any combination thereof) and no 
delimiters; each record is a fixed length, and extra data is padded with 
spaces. The data is one gigantic line within a file. When creating a job that 
specifies this input format, the job must have the 
mapreduce.input.fixedlengthinputformat.record.length property set, as follows: 
myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);

Please see javadoc for more details.
   Status: Patch Available  (was: Open)

Attached is a patch file which adds FixedLengthInputFormat, 
FixedLengthRecordReader and a unit test for the new input format. This patch 
was made against the trunk as of 11/13.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, 
 FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777802#action_12777802
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

I followed the instructions listed at 
http://wiki.apache.org/hadoop/HowToContribute: 

"Finally, patches should be attached to an issue report in Jira via the Attach 
File link on the issue's Jira. When you believe that your patch is ready to be 
committed, select the Submit Patch link on the issue's Jira." 

So are you saying to delete the 2 *.java files and upload only the .patch?

The *.patch file does contain a unit test, so I am not sure why the comment 
above reported that no tests were included. I ran this patch file against a 
clean trunk copy locally on my test machine and also verified it through the 
test-patch task on the contribute how-to page.

I'll remove the 4-space indents; when looking through other sources, I found 
the 2-space indentation and heavy line wrapping unreadable.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, 
 FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthRecordReader.java)

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch






[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthInputFormat.java)

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch






[jira] Created: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-01 Thread BitsOfInfo (JIRA)
Contribution: FixedLengthInputFormat and FixedLengthRecordReader


 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor


Hello,
I would like to contribute the following two classes for incorporation into the 
mapreduce.lib.input package. These two classes can be used when you need to 
read data from files containing fixed-length (fixed-width) records. Such files 
have no CR/LF (or any combination thereof) and no delimiters; each record is a 
fixed length, extra data is padded with spaces, and the data is one gigantic 
line within the file.

Provided are two classes: the FixedLengthInputFormat and its corresponding 
FixedLengthRecordReader. When creating a job that specifies this input format, 
the job must have the mapreduce.input.fixedlengthinputformat.record.length 
property set, as follows:

myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);

OR

myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);

This input format overrides computeSplitSize() to ensure that InputSplits never 
contain a partial record; with fixed-length records there is no way to locate 
the start of a record if a split were to begin mid-record. Each InputSplit 
passed to the FixedLengthRecordReader therefore starts at the first byte of a 
record, and the last byte in the InputSplit is the last byte of a record. The 
override delegates to FileInputFormat's compute method and then rounds the 
returned split size down to a whole multiple of the record length: 
(Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * 
fixedRecordLength)
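
To make that rounding concrete: with 100-byte records and a 
FileInputFormat-computed split size of 134217728 bytes (128 MB), the adjusted 
split size is floor(134217728 / 100) * 100 = 134217700 bytes, so no split ever 
ends mid-record.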

This suite of fixed-length input format classes does not support compressed 
files. 




[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-01 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthRecordReader.java
FixedLengthInputFormat.java

Attached source files

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java


