[ https://issues.apache.org/jira/browse/CRUNCH-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomáš Čechal updated CRUNCH-577:
--------------------------------
Attachment: 0001-CRUNCH-577-Use-getLongBytes-to-correctly-parse-dfs-b.patch
One-liner that fixes the problem.
> NumberFormatException when parsing dfs.block.size
> -------------------------------------------------
>
> Key: CRUNCH-577
> URL: https://issues.apache.org/jira/browse/CRUNCH-577
> Project: Crunch
> Issue Type: Bug
> Components: IO
> Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.8.2, 0.10.0, 0.8.3, 0.8.4,
> 0.11.0, 0.12.0
> Reporter: Tomáš Čechal
> Priority: Minor
> Labels: patch
> Attachments:
> 0001-CRUNCH-577-Use-getLongBytes-to-correctly-parse-dfs-b.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When file size abbreviations (like "128m") are used for the HDFS configuration
> property "dfs.block.size", the Crunch job crashes with a
> NumberFormatException. According to the Hadoop documentation
> (https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml),
> this style of abbreviation should be supported.
> The problem occurs at line 38 in CrunchCombineFileInputFormat.java, where the
> configuration property is parsed using the getLong() method instead of the
> getLongBytes() method. Furthermore, the obsolete configuration key
> "dfs.block.size" is used instead of "dfs.blocksize" (see
> https://issues.apache.org/jira/browse/HDFS-631), which leads to a warning
> message being emitted when starting a MapReduce pipeline.
> The proposed solution discussed on the crunch-users mailing list
> (http://mail-archives.apache.org/mod_mbox/crunch-user/201511.mbox/browser) is
> to use the getLongBytes() method and the new config key "dfs.blocksize".
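
To illustrate the failure mode described above, here is a minimal stand-alone
sketch. It does not use Hadoop or Crunch: the class BlockSizeSketch and its
parseBytes() helper are hypothetical names, and the HashMap only stands in for
a Hadoop Configuration. The helper approximates the suffix handling that
getLongBytes() is expected to provide (binary multiples for k, m, g, t, p, e,
case-insensitive); it is an assumption-based illustration, not the contents of
the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone sketch, no Hadoop dependency. parseBytes() is a hypothetical
 * stand-in for the suffix-aware parsing behind Configuration.getLongBytes().
 */
final class BlockSizeSketch {

    // Assumed suffix set: k, m, g, t, p, e as powers of 1024, case-insensitive.
    private static final String SUFFIXES = "kmgtpe";

    /** Parses a plain number ("134217728") or a suffixed size ("128m") into bytes. */
    static long parseBytes(String value) {
        String s = value.trim();
        if (s.isEmpty()) {
            throw new NumberFormatException("empty size value");
        }
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        int idx = SUFFIXES.indexOf(last);
        if (idx < 0) {
            // No recognized suffix: treat the whole string as a byte count.
            return Long.parseLong(s);
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        int shift = 10 * (idx + 1); // k -> 2^10, m -> 2^20, ...
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(number, 1L << shift);
    }

    public static void main(String[] args) {
        // Stand-in for a Hadoop Configuration holding the non-deprecated key.
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.blocksize", "128m");
        String raw = conf.get("dfs.blocksize");

        // Plain numeric parsing rejects the suffix, as a getLong()-style read does.
        try {
            Long.parseLong(raw);
        } catch (NumberFormatException e) {
            System.out.println("plain parse failed: " + e.getMessage());
        }

        // Suffix-aware parsing accepts the same value.
        System.out.println("suffix-aware parse: " + parseBytes(raw));
    }
}
```

In Crunch itself the change would presumably amount to reading "dfs.blocksize"
through Configuration.getLongBytes() rather than reading "dfs.block.size"
through getLong(), as proposed in the description; the sketch only shows why
the plain numeric parse rejects "128m".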