Tomáš Čechal created CRUNCH-577:
-----------------------------------

             Summary: NumberFormatException when parsing dfs.block.size
                 Key: CRUNCH-577
                 URL: https://issues.apache.org/jira/browse/CRUNCH-577
             Project: Crunch
          Issue Type: Bug
          Components: IO
    Affects Versions: 0.12.0, 0.11.0, 0.8.4, 0.8.3, 0.10.0, 0.8.2, 0.9.0, 0.8.1, 0.8.0
            Reporter: Tomáš Čechal
            Priority: Minor


When file size abbreviations (such as "128m") are used for the HDFS 
configuration property "dfs.block.size", the Crunch job crashes with a 
NumberFormatException. According to the Hadoop documentation 
(https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml),
this style of abbreviation should be supported.

The problem occurs at line 38 of CrunchCombineFileInputFormat.java, where the 
configuration property is parsed using the getLong() method instead of the 
getLongBytes() method. Furthermore, the obsolete configuration key 
"dfs.block.size" is used instead of "dfs.blocksize" (see 
https://issues.apache.org/jira/browse/HDFS-631), which leads to a warning 
message being emitted when an MR pipeline starts.

The proposed solution discussed on the crunch-users mailing list 
(http://mail-archives.apache.org/mod_mbox/crunch-user/201511.mbox/browser) is 
to use the getLongBytes() method and the new config key "dfs.blocksize".
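
To illustrate why a plain long parse rejects "128m" while a suffix-aware 
parse accepts it, here is a minimal, self-contained sketch. It does not use 
Hadoop or Crunch code: parsePlainLong and parseByteSize are hypothetical 
helpers, and the binary multipliers (k = 2^10, m = 2^20, g = 2^30, t = 2^40) 
are an assumption about how a getter such as getLongBytes() interprets the 
suffixes.

```java
import java.util.Locale;

public class BlockSizeParsing {

    // Hypothetical stand-in for a plain numeric getter: no suffix handling,
    // so a value such as "128m" is rejected with a NumberFormatException.
    public static long parsePlainLong(String value) {
        return Long.parseLong(value.trim());
    }

    // Hypothetical stand-in for a suffix-aware getter.
    // Assumption: suffixes are binary multipliers and case-insensitive.
    public static long parseByteSize(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new NumberFormatException("empty size value");
        }
        int shift;
        switch (s.charAt(s.length() - 1)) {
            case 'k': shift = 10; break;
            case 'm': shift = 20; break;
            case 'g': shift = 30; break;
            case 't': shift = 40; break;
            default:
                // No recognised suffix: treat the value as a byte count.
                return Long.parseLong(s);
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        // multiplyExact throws ArithmeticException instead of overflowing.
        return Math.multiplyExact(number, 1L << shift);
    }

    public static void main(String[] args) {
        String configured = "128m";
        try {
            parsePlainLong(configured);
        } catch (NumberFormatException e) {
            System.out.println("plain parse failed: " + e.getMessage());
        }
        System.out.println("suffix-aware parse: " + parseByteSize(configured));
    }
}
```

In the actual fix, the equivalent change would be to read "dfs.blocksize" 
through getLongBytes() rather than implementing any parsing by hand.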



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
