[ https://issues.apache.org/jira/browse/HDFS-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605064#comment-13605064 ]
rajat agarwal commented on HDFS-2541:
-------------------------------------

{code}
long period = Math.min(scanPeriod, Math.max(blockMap.size(),1) * 600 * 1000L);
{code}

The problem here is that once blockMap.size() exceeds 3579139, the multiplication by 600 is still done in 32-bit int arithmetic, so the result overflows Integer.MAX_VALUE and becomes negative before the widening to long happens. Had it been

{code}
long period = Math.min(scanPeriod, Math.max(blockMap.size(),1) * 600L * 1000L);
{code}

then "period" would either be determined by scanPeriod alone or would always be positive. There are two cases here:

1) If scanPeriod takes its default value (three weeks, i.e. 21*24 hours) or anything less than 596 hours (since 596*3600*1000 < Integer.MAX_VALUE), "period" would always be a positive value.

2) If the scan period is larger than that, the int cast of "period" can again be negative. For that case we could add something like:

{code}
if ((int) period < 0)
    period = scanPeriod;
return System.currentTimeMillis() - scanPeriod + random.nextInt((int) period);
{code}

A standalone sketch reproducing the overflow follows the quoted issue description below.

> For a sufficiently large value of blocks, the DN Scanner may request a random
> number with a negative seed value.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2541
>                 URL: https://issues.apache.org/jira/browse/HDFS-2541
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.20.1
>            Reporter: Harsh J
>            Assignee: Harsh J
>             Fix For: 0.23.1, 1.1.0
>
>         Attachments: BSBugTest.java, HDFS-2541.patch
>
>
> Running off 0.20-security, I noticed that one could get the following exception when scanners are used:
> {code}
> DataXceiver
> java.lang.IllegalArgumentException: n must be positive
>         at java.util.Random.nextInt(Random.java:250)
>         at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.getNewBlockScanTime(DataBlockScanner.java:251)
>         at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.addBlock(DataBlockScanner.java:268)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:432)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> {code}
> This is because the period, determined in the DataBlockScanner (0.20+) or BlockPoolSliceScanner (0.23+), is cast to an integer before it is passed to a Random.nextInt(...) call. For sufficiently large values of the long 'period', the cast integer may be negative. This is not accounted for. I'll attach a sample test that shows this possibility with the numbers.
> We should ensure we do a Math.abs(...) before we send it to the Random.nextInt(...) call to avoid this.
> With this bug, the maximum # of blocks a scanner may hold in its blocksMap without opening up the chance of triggering this exception (intermittent, as blocks continue to grow) would be 3582718.
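For reference, a minimal standalone sketch of the overflow (my own illustration, not the BSBugTest.java attached to the issue; the class name and block count are chosen just for the demo). It mirrors the expression quoted in the comment above and shows why the bound handed to Random.nextInt() comes out negative:

{code}
import java.util.Random;

public class PeriodOverflowDemo {
    public static void main(String[] args) {
        long scanPeriod = 21 * 24 * 3600 * 1000L; // default scan period: three weeks, in ms
        int blocks = 3582719;                     // one past the 3582718 threshold noted above

        // Buggy form: Math.max(...) * 600 is evaluated in 32-bit int
        // arithmetic and wraps around before the widening * 1000L happens.
        long buggy = Math.min(scanPeriod, Math.max(blocks, 1) * 600 * 1000L);
        System.out.println("buggy period = " + buggy);   // negative long

        // Fixed form: the 600L literal forces 64-bit arithmetic throughout,
        // so the block-count term stays positive and scanPeriod caps the result.
        long fixed = Math.min(scanPeriod, Math.max(blocks, 1) * 600L * 1000L);
        System.out.println("fixed period = " + fixed);   // 1814400000

        // The scanner passes (int) period to Random.nextInt(); with the
        // buggy value the cast bound is negative and nextInt throws.
        Random random = new Random();
        try {
            random.nextInt((int) buggy);
        } catch (IllegalArgumentException e) {
            System.out.println("nextInt rejected the bound: " + e.getMessage());
        }
        System.out.println("nextInt with fixed bound: " + random.nextInt((int) fixed));
    }
}
{code}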