[ https://issues.apache.org/jira/browse/HADOOP-17905?focusedWorklogId=656566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656566 ]
ASF GitHub Bot logged work on HADOOP-17905:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Sep/21 20:00
            Start Date: 28/Sep/21 20:00
    Worklog Time Spent: 10m
      Work Description: pbacsko commented on a change in pull request #3423:
URL: https://github.com/apache/hadoop/pull/3423#discussion_r717676453


##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java
##########
@@ -301,9 +305,18 @@ public void clear() {
    */
   private boolean ensureCapacity(final int capacity) {
     if (bytes.length < capacity) {
+      // use long to allow overflow
+      long tmpLength = bytes.length;
+      long tmpCapacity = capacity;
+
       // Try to expand the backing array by the factor of 1.5x
-      // (by taking the current size + dividing it by half)
-      int targetSize = Math.max(capacity, bytes.length + (bytes.length >> 1));
+      // (by taking the current size + dividing it by half).
+      //
+      // If the calculated value is beyond the size
+      // limit, we cap it to ARRAY_MAX_SIZE
+      int targetSize = (int) Math.min(ARRAY_MAX_SIZE,

Review comment:
       The problem is that `bytes.length + (bytes.length >> 1)` might overflow and end up negative, which results in choosing `capacity` every time instead of `ARRAY_MAX_SIZE`. So we either temporarily store this value as a `long` and cast it back to `int`, or we directly check whether `targetSize` is negative after `int targetSize = bytes.length + (bytes.length >> 1);`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
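The overflow the reviewer describes is easy to reproduce in isolation. Below is a minimal sketch, not the Hadoop code itself: the array length of 1.6 billion is a hypothetical value chosen to trigger the overflow, and `ARRAY_MAX_SIZE` is assumed to be `Integer.MAX_VALUE - 8` as proposed in the issue.

```java
public class GrowthOverflowDemo {
    public static void main(String[] args) {
        // Hypothetical backing-array length large enough that 1.5x growth
        // exceeds Integer.MAX_VALUE (2_147_483_647).
        int length = 1_600_000_000;

        // int arithmetic: 1_600_000_000 + 800_000_000 wraps around to a
        // negative number, so Math.max(capacity, grown) always picks capacity.
        int grown = length + (length >> 1);
        System.out.println(grown < 0);   // true: the sum overflowed

        // Doing the same arithmetic in long avoids the wrap-around; the
        // result can then be capped before casting back to int.
        long ARRAY_MAX_SIZE = Integer.MAX_VALUE - 8;   // assumed cap
        long safeGrown = (long) length + (length >> 1);
        int targetSize = (int) Math.min(ARRAY_MAX_SIZE, safeGrown);
        System.out.println(targetSize);  // 2147483639
    }
}
```

This is exactly the `long`-then-cast option the reviewer mentions; the alternative, checking `targetSize < 0` after the int addition, relies on Java's defined wrap-around semantics for int overflow.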
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 656566)
    Time Spent: 3h  (was: 2h 50m)

> Modify Text.ensureCapacity() to efficiently max out the backing array size
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-17905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17905
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is a continuation of HADOOP-17901.
>
> Right now we use a factor of 1.5x to increase the byte array when it is full.
> However, once the size reaches a certain point, the increment is only
> (current size + length). This can cause performance issues if the textual
> data we intend to store is beyond this point.
>
> Instead, let's grow the array straight to its maximum size. Based on
> different sources, a safe choice seems to be Integer.MAX_VALUE - 8 (see
> ArrayList, AbstractCollection, HashTable, etc.).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
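Put together, the growth policy the issue describes can be sketched as follows. This is an illustrative rewrite under stated assumptions, not the actual Hadoop patch: the class name `GrowableBuffer` and the initial array size are invented for the example, and `ARRAY_MAX_SIZE` is taken to be `Integer.MAX_VALUE - 8` per the issue text.

```java
import java.util.Arrays;

// Sketch of the capped 1.5x growth policy described in HADOOP-17905.
// Class name, initial size, and the ARRAY_MAX_SIZE constant are
// assumptions for illustration, not the exact Text.java code.
class GrowableBuffer {
    // Safe maximum array size, as used by ArrayList and similar classes.
    private static final int ARRAY_MAX_SIZE = Integer.MAX_VALUE - 8;

    private byte[] bytes = new byte[16];

    /** Grows the backing array if needed; returns true if it was resized. */
    boolean ensureCapacity(final int capacity) {
        if (bytes.length < capacity) {
            // Compute 1.5x growth in long arithmetic so the sum cannot
            // wrap around to a negative int, then cap at ARRAY_MAX_SIZE
            // before casting back to int.
            long grown = (long) bytes.length + (bytes.length >> 1);
            int targetSize =
                (int) Math.min(ARRAY_MAX_SIZE, Math.max((long) capacity, grown));
            bytes = Arrays.copyOf(bytes, targetSize);
            return true;
        }
        return false;
    }

    int length() {
        return bytes.length;
    }
}
```

For small buffers this behaves like the old code (a 16-byte array asked for capacity 100 grows to 100, since 100 > 16 + 8); the cap only matters near the int limit, where the long arithmetic keeps the 1.5x result positive so `ARRAY_MAX_SIZE` wins instead of falling back to `capacity`.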