[ 
https://issues.apache.org/jira/browse/HADOOP-17905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated HADOOP-17905:
----------------------------------
    Description: 
This is a continuation of HADOOP-17901.

Right now we grow the byte array by a factor of 1.5x when it's full. 
However, once the size reaches a certain point, the new size is only (current 
size + length). This can cause performance issues if the textual data we 
intend to store grows beyond this point.

Instead, let's grow the array straight to the maximum size. Based on different 
sources, a safe choice seems to be Integer.MAX_VALUE - 8 (see ArrayList, 
AbstractCollection, Hashtable, etc).

  was:
This is a continuation of HADOOP-17901.

Right now we use a factor of 1.5x to increase the byte array if it's full. 
However, if the size reaches a certain point, the increment is only (current 
size + length). This can cause performance issues if the textual data which we 
intend to store is beyond this point.

Instead, let's max out the array to the maximum. Based on different sources, 
this is usually determined to be Integer.MAX_VALUE - 8 (see ArrayList, 
AbstractCollection, HashTable, etc).


> Modify Text.ensureCapacity() to efficiently max out the backing array size
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-17905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17905
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>
> This is a continuation of HADOOP-17901.
> Right now we grow the byte array by a factor of 1.5x when it's full. 
> However, once the size reaches a certain point, the new size is only (current 
> size + length). This can cause performance issues if the textual data we 
> intend to store grows beyond this point.
> Instead, let's grow the array straight to the maximum size. Based on 
> different sources, a safe choice seems to be Integer.MAX_VALUE - 8 (see 
> ArrayList, AbstractCollection, Hashtable, etc).
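
For illustration only, the proposed growth rule could be sketched as below. This
is not the actual patch to Text.ensureCapacity(): the class name GrowthSketch,
the method nextCapacity() and the constant name ARRAY_MAX_SIZE are hypothetical,
and the sketch only computes the size to allocate instead of allocating it.

```java
class GrowthSketch {
    // Assumed cap, taken from the value named in the description.
    static final int ARRAY_MAX_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Returns the size to allocate when a backing array of currentLength
     * bytes must hold at least requiredCapacity bytes.
     */
    static int nextCapacity(int currentLength, int requiredCapacity) {
        if (requiredCapacity <= currentLength) {
            return currentLength; // already large enough, no growth needed
        }
        // Compute 1.5x in long arithmetic so the result cannot overflow an int.
        long grown = (long) currentLength + (currentLength >> 1);
        // Near the limit, jump straight to the cap instead of growing
        // by only the requested amount.
        int target = (int) Math.min(grown, ARRAY_MAX_SIZE);
        // A request above the cap is returned unchanged; whether such an
        // allocation succeeds depends on the JVM.
        return Math.max(requiredCapacity, target);
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(1000, 1001));                   // 1500
        System.out.println(nextCapacity(1_500_000_000, 1_500_000_001)); // 2147483639
    }
}
```

With this rule, a 1.5 GB array that needs one more byte grows to the cap in a
single step, so later appends do not each trigger another reallocation.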



--
This message was sent by Atlassian Jira
(v8.3.4#803005)