[ https://issues.apache.org/jira/browse/PHOENIX-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898439#comment-13898439 ]

Maryann Xue commented on PHOENIX-34:
------------------------------------

The current formula for size estimate is:

estimatedSize = SizedUtil.sizeOfMap(nRows,
        SizedUtil.IMMUTABLE_BYTES_WRITABLE_SIZE, SizedUtil.RESULT_SIZE)
    + hashCacheBytes.length;

This looks like some legacy code that I hadn't read closely before. Looking at
it now, I understand the first part as the fixed overhead of the hash-cache
map, and the second part as the actual value size, on the assumption that the
key size is small enough to be ignored.

What looks odd is that, with this formula, the calculated "estimatedSize"
should be well below 2GB in all cases.
Please correct me if I'm wrong:

There are 4 fields including the PK, and every field is of type CHAR(10).
Every row has been projected through a KeyValueSchema, which means it is
transformed into a single KeyValue. So at roughly 60 bytes per row, the size
of hashCacheBytes is about 250K * 60 = 15M, and SizedUtil.sizeOfMap(nRows,
SizedUtil.IMMUTABLE_BYTES_WRITABLE_SIZE, SizedUtil.RESULT_SIZE) = 250K * (44 +
48 + 88) = 45M. Added together, the size estimate is 60M.
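As a back-of-the-envelope check, the arithmetic above can be sketched as a
standalone snippet. The constants here (250K rows, ~60 bytes per serialized
row, and 44 + 48 + 88 bytes of per-entry map overhead) are the assumed values
from this comment, not values read out of SizedUtil itself:

```java
// Sketch of the size-estimate arithmetic, assuming the per-row and
// per-entry constants quoted above (hypothetical, for illustration).
public class SizeEstimateCheck {
    public static void main(String[] args) {
        long nRows = 250_000L;
        long bytesPerRow = 60L;                    // one KeyValue per projected row
        long perEntryOverhead = 44 + 48 + 88;      // IMMUTABLE_BYTES_WRITABLE_SIZE
                                                   // + RESULT_SIZE + map-entry cost
        long hashCacheBytes = nRows * bytesPerRow; // ~15M: the real value bytes
        long mapOverhead = nRows * perEntryOverhead; // ~45M: fixed map overhead
        long estimatedSize = mapOverhead + hashCacheBytes; // ~60M total
        System.out.println(estimatedSize);
    }
}
```

Either way the estimate lands around 60M, nowhere near the 2GB range, which is
why the insufficient-memory exception is surprising.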

I will try logging some information and see what goes wrong here.




> Insufficient memory exception on join when RHS rows count > 250K 
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-34
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-34
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: HBase 0.94.14, r1543222, Hadoop 1.0.4, r1393290, 2 RS + 
> 1 Master, Heap 4GB per RS
>            Reporter: Mujtaba Chohan
>             Fix For: 3.0.0
>
>
> Join fails when the row count of the RHS table is >250K. Details on the
> table schema and performance numbers with different LHS/RHS row counts are
> at http://phoenix-bin.github.io/client/performance/phoenix-20140210023154.htm.
> James comment:
> So that's with a 4GB heap allowing Phoenix to use 50% of it. With a pretty 
> narrow table: 3 KV columns of 30bytes. Topping out at 250K is a bit low. I 
> wonder if our memory estimation matches reality.
> What do you think Maryann?
> How about filing a JIRA, Mujtaba. This is a good conversation to have on the 
> dev list. Can we move it there, please? 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
