[ 
https://issues.apache.org/jira/browse/PHOENIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752811#comment-15752811
 ] 

Geoffrey Jacoby commented on PHOENIX-541:
-----------------------------------------

Attached a second version of the patch for comment. A couple of points:

1. I wasn't able to increase DEFAULT_MUTATE_BATCH_SIZE to Integer.MAX_VALUE 
because quite a few places in Phoenix use that value to initialize ArrayList 
capacities, so increasing the value led to many OutOfMemoryErrors. (These 
cases will need to be changed when DEFAULT_MUTATE_BATCH_SIZE is removed in a 
future JIRA.)
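
To illustrate the failure mode in point 1: new ArrayList<>(n) allocates its 
backing array of n slots immediately, so passing Integer.MAX_VALUE fails before 
a single element is added. Below is a minimal sketch of one way such call sites 
could be made safe, by capping the pre-sized capacity. The class name, method 
names, and the cap of 1024 are all hypothetical and are not taken from the 
patch or from Phoenix.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchCapacity {
    // Hypothetical upper bound on pre-allocation; not a Phoenix constant.
    static final int MAX_INITIAL_CAPACITY = 1024;

    // Use the configured batch size as a sizing hint, but never beyond the cap.
    static int initialCapacity(int batchSize) {
        if (batchSize < 0) {
            throw new IllegalArgumentException("batchSize must be >= 0");
        }
        return Math.min(batchSize, MAX_INITIAL_CAPACITY);
    }

    // The list still grows on demand past the initial capacity.
    static <T> List<T> newBatchList(int batchSize) {
        return new ArrayList<>(initialCapacity(batchSize));
    }
}
```

With this shape, a batch size of Integer.MAX_VALUE only affects when a batch is 
flushed, not how much memory is reserved up front.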

2. So far I haven't created MAX_MUTATION_SIZE_BYTES_ATTRIB, because I'm not 
sure it's necessary. Right now the only place I can see that's using the 
row-based equivalent is in MutationState.throwIfTooBig(), and I'm not sure if 
that needs to continue to exist, since it's only called when a MutationState is 
joined to another one, and we now handle the "overly-large MutationState" case 
by partitioning our batched Mutations to HBase. 

For example, PhoenixIndexImportDirectMapper already takes the Math.min() of 
the row-based limit and the max batch row size, and DeleteCompiler only reads 
the row-based MAX_MUTATION_SIZE_ATTRIB to pass it into MutationState's 
constructor so it can be used in throwIfTooBig(). 
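
As a rough illustration of the partitioning idea in point 2, here is a minimal 
sketch that splits a list of mutations into batches bounded by total bytes 
rather than by row count. It is not the code from the patch: ByteBatcher and 
partition() are hypothetical names, and a plain byte[] stands in for an HBase 
Mutation, with its length standing in for whatever size estimate the real code 
uses.

```java
import java.util.ArrayList;
import java.util.List;

final class ByteBatcher {
    // Split mutations into consecutive batches whose summed sizes stay within
    // maxBytes. A single mutation larger than maxBytes gets a batch of its own
    // instead of being rejected.
    static List<List<byte[]>> partition(List<byte[]> mutations, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] m : mutations) {
            // Close the current batch if adding this mutation would exceed the limit.
            if (!current.isEmpty() && currentBytes + m.length > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(m);
            currentBytes += m.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Because an oversized input is split into several batches instead of failing, a 
hard upper limit of the kind enforced by throwIfTooBig() becomes less important, 
which is the reasoning behind leaving out a bytes-based attribute for now.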

[~jamestaylor] [~samarthjain]



> Make mutable batch size bytes-based instead of row-based
> --------------------------------------------------------
>
>                 Key: PHOENIX-541
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-541
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0-Release
>            Reporter: mujtaba
>            Assignee: Geoffrey Jacoby
>              Labels: newbie
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-541-v2.patch, PHOENIX-541.patch
>
>
> With the current row-count-based mutable batch size configuration, the ideal 
> batch size when creating indexes is around 800 rather than the current 15k, 
> based on memory consumption, CPU and GC (data size: key: ~60 bytes, 14 
> integer columns in separate CFs)



