[ 
https://issues.apache.org/jira/browse/PHOENIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723695#comment-15723695
 ] 

Geoffrey Jacoby commented on PHOENIX-541:
-----------------------------------------

Thanks for the feedback, [~jamestaylor]. Will get an updated patch up soon. 

One question: is it worth keeping a distinction between 
MAX_MUTATION_SIZE_ATTRIB and MUTATE_BATCH_SIZE_ATTRIB in the _BYTES version? 
Couldn't the logic just be:

1. If a single mutation is bigger than MUTATE_BATCH_SIZE_BYTES, throw the 
IllegalArgumentException in MutationState.throwIfTooBig rather than using 
MAX_MUTATION_SIZE as it currently does in most cases. (And do the same in any 
similar checks elsewhere.) 
2. If each individual mutation is smaller than the threshold, make sure we 
commit the requests in batches no larger than MUTATE_BATCH_SIZE_BYTES_ATTRIB, 
either in MutationState or in the UngroupedAggregateRegionObserver. (And during 
the transition, also apply the existing logic for row count until the 
deprecated properties are removed.) 

This way, when the deprecated properties are eventually removed, there's only 
one easy-to-understand knob -- MUTATE_BATCH_SIZE_BYTES_ATTRIB -- for guarding 
against giant WALEdits, rather than two that could be misconfigured to be 
contradictory or nonsensical.
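
To make the two steps above concrete, here is a rough sketch. It is not taken 
from the patch and does not use any Phoenix classes: ByteBatchSketch and 
batchBySize are hypothetical names, a byte[] stands in for one serialized 
mutation, and maxBatchBytes plays the role of the value configured through 
MUTATE_BATCH_SIZE_BYTES_ATTRIB. The transitional row-count check is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Phoenix code: each byte[] stands in for one
// serialized mutation, and maxBatchBytes is the single byte-based limit.
final class ByteBatchSketch {
    private ByteBatchSketch() {}

    static List<List<byte[]>> batchBySize(List<byte[]> mutations, long maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] mutation : mutations) {
            // Step 1: a single mutation larger than the limit is rejected outright.
            if (mutation.length > maxBatchBytes) {
                throw new IllegalArgumentException("Mutation of " + mutation.length
                        + " bytes exceeds batch limit of " + maxBatchBytes + " bytes");
            }
            // Step 2: close the current batch before it would exceed the limit.
            if (!current.isEmpty() && currentBytes + mutation.length > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(mutation);
            currentBytes += mutation.length;
        }
        // Flush whatever is left as the final batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With a 100-byte limit, three 40-byte mutations would go out as two batches 
(two mutations, then one), and a single 101-byte mutation would be rejected.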

> Make mutable batch size bytes-based instead of row-based
> --------------------------------------------------------
>
>                 Key: PHOENIX-541
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-541
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0-Release
>            Reporter: mujtaba
>            Assignee: Geoffrey Jacoby
>              Labels: newbie
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-541.patch
>
>
> With the current row-count-based mutable batch size configuration, the ideal 
> batch size when creating indexes is around 800 rather than the current 15k, 
> based on memory consumption, CPU and GC (data size: key: ~60 bytes, 14 
> integer columns in separate CFs)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
