[ https://issues.apache.org/jira/browse/PHOENIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716203#comment-15716203 ]

James Taylor commented on PHOENIX-541:
--------------------------------------

This sounds like a good approach, [~gjacoby]. Here's some feedback:
- Search for all occurrences of QueryServices.MUTATE_BATCH_SIZE_ATTRIB and make 
sure we're using the new MUTATE_BATCH_SIZE_BYTES_ATTRIB to track when to 
write/commit. There are times when we don't go through MutationState (in 
particular, in UngroupedAggregateRegionObserver, which is the code path taken 
when auto-commit is on).
- Deprecate JDBCUtil.getMutateBatchSize(), 
QueryServices.MUTATE_BATCH_SIZE_ATTRIB, and any related methods.
- Change the default for QueryServicesOptions.DEFAULT_MUTATE_BATCH_SIZE 
to Integer.MAX_VALUE and, for backward compatibility, track both bytes and row 
count and send/write the mutations as soon as either threshold is met.
- Create a good default value for MUTATE_BATCH_SIZE_BYTES_ATTRIB instead of 
using Long.MAX_VALUE.
- Make similar changes to QueryServices.MAX_MUTATION_SIZE_ATTRIB, making it 
byte-based instead of row-count-based. Usage of this config parameter would be 
isolated to MutationState, I believe. We should be able to come up with an 
accurate size based on the underlying Mutation and/or Delete info we store in 
PRowImpl.
- Have a reasonable (smaller) default for the new 
QueryServices.MAX_MUTATION_SIZE_BYTES_ATTRIB.
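
A minimal Java sketch of the dual-threshold scheme suggested above. The `DualThresholdBatcher` class and `PendingMutation` record are hypothetical and are not Phoenix code. Uncommitted mutations are capped by a byte budget (analogous to the proposed MAX_MUTATION_SIZE_BYTES_ATTRIB). On commit, they are split into batches that close when either the row count or the byte size reaches its threshold, mirroring the MUTATE_BATCH_SIZE_ATTRIB / MUTATE_BATCH_SIZE_BYTES_ATTRIB pair. Per-mutation sizes are simply taken as given here. A real implementation would have to derive them from the underlying Mutation/Delete data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Phoenix's MutationState: buffers mutations under a
// byte cap and, on commit, cuts batches on EITHER a row or a byte threshold.
public class DualThresholdBatcher {
    // Hypothetical stand-in for a buffered row mutation; only its size matters here.
    public record PendingMutation(String rowKey, long sizeInBytes) {}

    private final int batchSizeRows;      // row threshold (cf. MUTATE_BATCH_SIZE_ATTRIB)
    private final long batchSizeBytes;    // byte threshold (cf. MUTATE_BATCH_SIZE_BYTES_ATTRIB)
    private final long maxMutationBytes;  // cap on uncommitted bytes (cf. MAX_MUTATION_SIZE_BYTES_ATTRIB)

    private final List<PendingMutation> uncommitted = new ArrayList<>();
    private long uncommittedBytes = 0;

    public DualThresholdBatcher(int batchSizeRows, long batchSizeBytes, long maxMutationBytes) {
        this.batchSizeRows = batchSizeRows;
        this.batchSizeBytes = batchSizeBytes;
        this.maxMutationBytes = maxMutationBytes;
    }

    /** Buffers a mutation, failing fast if uncommitted state would exceed the byte cap. */
    public void add(PendingMutation m) {
        if (uncommittedBytes + m.sizeInBytes() > maxMutationBytes) {
            throw new IllegalStateException(
                "Uncommitted mutation size would exceed " + maxMutationBytes + " bytes");
        }
        uncommitted.add(m);
        uncommittedBytes += m.sizeInBytes();
    }

    /** Splits buffered mutations into batches, closing a batch once either threshold is reached. */
    public List<List<PendingMutation>> commit() {
        List<List<PendingMutation>> batches = new ArrayList<>();
        List<PendingMutation> current = new ArrayList<>();
        long currentBytes = 0;
        for (PendingMutation m : uncommitted) {
            current.add(m);
            currentBytes += m.sizeInBytes();
            if (current.size() >= batchSizeRows || currentBytes >= batchSizeBytes) {
                batches.add(current);  // a real system would send/write this batch here
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);      // final partial batch
        }
        uncommitted.clear();
        uncommittedBytes = 0;
        return batches;
    }

    public static void main(String[] args) {
        // Row threshold effectively disabled (Integer.MAX_VALUE); batches cut at 1000 bytes.
        DualThresholdBatcher batcher = new DualThresholdBatcher(Integer.MAX_VALUE, 1000, 10_000);
        for (int i = 0; i < 5; i++) {
            batcher.add(new PendingMutation("row" + i, 400));
        }
        System.out.println(batcher.commit().size());
    }
}
```

With the row threshold at Integer.MAX_VALUE, only the byte threshold matters. Five 400-byte mutations with a 1000-byte batch size therefore go out as two batches: 3 rows, then 2. Because this sketch checks the thresholds after appending, a batch can overshoot the byte limit by up to one mutation. Checking before appending would keep batches under the limit, except when a single mutation is itself larger than the limit.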

> Make mutable batch size bytes-based instead of row-based
> --------------------------------------------------------
>
>                 Key: PHOENIX-541
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-541
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0-Release
>            Reporter: mujtaba
>            Assignee: Geoffrey Jacoby
>              Labels: newbie
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-541.patch
>
>
> With current configuration of row-count based mutable batch size, ideal value 
> for batch size is around 800 rather than the current 15k when creating indexes 
> based on memory consumption, CPU and GC (data size: key: ~60 bytes, 14 
> integer columns in separate CFs)
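
To give a sense of scale, the Java snippet below roughly estimates the serialized size of the rows described above. It is based on the HBase KeyValue layout: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, and value. The 60-byte key and the 14 integer columns come from the issue. The 4-byte value, 1-byte family name and 2-byte qualifier are assumptions. The figures also ignore Java heap object overhead, which adds more on top.

```java
// Rough, assumption-laden estimate of serialized bytes per row; not a Phoenix API.
public class CellSizeEstimate {
    // Serialized size of one HBase KeyValue (no tags):
    // keyLen(4) + valueLen(4) + rowLen(2) + row + famLen(1) + family + qualifier + ts(8) + type(1) + value
    public static long keyValueBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    public static void main(String[] args) {
        int rowKey = 60;        // from the issue: key ~60 bytes
        int columns = 14;       // from the issue: 14 integer columns, one cell each
        int intValue = 4;       // assumption: 4-byte serialized INTEGER
        int family = 1;         // assumption: 1-byte column family name
        int qualifier = 2;      // assumption: 2-byte column qualifier
        long perCell = keyValueBytes(rowKey, family, qualifier, intValue);
        long perRow = perCell * columns;
        System.out.println("bytes/cell=" + perCell + " bytes/row=" + perRow
            + " 800 rows=" + perRow * 800 + " 15000 rows=" + perRow * 15_000);
    }
}
```

Under these assumptions, each cell is 87 bytes and each row is about 1.2 KB. That puts 800 rows at roughly 0.97 MB serialized and 15,000 rows at roughly 18 MB. Since per-row size varies with key length, column count and naming, a fixed row count is a poor proxy for how much data a batch actually carries.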



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
