[ https://issues.apache.org/jira/browse/PHOENIX-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145019#comment-14145019 ]
Lars Hofhansl commented on PHOENIX-1278:
----------------------------------------
bq. how would you characterize the difference on the server between say doing
10 scans over 1/10 of the data versus 2 scans over 1/2?
Do you mean 10 scans in parallel vs. 2 scans in parallel? Or just breaking up a
scan into 10 chunks vs. 2 and executing them serially?
In the former case it depends on whether the machine is already busy and how
much CPU and IO capacity is available. If CPU and IO are available, 10 scans in
parallel are faster, just to state the obvious.
In the latter case it hardly matters: there is just one extra seek per scan and
region. Obviously, with smaller chunks you get better predictability in the
latency of the server work.
In this case, how many more chunks need to be merged with vs. without the patch?
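A rough way to frame that question, as a sketch only: assume S salt buckets, G guidepost chunks per bucket, and a heap-based k-way merge on the client, which costs on the order of log2(k) comparisons per output row. Without grouping, every chunk is a merge input (k = S*G); with the per-bucket concatenation proposed in this issue, k = S. The function names and the sizes below are hypothetical, not Phoenix code or measured numbers.

```python
import math


def merge_width(salt_buckets: int, chunks_per_bucket: int, grouped: bool) -> int:
    """Number of input streams the client-side merge sort has to interleave."""
    if grouped:
        # Chunks of one bucket are concatenated first, leaving one input per bucket.
        return salt_buckets
    # Otherwise every guidepost chunk is its own merge input.
    return salt_buckets * chunks_per_bucket


def heap_comparisons_per_row(width: int) -> float:
    """Order-of-magnitude comparison cost per row for a heap-based k-way merge."""
    return math.log2(width) if width > 1 else 0.0


if __name__ == "__main__":
    # Hypothetical sizes: 16 salt buckets, 10 guidepost chunks per bucket.
    for grouped in (False, True):
        k = merge_width(16, 10, grouped)
        print(f"grouped={grouped}: {k} inputs, ~{heap_comparisons_per_row(k):.1f} comparisons/row")
    # grouped=False: 160 inputs, ~7.3 comparisons/row
    # grouped=True: 16 inputs, ~4.0 comparisons/row
```

If the merge instead scans all inputs linearly to find the next row, the per-row cost grows with k rather than log2(k), and the gap between the two cases widens accordingly.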
> Performance degradation for salted tables with guideposts
> ---------------------------------------------------------
>
> Key: PHOENIX-1278
> URL: https://issues.apache.org/jira/browse/PHOENIX-1278
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: Anoop Sam John
>
> When a table is salted, we're seeing a degradation in performance using our
> new guidepost-based parallelization. With salted tables, we do a merge sort
> with the results from all the parallel scans. I suspect the cause here is
> that we're doing a merge sort now between more chunks than before (since we
> chunk everything up more now than we used to). We should group the scans
> we're doing for the same bucket together and do a concat with those results
> and then do a merge sort only with the concatenated batches.
> Pls revert PHOENIX-1279 when we implement this.
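A minimal Python sketch of the grouping proposed in the issue description, under a hypothetical model in which each parallel scan is a (salt bucket, chunk start key, sorted rows) tuple and rows are compared by their unsalted key. `Scan` and `merge_salted_scans` are illustrative names only, not Phoenix classes or APIs.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical model of one parallel scan: (salt_bucket, chunk_start_key, rows),
# with rows already sorted by their unsalted row key.
Scan = Tuple[int, str, List[str]]


def merge_salted_scans(scans: Iterable[Scan]) -> Iterator[str]:
    """Concatenate the chunks of each salt bucket, then merge sort the buckets."""
    by_bucket: Dict[int, List[Scan]] = defaultdict(list)
    for scan in scans:
        by_bucket[scan[0]].append(scan)

    per_bucket_streams = []
    for bucket in sorted(by_bucket):
        # Chunks within one bucket cover disjoint, ordered key ranges, so chaining
        # them by start key yields a stream that is already sorted (a plain concat).
        chunks = sorted(by_bucket[bucket], key=lambda s: s[1])
        per_bucket_streams.append(itertools.chain.from_iterable(s[2] for s in chunks))

    # Only one stream per bucket takes part in the merge sort.
    return heapq.merge(*per_bucket_streams)


if __name__ == "__main__":
    scans = [
        (1, "m", ["melon", "plum"]),
        (0, "a", ["apple", "cherry"]),
        (1, "a", ["banana", "date"]),
        (0, "m", ["mango", "peach"]),
    ]
    print(list(merge_salted_scans(scans)))
    # ['apple', 'banana', 'cherry', 'date', 'mango', 'melon', 'peach', 'plum']
```

Within each bucket the work is a plain concatenation; the heap in `heapq.merge` holds only one head row per bucket, no matter how many guidepost chunks each bucket was split into.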
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)