[ 
https://issues.apache.org/jira/browse/PHOENIX-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981913#comment-13981913
 ] 

James Taylor commented on PHOENIX-36:
-------------------------------------

One other thought that is somewhat related to this one: folks frequently 
mention hitting timeouts when a scan runs over multiple regions. For example, 
at SFDC we want to set the timeout fairly low so that we detect a down server 
quickly. The downside is that a scan over an entire region will then time 
out. It'd be good to have some way of dividing up a scan that spans multiple 
regions, even when our parallel chunking logic normally wouldn't, to prevent 
these timeouts. Another timeout users frequently hit is when building an 
index on an existing big table. The same holds true there: if we divided the 
scan up into smaller segments, these timeouts wouldn't occur.
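
To make the "smaller segments" idea concrete, here is a rough sketch. It is 
not Phoenix's chunking logic: row keys are modeled as plain longs, and the 
ScanSegments class, the split method and the maxSpan parameter are all 
hypothetical names made up for illustration. The point is only that each 
segment can be scanned as its own short request, so no single call has to 
outlive a low timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix or HBase.
final class ScanSegments {
    private ScanSegments() {}

    /**
     * Splits the half-open key range [start, end) into consecutive
     * half-open segments that each cover at most maxSpan keys.
     * Each segment is returned as a two-element array {lo, hi}.
     * Assumes end - start does not overflow a long.
     */
    static List<long[]> split(long start, long end, long maxSpan) {
        if (maxSpan <= 0) {
            throw new IllegalArgumentException("maxSpan must be positive");
        }
        List<long[]> segments = new ArrayList<>();
        long lo = start;
        while (lo < end) {
            // Take a full-sized segment unless fewer keys remain.
            long hi = (end - lo > maxSpan) ? lo + maxSpan : end;
            segments.add(new long[] {lo, hi});
            lo = hi;
        }
        return segments;
    }
}
```

With this sketch, split(0, 10, 4) yields the segments [0, 4), [4, 8) and 
[8, 10). A real implementation would split on byte[] row keys and would 
need some way to pick the segment size.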

> Parallel Scaling
> ----------------
>
>                 Key: PHOENIX-36
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-36
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> Right now the parallel scaling is defined by a constant (I think 32) that 
> defines the number of threads/splits that can drive a single query.
> This number might be too large for a small cluster and too small for a large 
> cluster; and this value should change as a cluster grows.
> One idea is to instead have a "scaling number". This would be a floating 
> point number that defines the number of threads to use per involved 
> RegionServer.
> Say a query touches 10 RegionServers, then a scaling factor
> * of 1.0 would mean 10 threads
> * 0.1 means 1 thread
> * 10.0 means 100 threads
> * etc
> That way one can define the cost of a query in terms of cluster resources.
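
The arithmetic in the description above can be sketched as follows. This is 
only an illustration of the proposal, not existing Phoenix code: the 
ParallelScaling class and the threadsFor method are hypothetical names, and 
rounding to the nearest integer with a minimum of one thread is an 
assumption the description does not spell out.

```java
// Hypothetical helper, not part of Phoenix.
final class ParallelScaling {
    private ParallelScaling() {}

    /**
     * Returns the number of threads for a query, computed as
     * scalingFactor * regionServers.
     * Assumption: the product is rounded to the nearest integer and
     * never drops below one thread.
     */
    static int threadsFor(double scalingFactor, int regionServers) {
        if (!(scalingFactor > 0) || regionServers < 1) {
            throw new IllegalArgumentException(
                "scalingFactor must be > 0 and regionServers must be >= 1");
        }
        long threads = Math.round(scalingFactor * regionServers);
        return (int) Math.max(1L, threads);
    }
}
```

For a query touching 10 RegionServers, threadsFor(1.0, 10) gives 10, 
threadsFor(0.1, 10) gives 1 and threadsFor(10.0, 10) gives 100, matching 
the examples in the description.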


