[ https://issues.apache.org/jira/browse/PHOENIX-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudarshan Kadambi updated PHOENIX-2606:
---------------------------------------
    Description: 
Phoenix should support a cursor model in which the user can set the fetch
size to limit the number of rows fetched in each batch. Each batch of result
rows would be accompanied by a flag indicating whether more rows remain to be
fetched for the query.
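A minimal sketch of the batch-plus-flag contract described above, assuming an
in-memory row iterator in place of a real Phoenix scan. The names `Batch`,
`Cursor`, `fetch_size`, and `has_more` are hypothetical and not part of any
Phoenix API. The cursor reads one row ahead so it can report whether more rows
remain without issuing a separate query.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Iterable, List


@dataclass
class Batch:
    rows: List[Any]
    has_more: bool  # True if at least one more row can be fetched


class Cursor:
    """Hypothetical cursor that hands out rows in batches of at most fetch_size."""

    def __init__(self, rows: Iterable[Any], fetch_size: int):
        if fetch_size <= 0:
            raise ValueError("fetch_size must be positive")
        self._rows = iter(rows)
        self._fetch_size = fetch_size
        self._lookahead: List[Any] = []  # at most one row read ahead

    def fetch(self) -> Batch:
        # Pull up to fetch_size + 1 rows; the extra row, if present,
        # proves that more rows remain and is kept for the next batch.
        needed = self._fetch_size + 1 - len(self._lookahead)
        chunk = self._lookahead + list(islice(self._rows, needed))
        self._lookahead = chunk[self._fetch_size:]
        return Batch(chunk[: self._fetch_size], bool(self._lookahead))
```

Reading one row ahead keeps `has_more` exact: when the last batch happens to be
full, the caller is told so directly instead of having to fetch a trailing
empty batch.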

The state management for the cursor could be done on the client side or the
server side (i.e. HBase, not the Query Server). Client-side state management
could involve capturing the last key in the batch and using it as the start
key for the subsequent scan operation. The downside of this model is that if
there are any intervening inserts or deletes in the result set of the query,
backtracking on the cursor would reflect those changes (consider a page down
followed by a page up that shows a different set of result rows). Similarly,
if the cursor is defined over the results of a join or an aggregation, these
operations would need to be performed again each time the next batch of
result rows is fetched.
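The client-side model can be sketched as keyset pagination. This assumes a
sorted in-memory list standing in for an HBase table; `FakeTable` and
`fetch_page` are hypothetical names. Because an HBase scan's start row is
inclusive, the sketch resumes at the last returned key with a zero byte
appended, which is the smallest row key strictly greater than it. The only
state the client keeps between batches is that next start key.

```python
import bisect
from typing import List, Tuple

Row = Tuple[bytes, str]  # (row key, value)


class FakeTable:
    """Stand-in for an HBase table (an assumption): rows kept sorted by unique row key."""

    def __init__(self, rows: List[Row]):
        self._rows = sorted(rows)

    def put(self, key: bytes, value: str) -> None:
        bisect.insort(self._rows, (key, value))

    def scan(self, start_key: bytes, limit: int) -> List[Row]:
        # Like an HBase scan, the start key is inclusive.
        i = bisect.bisect_left(self._rows, (start_key,))
        return self._rows[i : i + limit]


def fetch_page(table: FakeTable, start_key: bytes, fetch_size: int):
    """Hypothetical client-side cursor step: returns (page, has_more, next_start_key)."""
    # Ask for one extra row so we can report whether more rows remain.
    rows = table.scan(start_key, fetch_size + 1)
    page = rows[:fetch_size]
    has_more = len(rows) > fetch_size
    # last key + b"\x00" is the smallest byte string sorting after the last key.
    next_start = page[-1][0] + b"\x00" if page else start_key
    return page, has_more, next_start
```

This also reproduces the backtracking problem described above: after reading a
first page with keys `a` and `b`, inserting a row with key `ab` and re-fetching
from the same start key returns `a` and `ab`, a different page than the user
saw before.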

So an alternate approach could be to manage the state on the server side, where
there is a query context area in the RegionServers (or perhaps just a temporary
table) and the cursor results are fetched from there. This gives the cursor
snapshot isolation semantics. I think both models make sense, but it might be
best to start with the state management entirely on the client side.
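For contrast, a minimal sketch of the server-side model, assuming an in-process
dictionary stands in for the query context area in the RegionServers (or a
temporary table). `QueryContextStore`, `open_cursor`, `fetch`, and
`close_cursor` are hypothetical names. The result set, including the output of
any join or aggregation, is materialized once when the cursor is opened, so
later writes to the base data do not change what a page up returns.

```python
import itertools
from typing import Any, Dict, Iterable, List, Tuple


class QueryContextStore:
    """Hypothetical server-side query context area holding materialized results."""

    def __init__(self) -> None:
        self._results: Dict[int, List[Any]] = {}
        self._ids = itertools.count(1)

    def open_cursor(self, rows: Iterable[Any]) -> int:
        cursor_id = next(self._ids)
        # Copy the result set now; later writes to the source are not visible.
        self._results[cursor_id] = list(rows)
        return cursor_id

    def fetch(self, cursor_id: int, offset: int, fetch_size: int) -> Tuple[List[Any], bool]:
        # Any offset can be revisited, so paging backwards is stable.
        rows = self._results[cursor_id]
        page = rows[offset : offset + fetch_size]
        return page, offset + fetch_size < len(rows)

    def close_cursor(self, cursor_id: int) -> None:
        # Materialized rows hold server resources until the cursor is released.
        del self._results[cursor_id]
```

The trade-off is that the materialized rows occupy server memory or storage
until the cursor is closed, which is why the sketch frees them explicitly in
`close_cursor`.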


> Cursor support in Phoenix
> -------------------------
>
>                 Key: PHOENIX-2606
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2606
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Sudarshan Kadambi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
