[ 
https://issues.apache.org/jira/browse/PHOENIX-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229410#comment-17229410
 ] 

ASF GitHub Bot commented on PHOENIX-5998:
-----------------------------------------

kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724872173


   > @kadirozde overall looks like a great improvement. I have added a few
   > comments. Some questions:
   >
   > 1. Is it more beneficial to have paging based on row size rather than
   > number of rows, since each row can be arbitrarily large?
   > 2. Server-side pagination will help _reduce_ the chance of the race
   > conditions mentioned in the Jira description, but does not aim at
   > eliminating them, correct?
   > 3. Though this is aimed at such race conditions related to mutations
   > (server-side UPSERT SELECT/DELETE), it seems like it will also affect the
   > normal read path for non-Group_By aggregate queries. Is there any negative
   > effect/extra slowness during reads due to this pagination, and if yes, do we
   > want to make sure that changes only affect the write paths?
   >
   > Let's also please add some tests for this.
   
   1. I am not sure about it, but we can introduce additional constraints, such 
as the total size of scanned bytes as you suggested, to further improve this 
feature later. 
   2. This is correct. By itself, it does not eliminate them. However, as an 
additional improvement, the client can wait for all the page operations to 
complete or fail before returning to the application. This will further reduce 
the race conditions. I think we have to enforce the client side timestamp to 
make the race almost impossible.
   3. I expect this feature will improve the overall performance and 
availability since paging limits the memory usage and the time for which server 
resources are held. My experience with paging on a real cluster is very 
positive. I have not seen any negative impact yet as long as the page size is 
not very small (e.g., less than 1000 rows).
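
   To make point 1 concrete, here is a minimal sketch of a page that closes on 
either a row cap or a byte cap. It is not code from this pull request: 
`PageBudget`, `addRowAndCheckFull` and `paginate` are hypothetical names, and 
the row sizes are assumed to be known to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Phoenix: ends a page on a row cap or a byte cap. */
final class PageBudget {
    private final int maxRows;
    private final long maxBytes;
    private int rows;
    private long bytes;

    PageBudget(int maxRows, long maxBytes) {
        if (maxRows <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
    }

    /** Records one scanned row and reports whether the page should end after it. */
    boolean addRowAndCheckFull(long rowSizeBytes) {
        rows++;
        bytes += rowSizeBytes;
        return rows >= maxRows || bytes >= maxBytes;
    }

    /** Starts a new, empty page. */
    void reset() {
        rows = 0;
        bytes = 0;
    }

    /** Splits a sequence of row sizes into pages and returns the row count of each page. */
    static List<Integer> paginate(long[] rowSizes, int maxRows, long maxBytes) {
        PageBudget budget = new PageBudget(maxRows, maxBytes);
        List<Integer> pages = new ArrayList<>();
        int rowsInPage = 0;
        for (long size : rowSizes) {
            rowsInPage++;
            if (budget.addRowAndCheckFull(size)) {
                pages.add(rowsInPage);
                rowsInPage = 0;
                budget.reset();
            }
        }
        if (rowsInPage > 0) {
            pages.add(rowsInPage); // trailing partial page
        }
        return pages;
    }
}
```

   With a cap of 3 rows and 100 bytes, a single 500-byte row ends its page 
immediately, so one unusually large row cannot stretch a page the way a pure 
row-count limit would allow.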


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Paged server side ungrouped aggregate operations 
> -------------------------------------------------
>
>                 Key: PHOENIX-5998
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5998
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.16.0
>
>         Attachments: PHOENIX-5998.4.x.001.patch, PHOENIX-5998.4.x.002.patch, 
> PHOENIX-5998.4.x.003.patch
>
>
> Phoenix provides the option of performing upsert select and delete query 
> operations on the client or server side.  This is decided by the Phoenix 
> optimizer based on configuration parameters. For the server side option, the 
> table operation (upsert select/delete query) is parallelized such that 
> multiple table regions are scanned and the mutations derived from these scans 
> can also be executed in parallel on the server side. However, currently there 
> is no paging capability and the server side operation can take long enough 
> to lead to HBase client timeouts. When this happens, Phoenix can return failure 
> to its applications and the rest of the parallel scans and mutations on the 
> server side can still continue since Phoenix has no mechanism in place to 
> stop these operations before returning failure to applications. This can 
> create unexpected race conditions between these left-over operations and the 
> new operations issued by applications. Putting a limit on the number of rows 
> to be processed within a single RPC call (i.e., the next operation on the 
> scanner) on the server side using a Phoenix level paging is highly desirable 
> and a required step to prevent the possible race conditions. This paging 
> mechanism has already been implemented for index rebuild and verification 
> operations and has proven effective at preventing timeouts. This paging can be 
> implemented for all server side operations including aggregates, upsert 
> selects, delete queries and so on.
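
The row limit per RPC call described above can be illustrated with a small, 
self-contained sketch. It only models the idea and is not the Phoenix 
implementation: `PagedSumScanner` and `PagedSumClient` are hypothetical 
classes, an in-memory list stands in for a table region, and a SUM stands in 
for an ungrouped aggregate.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a server-side scanner; not a Phoenix or HBase class. */
final class PagedSumScanner {
    private final Iterator<Long> rows;
    private final int pageSize;

    PagedSumScanner(List<Long> rows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.rows = rows.iterator();
        this.pageSize = pageSize;
    }

    /** True while unprocessed rows remain. */
    boolean hasMore() {
        return rows.hasNext();
    }

    /** Models one RPC call: aggregates at most pageSize rows and returns the partial sum. */
    long nextPage() {
        long partial = 0;
        for (int i = 0; i < pageSize && rows.hasNext(); i++) {
            partial += rows.next();
        }
        return partial;
    }
}

/** Hypothetical client loop: one bounded call per page, partial results merged by the caller. */
final class PagedSumClient {
    /** Returns {total, number of calls made}. */
    static long[] sumAll(List<Long> rows, int pageSize) {
        PagedSumScanner scanner = new PagedSumScanner(rows, pageSize);
        long total = 0;
        long calls = 0;
        while (scanner.hasMore()) {
            total += scanner.nextPage();
            calls++;
        }
        return new long[] {total, calls};
    }
}
```

Because each call does a bounded amount of work, no single call depends on the 
size of the whole region, which is what keeps it under the client timeout. The 
cost is more round trips: five rows with a page size of two take three calls.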



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
