[
https://issues.apache.org/jira/browse/PHOENIX-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229410#comment-17229410
]
ASF GitHub Bot commented on PHOENIX-5998:
-----------------------------------------
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-724872173
> @kadirozde overall looks like a great improvement. I have added a few
comments. Some questions:
>
> 1. Is it more beneficial to have paging based on row size rather than
number of rows, since each row can be arbitrarily large?
> 2. Server-side pagination will help _reduce_ the chance of the race
conditions mentioned in the Jira description, but does not aim at eliminating
them, correct?
> 3. Though this is aimed at such race conditions related to mutations
(server-side UPSERT SELECT/DELETE), it seems like it will also affect the
normal read path for non-GROUP BY aggregate queries. Is there any negative
effect or extra slowness during reads due to this pagination, and if so, do we
want to make sure that the changes only affect the write paths?
>
> Let's also please add some tests for this.
1. I am not sure about it, but we can later introduce additional constraints,
such as the total size of scanned bytes as you suggested, to further improve
this feature.
2. This is correct. By itself, it does not eliminate them. However, as an
additional improvement, the client can wait for all the paged operations to
complete or fail before returning to the application. This will further reduce
the race conditions. I think we have to enforce client-side timestamps to make
the race almost impossible.
3. I expect this feature to improve overall performance and availability,
since paging limits memory usage and the time for which server resources are
held. My experience with paging on a real cluster is very positive. I have not
seen any negative impact yet, as long as the page size is not very small
(e.g., fewer than 1000 rows).
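The page-closing policy discussed in answers 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not Phoenix code: `PageLimit`, `PageLimitDemo`, `maxRows` and `maxBytes` are hypothetical names. The row limit corresponds to the row-count paging described in the issue, and the optional byte budget corresponds to the "total size of scanned bytes" constraint suggested as a later improvement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page-limit policy (not a Phoenix API).
// A page ends when either the row limit or the optional byte budget is reached.
final class PageLimit {
    private final int maxRows;    // row-count limit per page
    private final long maxBytes;  // byte budget per page; <= 0 disables it
    private int rows;
    private long bytes;

    PageLimit(int maxRows, long maxBytes) {
        if (maxRows <= 0) throw new IllegalArgumentException("maxRows must be positive");
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
    }

    /** Records one scanned row of the given serialized size. */
    void add(long rowBytes) {
        rows++;
        bytes += rowBytes;
    }

    /** True once the current page should be returned to the client. */
    boolean isFull() {
        return rows >= maxRows || (maxBytes > 0 && bytes >= maxBytes);
    }

    /** Starts a new page. */
    void reset() {
        rows = 0;
        bytes = 0;
    }
}

final class PageLimitDemo {
    /** Splits a sequence of row sizes into pages; returns the row count of each page. */
    static List<Integer> pageSizes(long[] rowSizes, PageLimit limit) {
        List<Integer> pages = new ArrayList<>();
        int inPage = 0;
        for (long size : rowSizes) {
            limit.add(size);
            inPage++;
            if (limit.isFull()) {
                pages.add(inPage);
                inPage = 0;
                limit.reset();
            }
        }
        if (inPage > 0) pages.add(inPage); // last, partially filled page
        return pages;
    }
}
```

With a byte budget, a single very large row closes a page on its own, which addresses the concern that rows can be arbitrarily large; with only a row limit, page cost depends on row width.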
----------------------------------------------------------------
> Paged server side ungrouped aggregate operations
> -------------------------------------------------
>
> Key: PHOENIX-5998
> URL: https://issues.apache.org/jira/browse/PHOENIX-5998
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.16.0
>
> Attachments: PHOENIX-5998.4.x.001.patch, PHOENIX-5998.4.x.002.patch,
> PHOENIX-5998.4.x.003.patch
>
>
> Phoenix provides the option of performing upsert select and delete query
> operations on the client or server side. This is decided by the Phoenix
> optimizer based on configuration parameters. For the server side option, the
> table operation (upsert select/delete query) is parallelized such that
> multiple table regions are scanned and the mutations derived from these scans
> can also be executed in parallel on the server side. However, there is currently
> no paging capability, and the server side operation can take long enough to
> lead to HBase client timeouts. When this happens, Phoenix can return failure
> to its applications and the rest of the parallel scans and mutations on the
> server side can still continue since Phoenix has no mechanism in place to
> stop these operations before returning failure to applications. This can
> create unexpected race conditions between these left-over operations and the
> new operations issued by applications. Putting a limit on the number of rows
> to be processed within a single RPC call (i.e., the next operation on the
> scanner) on the server side using Phoenix-level paging is highly desirable
> and a required step toward preventing the possible race conditions. This
> paging mechanism has already been implemented for index rebuild and
> verification operations and has proven effective at preventing timeouts. This
> paging can be implemented for all server side operations, including
> aggregates, upsert selects, delete queries, and so on.
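The paging idea in the description above can be modeled with a small, single-threaded sketch for an ungrouped aggregate (SUM). This is an illustration under assumptions, not Phoenix's implementation: `PagedSumScanner`, `PagedClient` and the page size are hypothetical, and a plain `Iterator` stands in for the HBase region scanner, which in a real cluster keeps its position on the region server between `next()` calls. Each `nextPage()` call plays the role of one scanner RPC that processes at most `pageSize` rows and returns a partial result, so no single call runs long enough to hit a client timeout.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical model of Phoenix-level paging for an ungrouped SUM aggregate.
final class PagedSumScanner {
    private final Iterator<Long> rows; // stands in for the region scanner
    private final int pageSize;        // max rows processed per call

    PagedSumScanner(Iterator<Long> rows, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.rows = rows;
        this.pageSize = pageSize;
    }

    boolean hasMore() {
        return rows.hasNext();
    }

    /** One bounded unit of server work (one "RPC"): returns the partial sum for this page. */
    long nextPage() {
        long partial = 0;
        int processed = 0;
        while (processed < pageSize && rows.hasNext()) {
            partial += rows.next();
            processed++;
        }
        return partial; // the iterator keeps its position for the next call
    }
}

final class PagedClient {
    /** Client loop: issues calls until the scanner is exhausted; returns {total, calls}. */
    static long[] sumAndCalls(List<Long> values, int pageSize) {
        PagedSumScanner scanner = new PagedSumScanner(values.iterator(), pageSize);
        long total = 0;
        long calls = 0;
        while (scanner.hasMore()) {
            total += scanner.nextPage(); // merge the partial aggregate
            calls++;
        }
        return new long[] {total, calls};
    }
}
```

For example, summing 1 through 10 with a page size of 3 takes four calls (3, 3, 3 and 1 rows) and still yields 55. Smaller pages bound the work per call more tightly but add RPC round trips, which matches the observation above that very small page sizes are best avoided.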