[
https://issues.apache.org/jira/browse/PHOENIX-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221095#comment-17221095
]
ASF GitHub Bot commented on PHOENIX-5998:
-----------------------------------------
kadirozde commented on pull request #936:
URL: https://github.com/apache/phoenix/pull/936#issuecomment-716946162
> @kadirozde The changes are substantial and I will need some heads-down
time to review them. If it is urgent, please feel free to rely on others'
reviews and don't wait for me. I plan on taking a look in detail within the
next couple of days.
No, it is not urgent. I was going to start working on PHOENIX-6207 and there
is some dependency between them and so I wanted to push this before starting
the other. But it is okay and I do not have to wait for this PR to be checked
in. Please take your time.
Yes, the changes are substantial but mostly mechanical. The core of the change
is that instead of scanning the entire table region in the postScannerOpen hook
and returning the result of the aggregate operation for the entire table region
in one result iteration, this PR just returns a region scanner (i.e., a new
scanner called UngroupedAggregateRegionScanner) in the postScannerOpen hook for
the UngroupedAggregateRegionObserver coproc, and then applies the aggregate
operation on a chunk (i.e., page) of a table region in each result iteration.
This means the client needs to do many iterations in order to process a table
region and aggregate the results of these pages on the client side. Please note
that previously, the client needed to aggregate the results of server side
aggregations, one for each table region (not for each table region page). Hope
this helps.
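
To illustrate the paging idea described above, here is a minimal,
self-contained Java sketch. It is not the code in this PR and does not use
the HBase or Phoenix APIs: PagedAggregateSketch, PagedSumScanner, pageSize,
and aggregateRegion are hypothetical names, a region is modeled as a plain
array of long values, and the aggregate is assumed to be a simple SUM. Each
call to next() stands in for one result iteration (one RPC) that aggregates
at most one page of rows, and the client merges the partial results.

```java
import java.util.NoSuchElementException;

class PagedAggregateSketch {

    /** Hypothetical stand-in for a region scanner that aggregates one page per call. */
    static final class PagedSumScanner {
        private final long[] regionRows; // assumption: a region is just an array of values
        private final int pageSize;      // assumption: max rows processed per next() call
        private int position = 0;

        PagedSumScanner(long[] regionRows, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            this.regionRows = regionRows;
            this.pageSize = pageSize;
        }

        /** True while there are unprocessed rows left in the region. */
        boolean hasNext() {
            return position < regionRows.length;
        }

        /** Aggregates at most pageSize rows and returns the partial sum for that page. */
        long next() {
            if (!hasNext()) {
                throw new NoSuchElementException("region exhausted");
            }
            int end = Math.min(position + pageSize, regionRows.length);
            long partial = 0;
            for (; position < end; position++) {
                partial += regionRows[position];
            }
            return partial;
        }
    }

    /** Client side: iterate page by page and merge the partial aggregates. */
    static long aggregateRegion(long[] regionRows, int pageSize) {
        PagedSumScanner scanner = new PagedSumScanner(regionRows, pageSize);
        long total = 0;
        while (scanner.hasNext()) {
            total += scanner.next(); // one bounded unit of server-side work per call
        }
        return total;
    }

    public static void main(String[] args) {
        long[] rows = {1, 2, 3, 4, 5, 6, 7};
        // Pages of size 3: {1,2,3}, {4,5,6}, {7}; the partial sums are 6, 15, 7.
        System.out.println(aggregateRegion(rows, 3)); // prints 28
    }
}
```

The point of the sketch is only that the work done per next() call is bounded
by the page size, so no single call has to cover the whole region.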
----------------------------------------------------------------
> Paged server side ungrouped aggregate operations
> -------------------------------------------------
>
> Key: PHOENIX-5998
> URL: https://issues.apache.org/jira/browse/PHOENIX-5998
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.x
>
> Attachments: PHOENIX-5998.4.x.001.patch, PHOENIX-5998.4.x.002.patch,
> PHOENIX-5998.4.x.003.patch
>
>
> Phoenix provides the option of performing upsert select and delete query
> operations on the client or server side. This is decided by the Phoenix
> optimizer based on configuration parameters. For the server side option, the
> table operation (upsert select/delete query) is parallelized such that
> multiple table regions are scanned and the mutations derived from these scans
> can also be executed in parallel on the server side. However, currently there
> is no paging capability and the server side operation can take long enough
> to lead to HBase client timeouts. When this happens, Phoenix can return failure
> to its applications and the rest of the parallel scans and mutations on the
> server side can still continue since Phoenix has no mechanism in place to
> stop these operations before returning failure to applications. This can
> create unexpected race conditions between these left-over operations and the
> new operations issued by applications. Putting a limit on the number of rows
> to be processed within a single RPC call (i.e., the next operation on the
> scanner) on the server side using Phoenix-level paging is highly desirable
> and a required step to prevent the possible race conditions. This paging
> mechanism has already been implemented for index rebuild and verification
> operations and has proven effective at preventing timeouts. This paging can be
> implemented for all server side operations including aggregates, upsert
> selects, delete queries and so on.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)