[
https://issues.apache.org/jira/browse/PHOENIX-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriel Reid resolved PHOENIX-412.
----------------------------------
Resolution: Fixed
Bulk resolve of closed issues imported from GitHub. This status was reached by
first re-opening all closed imported issues and then resolving them in bulk.
> Pipeline and buffer UPSERT SELECT to prevent writing results of SELECT to
> client
> --------------------------------------------------------------------------------
>
> Key: PHOENIX-412
> URL: https://issues.apache.org/jira/browse/PHOENIX-412
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: James Taylor
> Labels: enhancement
>
> A non-limited SELECT currently runs in parallel, buffering the results on the
> client side. This works well in the typical use case of a selective WHERE
> clause (since the scan runs in parallel), but not so well otherwise. For
> UPSERT SELECT, a typical use case would be to create a new table based on an
> existing table. Often no WHERE clause will be present, which causes us to
> write the entire table being selected onto the client machine. This does not
> scale for large tables.
> With secondary indexing coming soon, and given that we use UPSERT SELECT
> to initially populate the index table, we should optimize this by doing the
> following:
> * Modify ParallelIterators to be able to provide a factory to create the
> SpoolingResultIterator
> * In the case of UPSERT SELECT, create a spooling iterator that buffers the
> results into a MutationState (see existing code in UpsertCompiler:359 for
> upsert select run on client-side)
> * When the MutationState reaches the batch size limit, commit the batch
> (again as is done in UpsertCompiler) and clear the MutationState
> This will perform much better. We can probably just move the UpsertCompiler
> code for this case into the new spooling iterator implementation.
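
The proposal above amounts to draining the scan into a bounded buffer that is committed and cleared each time it reaches the batch size, so the client never holds more than one batch. The following is a minimal, self-contained sketch of that pattern only. It does not use Phoenix classes: BatchingUpsertSketch, BatchingBuffer, drain, and the committer callback are hypothetical names chosen for illustration, and the buffer merely stands in for the role the issue assigns to MutationState.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; none of these types are part of Phoenix.
final class BatchingUpsertSketch {

    /** Buffers rows and hands each full batch to a commit callback, then clears itself. */
    static final class BatchingBuffer<R> {
        private final int batchSize;
        private final Consumer<List<R>> committer;
        private final List<R> pending = new ArrayList<>();
        private long totalRows = 0;

        BatchingBuffer(int batchSize, Consumer<List<R>> committer) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
            this.committer = committer;
        }

        /** Adds one row; commits the batch once the size limit is reached. */
        void add(R row) {
            pending.add(row);
            totalRows++;
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Commits whatever is buffered (if anything) and clears the buffer. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            // Pass a copy so the callback may keep the list after we clear ours.
            committer.accept(new ArrayList<>(pending));
            pending.clear();
        }

        long totalRows() {
            return totalRows;
        }
    }

    /** Drains a scan while holding at most batchSize rows; returns the row count. */
    static <R> long drain(Iterator<R> scan, int batchSize, Consumer<List<R>> committer) {
        BatchingBuffer<R> buffer = new BatchingBuffer<>(batchSize, committer);
        while (scan.hasNext()) {
            buffer.add(scan.next());
        }
        buffer.flush(); // commit the final, possibly partial, batch
        return buffer.totalRows();
    }

    public static void main(String[] args) {
        Iterator<Integer> scan = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        long rows = drain(scan, 3, batch -> System.out.println("commit " + batch));
        System.out.println("rows written: " + rows);
    }
}
```

With seven rows and a batch size of 3, the callback is invoked three times, for batches of 3, 3, and 1 rows. In the real change the callback would presumably be the existing commit logic from UpsertCompiler, but that wiring is an assumption and is not shown here.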