[
https://issues.apache.org/jira/browse/PHOENIX-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885153#comment-13885153
]
James Taylor edited comment on PHOENIX-10 at 1/29/14 9:02 AM:
--------------------------------------------------------------
Wanted to provide a bit more detail on how this could be implemented. It's
fairly similar to the way we manage multiple aggregate functions in a single
query and return all the values in a single KV:
- At compile time, while parsing the SELECT clause, keep a
LinkedHashSet<KeyValueColumnExpression> of all the unique array index
expressions in the query (that is, where an ArrayIndexExpression contains a
KeyValueColumnExpression).
- Filter out of this set any KeyValueColumnExpression that also occurs in the
SELECT as an expression returning the entire array (in that case, you do need
to return the entire array).
- Replace the remaining ArrayIndexExpressions in the LinkedHashSet with a new
expression, called something like ArrayIndexPositionalExpression, which is
constructed with the position of the KeyValueColumnExpression. It will use this
position as an index to look up the array element value in a known KeyValue
(see below).
- Push this information to the ScanRegionObserver through a Scan attribute.
- Post-filter the List<KeyValue> that will be returned by removing the KeyValue
for any such KeyValueColumnExpression (matched on its cf:cq values). Add a
single new KeyValue with a constant name whose value appends (in LinkedHashSet
order) the value of each element being looked up. We support this through our
KeyValueSchema (which is used for both aggregation and hash joins).
- The expression you created to replace the original ArrayIndexExpression will
look up the value by positional index using the KeyValueSchema.iterator
method.
That gives a somewhat more detailed way to implement this through our
coprocessor as opposed to a filter.
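
To make the steps above a bit more concrete, here is a rough sketch of the
data flow in plain Java. It does not use the real Phoenix or HBase classes:
ArrayIndexRef, PositionalRef, the COMBINED column name, and the
Map<String, List<String>> row model are all hypothetical stand-ins for
ArrayIndexExpression, the proposed ArrayIndexPositionalExpression, the
constant-named KeyValue, and List<KeyValue>. Passing the set through a Scan
attribute and the KeyValueSchema encoding are left out, and array indexes are
zero-based here purely for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ArrayIndexProjectionSketch {
    // Hypothetical name for the single, constant-named combined value.
    static final String COMBINED = "_array_elements";

    // Stand-in for an ArrayIndexExpression over a KeyValue column: column[index].
    record ArrayIndexRef(String column, int index) {}

    // Stand-in for the proposed positional expression: reads one slot of the
    // combined value instead of indexing into the full array.
    record PositionalRef(int position) {
        String evaluate(Map<String, List<String>> row) {
            return row.get(COMBINED).get(position);
        }
    }

    // Compile time: unique array index refs in SELECT order, minus any column
    // that is also selected whole (its full array must be returned anyway).
    static LinkedHashSet<ArrayIndexRef> collect(List<ArrayIndexRef> selectRefs,
                                                Set<String> wholeArrayColumns) {
        LinkedHashSet<ArrayIndexRef> refs = new LinkedHashSet<>();
        for (ArrayIndexRef ref : selectRefs) {
            if (!wholeArrayColumns.contains(ref.column())) {
                refs.add(ref);
            }
        }
        return refs;
    }

    // Compile time: map each remaining ref to its position in the set.
    static Map<ArrayIndexRef, PositionalRef> toPositional(LinkedHashSet<ArrayIndexRef> refs) {
        Map<ArrayIndexRef, PositionalRef> out = new LinkedHashMap<>();
        int position = 0;
        for (ArrayIndexRef ref : refs) {
            out.put(ref, new PositionalRef(position++));
        }
        return out;
    }

    // Server side: drop the full arrays that were only needed for element
    // lookups and add one combined value holding the elements in set order.
    static Map<String, List<String>> postFilter(Map<String, List<String>> row,
                                                LinkedHashSet<ArrayIndexRef> refs) {
        Map<String, List<String>> result = new LinkedHashMap<>(row);
        List<String> combined = new ArrayList<>();
        for (ArrayIndexRef ref : refs) {
            combined.add(row.get(ref.column()).get(ref.index()));
            result.remove(ref.column());
        }
        result.put(COMBINED, combined);
        return result;
    }

    public static void main(String[] args) {
        // Roughly: SELECT a[2], b[0], a[2], c[1], c FROM t
        List<ArrayIndexRef> select = List.of(
            new ArrayIndexRef("a", 2), new ArrayIndexRef("b", 0),
            new ArrayIndexRef("a", 2), new ArrayIndexRef("c", 1));
        LinkedHashSet<ArrayIndexRef> refs = collect(select, Set.of("c"));
        Map<ArrayIndexRef, PositionalRef> positional = toPositional(refs);

        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("a", List.of("a0", "a1", "a2"));
        row.put("b", List.of("b0", "b1"));
        row.put("c", List.of("c0", "c1"));

        Map<String, List<String>> filtered = postFilter(row, refs);
        System.out.println(filtered);
        for (Map.Entry<ArrayIndexRef, PositionalRef> e : positional.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().evaluate(filtered));
        }
    }
}
```

In this sketch the arrays a and b never leave the server, while c is returned
whole because it is also selected in full; c[1] would then be evaluated
against the full array as it is today.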
> Push projection of a single ARRAY element to the server
> -------------------------------------------------------
>
> Key: PHOENIX-10
> URL: https://issues.apache.org/jira/browse/PHOENIX-10
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
>
> If only a single array element is selected, we'll still return the entire
> array back to the client. Instead, we should push this to the server and only
> return the single array element. The same goes for the reference to an ARRAY
> in the WHERE clause. There's a general HBase fix for this (i.e. the ability
> to define a separate set of key values that will be returned versus key
> values available to filters) that has a patch here, but is deemed not
> possible to pull into the 0.94 branch by @lhofhansl.
> My thought is that we can add a Filter at the end of our filter chain that
> filters out any KeyValues that aren't in the SELECT expressions (i.e. filter
> out if a column is referenced in the WHERE clause, but not in the SELECT
> expressions). This same Filter could handle returning only the elements of
> the array that are referenced in the SELECT expression rather than the entire
> array.