[ 
https://issues.apache.org/jira/browse/PHOENIX-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885153#comment-13885153
 ] 

James Taylor edited comment on PHOENIX-10 at 1/29/14 9:02 AM:
--------------------------------------------------------------

Wanted to provide a bit more detail on how this could be implemented. It's 
fairly similar to the way we manage multiple aggregate functions in a single 
query and return all the values in a single KV:

- At compile time, while parsing the SELECT clause, keep a 
LinkedHashSet<KeyValueColumnExpression> of all the unique array index 
expressions in a query (that is, where an ArrayIndexExpression contains a 
KeyValueColumnExpression).
- Filter out of this list any occurrences of a KeyValueColumnExpression in the 
SELECT that returns the entire array (as in that case, you do need to return 
the entire array)
- Replace the remaining ArrayIndexExpressions in the LinkedHashSet with a 
different, new expression called something like ArrayIndexPositionalExpression 
which is constructed with the position of the KeyValueColumnExpression. It will 
use this as an index to look up the array element value in a known KeyValue 
(see below).
- Push this information through a Scan attribute for the ScanRegionObserver
- Post-filter the List<KeyValue> that will be returned by removing any 
KeyValueColumnExpression (based on its cf:cq values). Add a new single, 
constant named KeyValue that appends (in order of the LinkedHashSet) the value 
of the element being looked up. We support this through our KeyValueSchema 
(which is used for both aggregation and hash joins)
- The expression you created to replace the original ArrayIndexExpression 
will look up the value by positional index using the KeyValueSchema.iterator 
method

That gives a somewhat more detailed way to implement this through our 
coprocessor as opposed to a filter.
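
To make the steps above concrete, here is a minimal, self-contained sketch of the flow in plain Java. It is an illustration under stated assumptions, not Phoenix code: ArrayProjectionSketch, ArrayRef, PACKED_KEY, collect, postFilter and evaluate are all hypothetical names, a row is modeled as a map from column name to a list of strings, and array indexes are assumed to be 1-based. The real implementation would work on KeyValues, pass the set through a Scan attribute, and encode the values with KeyValueSchema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustration only: every name here is hypothetical, not Phoenix or HBase API.
final class ArrayProjectionSketch {
    // Name of the single synthetic entry that carries the projected elements.
    static final String PACKED_KEY = "_array_elements";

    // Stand-in for an ArrayIndexExpression over a key-value column: column[index].
    static final class ArrayRef {
        final String column;
        final int index; // assumed 1-based

        ArrayRef(String column, int index) {
            this.column = column;
            this.index = index;
        }

        @Override public boolean equals(Object o) {
            return o instanceof ArrayRef
                && ((ArrayRef) o).column.equals(column)
                && ((ArrayRef) o).index == index;
        }

        @Override public int hashCode() {
            return Objects.hash(column, index);
        }
    }

    // Compile time: keep unique references in SELECT order, skipping columns
    // whose entire array is also selected (those must be returned whole).
    static LinkedHashSet<ArrayRef> collect(List<ArrayRef> selected, Set<String> wholeArrayColumns) {
        LinkedHashSet<ArrayRef> unique = new LinkedHashSet<>();
        for (ArrayRef ref : selected) {
            if (!wholeArrayColumns.contains(ref.column)) {
                unique.add(ref);
            }
        }
        return unique;
    }

    // Server side: drop the referenced array columns from the row and add one
    // entry holding the requested elements, in the iteration order of the set.
    static Map<String, List<String>> postFilter(Map<String, List<String>> row, LinkedHashSet<ArrayRef> refs) {
        Map<String, List<String>> out = new LinkedHashMap<>(row);
        List<String> packed = new ArrayList<>();
        for (ArrayRef ref : refs) {
            List<String> array = row.get(ref.column);
            boolean inRange = array != null && ref.index >= 1 && ref.index <= array.size();
            packed.add(inRange ? array.get(ref.index - 1) : null);
            out.remove(ref.column);
        }
        out.put(PACKED_KEY, packed);
        return out;
    }

    // Client side: the replacement expression reads its value by position.
    static String evaluate(Map<String, List<String>> filteredRow, int position) {
        return filteredRow.get(PACKED_KEY).get(position);
    }

    public static void main(String[] args) {
        List<ArrayRef> select = List.of(
            new ArrayRef("tags", 2), new ArrayRef("scores", 1), new ArrayRef("tags", 2));
        LinkedHashSet<ArrayRef> refs = collect(select, Set.of());

        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("tags", List.of("red", "green", "blue"));
        row.put("scores", List.of("10", "20"));

        Map<String, List<String>> filtered = postFilter(row, refs);
        System.out.println(evaluate(filtered, 0)); // value for tags[2]
        System.out.println(evaluate(filtered, 1)); // value for scores[1]
    }
}
```

The position handed to evaluate is the reference's position in the LinkedHashSet, which is why an insertion-ordered set matters: the compile-time side and the server side have to agree on the order in which the values are appended.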


> Push projection of a single ARRAY element to the server
> -------------------------------------------------------
>
>                 Key: PHOENIX-10
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-10
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>
> If only a single array element is selected, we'll still return the entire 
> array back to the client. Instead, we should push this to the server and only 
> return the single array element. The same goes for the reference to an ARRAY 
> in the WHERE clause. There's a general HBase fix for this (i.e. the ability 
> to define a separate set of key values that will be returned versus key 
> values available to filters) that has a patch here, but is deemed not 
> possible to pull into the 0.94 branch by @lhofhansl.
> My thought is that we can add a Filter at the end of our filter chain that 
> filters out any KeyValues that aren't in the SELECT expressions (i.e. filter 
> out if a column is referenced in the WHERE clause, but not in the SELECT 
> expressions). This same Filter could handle returning only the elements of 
> the array that are referenced in the SELECT expression rather than the entire 
> array.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
