[ https://issues.apache.org/jira/browse/PHOENIX-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid resolved PHOENIX-309. ---------------------------------- Resolution: Fixed Bulk resolve of closed issues imported from GitHub. This status was reached by first re-opening all closed imported issues and then resolving them in bulk. > Support ARRAY data type > ----------------------- > > Key: PHOENIX-309 > URL: https://issues.apache.org/jira/browse/PHOENIX-309 > Project: Phoenix > Issue Type: Task > Affects Versions: 3.0-Release > Reporter: James Taylor > Assignee: ramkrishna > Labels: enhancement > > Benchmark outline from Sudarshan Kadambi which describes a good use case for > supporting an ARRAY data type. Supporting an ARRAY data type, an UNNEST > built-in function, plus derived tables would allow the following standard > query to be used (as opposed to creating custom aggregate functions): > select avg(v) > from (select unnest(value) v from t > where object_id in (O1,O2,...O250K) and field_type = 'F1' and > attrib_id = 'A1') > It'd be nice if you could just do an average over an array directly, but this > would be non standard. > On 04/26/2013 11:17 AM, Sudarshan Kadambi (BLOOMBERG) wrote: > Hi James: > Yes, I saw the email. Thank you for this generous offer. I wanted some time > to make sure the benchmark correctly represents my use case. > If you wish, here's a benchmark setup you could use: > 1. 1 Billion keys in table > 2. Query contains 250K randomly chosen object_ids and a single field_type (F1) > 3. Aggregation done on a single attribute (A1) > 4. Query: > select avg(value) where object_id in (O1,O2,...O250K) and field_type = > 'F1' and attrib_id = 'A1' > We would want the test done with and without the skip scan filter for the > purpose of comparison. > The reason why I wanted some time to think about it is that the values within > each attribute is a JSON number array. So an avg across 2 values is a average > of the averages. > For e.g. Object_id=O1, Field_type=F1, Attrib_id=A1, Value 1: {1,2,3,4,5,} > Object_id=O2, Field_type=F1, Attrib_id=A1, Value 2: {1,2,3,4,5} > The query should produce: Avg{Avg{Value1}, Avg{Value2}} = Avg{3,3} = 3 > If we were doing a sum, the query would produce: Sum{Avg{Value1}, Avg{Value2} > = Sum{3,3} = 6. > This might require customization of the aggregate function code. -- This message was sent by Atlassian JIRA (v6.2#6252)