Re: Collecting unique index values

Ryan Zezeski Thu, 02 Feb 2012 12:40:04 -0800

Sorry, I misunderstood you.  Yes, a range query over the flavor index will
work, as long as the range is known to include all flavors.  You can also
use the undocumented $bucket index to get all keys in a specific bucket, e.g.


http://localhost:8098/buckets/test/index/$bucket/$bucket

riakc_pb_socket:get_index(Pid, <<"test">>, <<"$bucket">>, <<"$bucket">>)

This will traverse the minimal amount of data required to build the key
list.  It only gives you the keys, so you still need to fetch each value
(or just its metadata, since that's where indexes are kept).  I would use
the query as input to map/reduce to build the distinct list.
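
As a rough illustration of feeding the index query into map/reduce, here is
a minimal Python sketch that only builds the HTTP query URL and a job body
you could POST to the /mapred endpoint.  Several things here are assumptions
and not from this thread: the helper names (index_query_url,
distinct_flavors_job), the "key" value passed with the $bucket index, and in
particular the JavaScript map source, which assumes the index entries are
visible at metadata.index.flavor_bin inside the map phase.  Check that
against your Riak version before relying on it.

```python
import json
from urllib.parse import quote

BASE_URL = "http://localhost:8098"  # host and port from the example URL above


def index_query_url(bucket, index, start, end=None):
    """Build a 2i query URL: exact match if end is None, otherwise a range."""
    parts = ["buckets", bucket, "index", index, start]
    if end is not None:
        parts.append(end)
    # Keep "$" literal so the special $bucket index stays readable.
    return BASE_URL + "/" + "/".join(quote(p, safe="$") for p in parts)


# Hypothetical map phase: emit the flavor_bin index value(s) of each object.
# ASSUMPTION: index entries are exposed under metadata.index in the map input.
MAP_FLAVOR_JS = (
    "function(v) {"
    "  var idx = v.values[0].metadata.index || {};"
    "  return idx.flavor_bin === undefined ? [] : [].concat(idx.flavor_bin);"
    "}"
)

# Hypothetical reduce phase: keep one copy of each value it sees.
REDUCE_DISTINCT_JS = (
    "function(values) {"
    "  var seen = {}, out = [];"
    "  for (var i = 0; i < values.length; i++) {"
    "    if (!seen[values[i]]) { seen[values[i]] = true; out.push(values[i]); }"
    "  }"
    "  return out;"
    "}"
)


def distinct_flavors_job(bucket):
    """Build a map/reduce job body that uses the $bucket index as its input."""
    return {
        # ASSUMPTION: the value given as "key" for $bucket is not significant.
        "inputs": {"bucket": bucket, "index": "$bucket", "key": bucket},
        "query": [
            {"map": {"language": "javascript", "source": MAP_FLAVOR_JS}},
            {"reduce": {"language": "javascript", "source": REDUCE_DISTINCT_JS}},
        ],
    }


if __name__ == "__main__":
    print(index_query_url("test", "$bucket", "$bucket"))
    print(json.dumps(distinct_flavors_job("test"), indent=2))
```

Nothing in the sketch talks to Riak; it only prints the URL and the JSON
body, so you can inspect both before sending them with whatever HTTP client
you prefer.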

On Thu, Feb 2, 2012 at 1:23 PM, Carl <[email protected]> wrote:

>  Actually it is even easier than that.  I am not trying to make a
> histogram.  I just need to find out what the various flavors are.  The
> flavors are not predefined - they are attributes of the data records - so I
> can't ask for them specifically as in your second suggestion.   The SQL
> equivalent would be something like:
>    SELECT DISTINCT FLAVOR FROM BUCKET
> I am just looking for a faster way to do it that does not slow down if
> there are a lot of *other* buckets in the whole db.
>
> Since new records are added rarely, and in batches, it is not outrageous
> to have to sweep through the flavor index at that time to regenerate the
> summary information.   By using a range query, the impact on the database
> should be less than if I got all the keys from the bucket, shouldn't it?
> Because indexes are specific to one bucket as I understand it.
>
>
> On 2/2/2012 9:40 AM, Ryan Zezeski wrote:
>
> Carl,
>
>  There is currently no GROUP BY or aggregate function support for Riak
> indexes.  I think range query fed into map/reduce is a good option.
>  Another option is to run a 2i term query for each flavor.  Although, I
> would make sure to measure both.  Also, since you are storing the histogram
> in a meta-object that the application will read you don't have to worry
> about the query runtime affecting your application directly.
>
>  -Ryan
>
> On Wed, Feb 1, 2012 at 5:23 PM, Carl <[email protected]> wrote:
>
>> I have a bucket containing about 1,000 records.  These records are
>> indexed on a property let's call "flavor".  There are only about 10
>> different flavors.
>>
>> Is there a way I could query the flavor_bin index to discover what these
>> unique values are?  The brute-force method would be a map-reduce doing a
>> range query on 'flavor_bin' from 'AAAA' to 'zzzz' and then boiling that
>> down to the unique values in the reduce phase.  But that would have to
>> retrieve every record in the bucket.
>>
>> Is there some meta-data about the index I could query more efficiently
>> to do this?
>>
>> This is not something I would need to do very often, and I would store
>> the resulting list in one record for actual use by the application.
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>
>

