Hi Sean, Sweet! Thanks for the explanation. Much appreciated and very helpful. Just a bit more clarification, on an equality lookup, where the ‘foobar’ key has a value ‘barfoo’ that is the very low-cardinality, are those indexed objects individually stored as a key/value term which then is enumerated over, versus a key/list_of_values term? In other words, if I have 600K records where foobar=barfoo, will this be 600K reads, or 1 read with a list of 600K entries which are then enumerated over in memory?
Cheers, Bryan ---- Bryan Hughes Go Factory http://www.go-factory.net On Aug 29, 2014, at 6:46 AM, Sean Cribbs <s...@basho.com> wrote: > I made a minor mistake in my example, the PrimaryKey is part of the > index key, whereas the value contains nothing. It's more like this: > > {i, IndexName, IndexTerm, PrimaryKey} => <<>> > > So for the initial seek, we construct a key like so: > > {i, <<"foobar_bin">>, <<"baz">>, <<>>} > > On Fri, Aug 29, 2014 at 8:44 AM, Sean Cribbs <s...@basho.com> wrote: >> Hi Bryan, >> >> Index entries are just keys in LevelDB like normal values are. So, >> performance is relatively constant at write time but is O(N) at read >> (because you are scanning the index). The high-cardinality term will >> definitely be expensive to enumerate, but the low-cardinality terms >> will be much less so. The index entries in LevelDB look roughly like >> this: >> >> {i, IndexName, IndexTerm} => PrimaryKey >> >> Since the keys are encoded with sext, we encode the start key of the >> range (or the equality query), start an iterator and then seek to that >> start key and start reading values. When a key is reached that exceeds >> the range or equality query, we stop iterating. >> >> Of course, that is an oversimplification, there are issues with >> backpressure from the request coordinator process against the >> iterators, streaming merge sort if you want the results back in order >> or paginated, etc. Also, all 2I queries are run via coverage, which >> ensures that the entire keyspace is "covered" but hits most if not all >> nodes in the cluster. >> >> On Thu, Aug 28, 2014 at 9:18 PM, Bryan <br...@go-factory.net> wrote: >>> Hi Everyone, >>> >>> Apologies as this has probably been asked before. Unfortunately I have not >>> been able to parse through the list serve to find a reasonable answer and >>> the Basho wiki docs seem to be missing this information. I have read up on >>> the secondary index docs. >>> >>> I am interested to better understand how the secondary indexes perform when >>> there is a very low distribution of values that are indexed. For example, >>> lets say I have a bucket with 1 million objects that I create a secondary >>> index on. Now lets say the index is on a value that has an uneven >>> distribution where one of the values is not selective while the others are, >>> such that 60% of the values fall into a single indexed value, while the >>> remaining 40% have a good distribution. >>> >>> For example, I have a record (i.e. object) where the indexed field is >>> ‘foobar_bin'. I have 1 million objects in the bucket that have 100 unique >>> ‘foobar’ values distributed over the 1 million objects. One of the values >>> repeats for 60% of the records (600K) and the rest have an even distribution >>> of about 4%. >>> >>> How will the secondary indexes perform with this and is this an appropriate >>> use of the secondary indexes? Finally, what I have read is not completely >>> clear on what happens if the indexed value is updated when the value has >>> such a low degree of selectivity? >>> >>> We have less than 512 partitions and are using the erlang client. >>> >>> Thanks in advance - any insights will be much appreciated! >>> >>> >>> Cheers, >>> Bryan >>> >>> ---- >>> >>> Bryan Hughes >>> Go Factory >>> http://www.go-factory.net >>> >>> _______________________________________________ >>> riak-users mailing list >>> riak-users@lists.basho.com >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >>> >> >> >> >> -- >> Sean Cribbs <s...@basho.com> >> Software Engineer >> Basho Technologies, Inc. >> http://basho.com/ > > > > -- > Sean Cribbs <s...@basho.com> > Software Engineer > Basho Technologies, Inc. > http://basho.com/ _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com