Re: newbie questions: sorted keys, ADT primitives, and link manipulation

Eric Moritz Sun, 16 Jan 2011 00:45:56 -0800

I have also been toying with the idea of a linked list using a "next"
links on entries.  Theoretically I could use link walking to get the
next 19 entries given a "HEAD" entry.  It feels like a maintenance
nightmare though.


On Sat, Jan 15, 2011 at 7:23 PM, Gary William Flake <g...@flake.org> wrote:
> I am building a backend for a web service and Riak looks to be a strong fit
> for my needs at this point.  However, there is one really simple requirement
> that I can't figure out how to implement on Riak with any sort of
> efficiency.  To simplify the question, suppose that I want a twitter-like
> service that has billions of smallish documents that enter into the backend
> over a period of time.  Typical read pattern will be:
> 1. List the last N documents
> 2. Given a document, see the N that came before this one.
> 3. From a random point, see the N documents before this one.
> Hence, what I really want is to be able to access the documents in sorted
> order by timestamp.  I know that I can do all of this with a map/reduce job,
> but this solution literally requires the backend to scan each and every
> document which seems less than ideal.
> Is there a generally accepted "right" way to do this on Riak?  Or is there
> no known way to do this gracefully, and I need to be prepared to roll my own
> secondary indices?  Or, can this be handled with a Riak Search range query?
> ###
> The second topic is concerned with finding an elegant way to address the
> first question.  One way to solve the problem above is to use a bucket as a
> poor man's list.  At the application layer, when I insert a new document, I
> could create a new k:v pair in my "list" bucket with the key being an
> integer corresponding to this document's position in the corpus, and the
> value being the actual key in the document bucket.  With this scheme, I
> could satisfy the scenario above with the following observations:
> 1. Once I have the list key for a document, I could find the N before it in
> O(N) time which is an improvement over the map/reduce solution which is
> O(size of corpus) / O(cluster size).
> 2. This solution assumes atomicity that doesn't exist in practice (e.g., the
> operations "new id <- number of docs", and "add a new doc with key new id"
> have to complete as an atomic operation if multiple clients are attempting
> an insert).
> 3. This solution also wants a multiget styled function to map index values
> to true document keys, and then to get the resulting document.
> I know what you are thinking: just use a serialized JSON array as the value
> of the list that you want, then just get it, deserialize it, mutate it, then
> update its value.  But this is horrific because each insertion takes
> O(corpus size) time.
> Bottom line: life sucks no matter which route we take.
> What I think is needed are redis styled objects that support O(1) atomic
> operations.  For example, if I simply had a k:v pair where the value was a
> list which supported insert/delete front/rear, then we are golden.
> ###
> This brings me to my third question / observation.  In scanning over the
> archives of this list, I see that others have requested this feature before,
> but I haven't seen any mention of the feature being considered.  (Please,
> please!  I hope I am wrong).
> So, I want to offer what I think is the minimal feature request that
> supports the scenarios that I am trying to achieve.  Basically, I want to
> build off of links, allowing them to generalize into container types.  My
> assumptions:
> 1. Links have a natural order in that they alway come out in the same order
> that they are created (at least that's what the python client API states,
> but I can't confirm it in the Riak docs).
> 2. Links seem to support mutation operations, meaning that I can add a link
> and remove a link in O(1) time (and not in time proportional to the number
> of links on the record).
> If both of these are true, then all that is needed is to add simple O(1)
> mutation operations allowing one to treat a records links as a special
> container that references to other records.  Insert/delete front/rear will
> give you stacks and queues.  A small addition to the API would support sets.
> ###
> That's all for now.  I hope someone can chime in with something to add on
> top.  My real hope is that I've misunderstood the capabilities of Riak and
> that what I want to do can be done today.  But if that's not the case, I am
> curious if others have the same needs and if they know how to work around
> them.
> Thanks,
> -- GWF
>
>
>
>
>
>
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: newbie questions: sorted keys, ADT primitives, and link manipulation

Reply via email to