I have also been toying with the idea of a linked list using a "next" links on entries. Theoretically I could use link walking to get the next 19 entries given a "HEAD" entry. It feels like a maintenance nightmare though.
On Sat, Jan 15, 2011 at 7:23 PM, Gary William Flake <g...@flake.org> wrote: > I am building a backend for a web service and Riak looks to be a strong fit > for my needs at this point. However, there is one really simple requirement > that I can't figure out how to implement on Riak with any sort of > efficiency. To simplify the question, suppose that I want a twitter-like > service that has billions of smallish documents that enter into the backend > over a period of time. Typical read pattern will be: > 1. List the last N documents > 2. Given a document, see the N that came before this one. > 3. From a random point, see the N documents before this one. > Hence, what I really want is to be able to access the documents in sorted > order by timestamp. I know that I can do all of this with a map/reduce job, > but this solution literally requires the backend to scan each and every > document which seems less than ideal. > Is there a generally accepted "right" way to do this on Riak? Or is there > no known way to do this gracefully, and I need to be prepared to roll my own > secondary indices? Or, can this be handled with a Riak Search range query? > ### > The second topic is concerned with finding an elegant way to address the > first question. One way to solve the problem above is to use a bucket as a > poor man's list. At the application layer, when I insert a new document, I > could create a new k:v pair in my "list" bucket with the key being an > integer corresponding to this document's position in the corpus, and the > value being the actual key in the document bucket. With this scheme, I > could satisfy the scenario above with the following observations: > 1. Once I have the list key for a document, I could find the N before it in > O(N) time which is an improvement over the map/reduce solution which is > O(size of corpus) / O(cluster size). > 2. This solution assumes atomicity that doesn't exist in practice (e.g., the > operations "new id <- number of docs", and "add a new doc with key new id" > have to complete as an atomic operation if multiple clients are attempting > an insert). > 3. This solution also wants a multiget styled function to map index values > to true document keys, and then to get the resulting document. > I know what you are thinking: just use a serialized JSON array as the value > of the list that you want, then just get it, deserialize it, mutate it, then > update its value. But this is horrific because each insertion takes > O(corpus size) time. > Bottom line: life sucks no matter which route we take. > What I think is needed are redis styled objects that support O(1) atomic > operations. For example, if I simply had a k:v pair where the value was a > list which supported insert/delete front/rear, then we are golden. > ### > This brings me to my third question / observation. In scanning over the > archives of this list, I see that others have requested this feature before, > but I haven't seen any mention of the feature being considered. (Please, > please! I hope I am wrong). > So, I want to offer what I think is the minimal feature request that > supports the scenarios that I am trying to achieve. Basically, I want to > build off of links, allowing them to generalize into container types. My > assumptions: > 1. Links have a natural order in that they alway come out in the same order > that they are created (at least that's what the python client API states, > but I can't confirm it in the Riak docs). > 2. Links seem to support mutation operations, meaning that I can add a link > and remove a link in O(1) time (and not in time proportional to the number > of links on the record). > If both of these are true, then all that is needed is to add simple O(1) > mutation operations allowing one to treat a records links as a special > container that references to other records. Insert/delete front/rear will > give you stacks and queues. A small addition to the API would support sets. > ### > That's all for now. I hope someone can chime in with something to add on > top. My real hope is that I've misunderstood the capabilities of Riak and > that what I want to do can be done today. But if that's not the case, I am > curious if others have the same needs and if they know how to work around > them. > Thanks, > -- GWF > > > > > > > > > > _______________________________________________ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com