I'm a little late to the party, but I've been handling marshaling
with an explicit map/reduce phase that performs the marshaling and/or
data massaging. You can chain map phases together by
using the special bucket/key pair {none,none} and passing the
intermediate data via the KeyData. This also makes the phases more
portable if you wish to re-use them in other situations. I wrote a
blog post about chaining phases a while back, which might be useful:
http://cartesianfaith.wordpress.com/2011/07/27/mapreduce-tips-and-tricks-in-riak/

HTH,
Brian

On Tue, Aug 23, 2011 at 10:01 AM, Jeremiah Peschka <[email protected]> wrote:
> On Aug 22, 2011, at 8:50 PM, bill robertson wrote:
>
>> I wonder if it would be feasible to deploy an erlang web-service in the riak
>> node's webmachine instance that could translate meta-data into Erlang funs
>> and drive the map reduce operation that way. I'm not sure if I could get
>> around having specific knowledge of the protobuf structures baked into that
>> code, but I don't think it matters in this case.
>>
>> I also wonder how much 1.0 will change this picture.
>>
>> > Additionally, are secondary indexes meta-data? i.e. If I built some
>> > secondary indices, these are stored in some form internal to Riak, and
>> > therefore available for query regardless of the type of data its
>> > associated with. Is this correct?
>>
>> Secondary indexes are a separate physical structure, or so I gather. (Rusty
>> could be full of lies.) They're stored separately from the initial data and
>> not as metadata in the object headers. So, yes, you can store whatever you
>> want in secondary indexes and query it however you want, provided there's an
>> API that supports what you're doing.
>>
>> Would secondary indexes eliminate the need for key-filtering? Logically, it
>> would seem that you could do with indexes, but do they have similar
>> performance characteristics? (i.e. does one suck more than the other?)
>
> Key filters will always perform a list-keys operation. Meaning that they
> result in an in memory scan of all keys in the key space.
>
> Not knowing entirely how indexes are implemented internally (reading the
> source is on my TO DO list), I can only guess from my experience with other
> databases how this would work. Indexes generally work best when you have a
> low search cardinality - when you're seeking only a few records from the
> index. As long as you can structure secondary indexes to answer the questions
> you're asking, then indexes make it easy to perform fast queries.
>
> The difference comes in based on your storage mechanism. With bitcask, all
> keys are in memory so that list-keys scan only happens between RAM and CPU
> and isn't THAT expensive of an operation. If indexes are not a memory
> resident structure, then a scan of an index (when you're doing a search
> that's some kind of substring or ends with operation) will be painfully slow
> - much like when you have to perform a table scan in an RDBMS.
>
> The upside of key filtering, and composite key names in general, is that you
> can create meaningful keys that you can assemble on the fly. e.g. To get
> yesterday's trades of Ford stock in the NYSE, (assuming you have a trades
> bucket) you could get at yesterday's trading history through something like
> http://my_riak_server:8091/riak/trades/NYSE:F:20110822 Being able to perform
> ad hoc seeks like that is really powerful.
>
> TL;DR - key filters and secondary indexes serve different purposes.
>
>>
>> Thanks again,
>> Bill Robertson
>
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
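
For anyone who wants to see the shape of the phase-chaining trick mentioned at the top of this message (a placeholder {none,none} bucket/key pair, with the intermediate data riding along in the KeyData), here is a rough toy model in plain Python. It is only a sketch and not Riak's API: `STORE`, `NONE_PAIR`, `run_map_phase`, `run_chain`, `fetch_phase` and `marshal_phase` are all hypothetical names, the in-memory dict stands in for a bucket, and the trade values are made up for illustration. It also assumes that a lookup on the placeholder pair simply finds nothing, so the later phase works from the KeyData alone.

```python
# Toy in-memory model of chained map phases. NOT Riak's API: every name
# here is hypothetical and the stored values are made-up sample data.

NONE_PAIR = ("none", "none")  # stands in for the Erlang pair {none,none}

# Hypothetical bucket contents: (bucket, key) -> stored value.
STORE = {
    ("trades", "NYSE:F:20110822"): {"symbol": "F", "price": "10.50", "qty": "100"},
    ("trades", "NYSE:F:20110823"): {"symbol": "F", "price": "10.75", "qty": "40"},
}


def run_map_phase(fun, inputs):
    """Apply fun to each ((bucket, key), keydata) input and concatenate results."""
    out = []
    for bkey, keydata in inputs:
        # Assumption: nothing is stored under NONE_PAIR, so value is None there.
        value = STORE.get(bkey)
        out.extend(fun(value, keydata))
    return out


def fetch_phase(value, keydata):
    """First phase: read the stored object and pass it on via the keydata slot."""
    return [(NONE_PAIR, value)]


def marshal_phase(value, keydata):
    """Second phase: ignore the (missing) object and marshal the keydata."""
    notional = float(keydata["price"]) * int(keydata["qty"])
    return [{"symbol": keydata["symbol"], "notional": notional}]


def run_chain(phases, inputs):
    """Feed the output of each map phase in as the input of the next one."""
    for fun in phases:
        inputs = run_map_phase(fun, inputs)
    return inputs


if __name__ == "__main__":
    start = [(("trades", "NYSE:F:20110822"), None),
             (("trades", "NYSE:F:20110823"), None)]
    print(run_chain([fetch_phase, marshal_phase], start))
```

Because `marshal_phase` only looks at its keydata, it does not care which phase produced it, which is the portability point made above: the same marshaling step could follow any phase that emits the placeholder pair plus data.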
