Re: Riak Search VS other query systems
Re. Riak pipes. What's the latest regarding accessing the pipe framework? Haven't heard too much about it lately, admittedly haven't been listening too hard either. The thought would be to do stormish stream processing in situ.

@siculars
http://siculars.posthaven.com
Sent from my iRotaryPhone

On Aug 20, 2014, at 22:28, Sargun Dhillon sar...@sargun.me wrote:

I second John's opinions. Generally, I would have one key which is the secondary index, being an observed-remove set (OR-Set), or whichever type is relevant for your application (a register, a G-Set, or a plain old OR-Set), pointing back to the keys. Unfortunately, this mechanism can become quite unwieldy when you have a term with high cardinality.

Now, moving on to the Twitter use case, you care a lot about speed. With this strategy, if you're doing this from a client where you (1) read the 2i OR-Set and then (2) read the keys, that can be expensive, as you have to read the entire 2i value back to the Riak client before reading any of the keys. For example, the hashtag #beiber would have high cardinality, which would result in a super large value, and reading that back over the network would be less than awesome. Also, having to pass this value around disterl would be poor.

Fortunately, the folks at Basho have invented riak_pipe. Riak_pipe is a method to allow running the read locally on the node the 2i lives on, then streaming reads for all of those keys to the nodes that they live on, and then streaming everything back to the reader. It's actually the framework that Riak MR uses under the hood.

Talk: https://vimeo.com/53910999 (there might be newer ones as well)
Docs: https://github.com/basho/riak_pipe

Also, to deal with high-cardinality values, there are a variety of workarounds, such as sharding the secondary index across some known set of keys and doing a read across this list of keys.
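To make the sharding workaround concrete, here is a minimal sketch in Python. It is not Riak client code: a plain dict stands in for a bucket of set-valued objects, and the names `NUM_SHARDS`, `shard_key`, `index_add`, and `index_read` are hypothetical, chosen only for illustration. The idea is that every writer hashes the member key to pick one of a fixed, known set of index keys, and every reader fetches all of them and unions the results.

```python
import hashlib
from typing import Dict, List, Set

NUM_SHARDS = 8  # assumption: a fixed shard count known to all readers and writers


def shard_key(term: str, member_key: str, num_shards: int = NUM_SHARDS) -> str:
    """Pick the index shard that holds `member_key` for the given term."""
    digest = hashlib.sha1(member_key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_shards
    return f"{term}/{shard}"


def all_shard_keys(term: str, num_shards: int = NUM_SHARDS) -> List[str]:
    """The known set of keys a reader has to fetch for one term."""
    return [f"{term}/{i}" for i in range(num_shards)]


def index_add(store: Dict[str, Set[str]], term: str, member_key: str) -> None:
    """Add one member to the shard it hashes to (the dict stands in for Riak)."""
    store.setdefault(shard_key(term, member_key), set()).add(member_key)


def index_read(store: Dict[str, Set[str]], term: str) -> Set[str]:
    """Read every shard of the term and union the members."""
    members: Set[str] = set()
    for key in all_shard_keys(term):
        members |= store.get(key, set())
    return members
```

Each shard value stays roughly `1/NUM_SHARDS` the size of the unsharded index, at the cost of `NUM_SHARDS` reads per lookup. In a real cluster each shard would presumably be a set data type so that concurrent adds merge cleanly.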
Also, you can postfix a nonce to the 2i key, ensure that the resulting keys all end up on one node (via a custom hashing function) or on a subset of nodes, and utilize LevelDB's key iteration over a range to handle this.

The general pattern I like for 2i atop Riak is to specialize the work on RAMP transactions by Peter Bailis of UC Berkeley. If you build the framework for this, it'll be all sorts of useful in the future. One warning is that there is no easy way to garbage collect in Riak today.

Paper: http://www.bailis.org/papers/ramp-sigmod2014.pdf
Talk: https://www.youtube.com/watch?v=_rAdJkAbGls

None of these methods gracefully handle range queries. You can do clever things with your 2i to handle this, but the Twitter use case wouldn't need ranges.

On Wed, Aug 20, 2014 at 12:28 PM, John Daily jda...@basho.com wrote:

I don't have benchmarks to discuss query performance for different tools at different sizes, but I'd like to point out that the ultimate search tool for Riak is to not search at all. Riak Search, 2i, and MapReduce are all capable tools, but they don't scale nearly as well as straight key/value requests, and it is often possible to model your data around the latter.

I covered this in https://basho.com/riak-development-anti-patterns/ and the next edition of Eric Redmond's Little Riak Book (http://littleriakbook.com) will have more discussion on the topic, but if at all possible, create your query results as reports as the data is ingested, instead of attempting to find it all later.

-John

On Wed, Aug 20, 2014 at 3:21 PM, Alex De la rosa alex.rosa@gmail.com wrote:

Any thoughts about this? One thing that worries me about Riak Search is that if one index has several million objects to search, maybe it becomes slow? 2i might be faster then?

Thanks!
Alex

On Tue, Aug 19, 2014 at 8:47 AM, Alex De la rosa alex.rosa@gmail.com wrote:

Hi there,

Lately I have been seeing Riak Search presented as the ultimate way to query Riak, and it seems to be recommended over MapReduce and even 2i. That said,
should we always try to use Riak Search over the other systems? Is there any situation in which MapReduce could be a better approach than Riak Search? The same goes for 2i. I believe 2i is an optimal approach if you just want keys and know very well what you are looking for, but beyond that, should Riak Search replace all 2i uses?

Practical example: if you are Twitter and want to get tweets for the hashtag #Riak, what would be the best approach? 2i? Riak Search? MapReduce?

Thanks!
Alex
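For the hashtag example, John's advice to build query results as the data is ingested could look roughly like the sketch below. It is only an illustration under stated assumptions: a Python dict stands in for the key/value store, and the key layout (`tweet/<id>`, `hashtag/<tag>`) and the function names `ingest_tweet` and `tweets_for_hashtag` are made up here, not taken from any Riak API. The write path maintains one "report" key per hashtag, so the query becomes a plain key lookup followed by key/value gets.

```python
import re
from typing import Dict, List

HASHTAG = re.compile(r"#(\w+)")


def ingest_tweet(kv: Dict[str, object], tweet_id: str, text: str) -> None:
    """Store the tweet and update one report key per hashtag at write time."""
    kv[f"tweet/{tweet_id}"] = text
    # De-duplicate tags so a tweet is listed once per hashtag.
    for tag in {t.lower() for t in HASHTAG.findall(text)}:
        report = kv.setdefault(f"hashtag/{tag}", [])
        report.append(tweet_id)


def tweets_for_hashtag(kv: Dict[str, object], tag: str) -> List[str]:
    """Answer the query with plain key/value reads, with no search step."""
    ids = kv.get(f"hashtag/{tag.lower()}", [])
    return [kv[f"tweet/{i}"] for i in ids]
```

With a real cluster, concurrent writers updating the same report key would need a mergeable type (such as the OR-Set mentioned earlier in the thread) or explicit sibling resolution, and a very popular hashtag runs into the high-cardinality problem discussed above.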
Re: Riak Search VS other query systems
We have some prototypes for how to expand Pipe's capabilities, thanks to Chris Meiklejohn. We have not exposed it directly, and I honestly think it may be fundamentally the wrong level of abstraction to present to users. It could be a way to implement higher-level query/processing features, but that has not panned out and is low on our priorities at this time.

On Thu, Aug 21, 2014 at 11:55 AM, Alexander Sicular sicul...@gmail.com wrote:

Re. Riak pipes. What's the latest regarding accessing the pipe framework? Haven't heard too much about it lately, admittedly haven't been listening too hard either. The thought would be to do stormish stream processing in situ.