To be honest, I don't know how Riak handles that; I really thought querying a smaller bucket would be faster... I guess it would be nice if somebody at Basho could give some input.

Also, as I said before, I agree that you should avoid MapReduce phases like that. I just wanted to build my model to imitate a relational DB as closely as possible... whether you actually use that functionality is another matter ;) As I also said before, I believe more in processing data in the background and having the UI just fetch keys that hold the pre-generated data.

If anybody has other input or ideas on how to manage data in these situations, it would be great to hear them.

Rohman

On Tue, 26 Jul 2011 20:38:39 -0700, Kev Burns wrote:

Well... you would think that doing a map/reduce across a smaller bucket would take less time, but that isn't as true as you might think.
If I remember correctly, Riak doesn't store bucket/key values in memory; it just stores the hash.
So if you use {"inputs": "Messages_Rohman", ... }, it has to test every key in memory to see whether it belongs to the bucket you specified.
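To make the point concrete, here is a toy in-memory model, not Riak's real storage engine. It assumes, as described above, that listing one bucket's keys means walking every stored key regardless of bucket. The `makeStore`/`put`/`listKeys` helpers and the bucket contents are hypothetical.

```javascript
// Toy model only: assumes that listing a bucket's keys means scanning every
// key stored on the node, whatever bucket it belongs to. Not Riak's real code.
function makeStore() {
  // Keys are stored as "bucket/key" strings in a single flat Map.
  return new Map();
}

function put(store, bucket, key, value) {
  store.set(bucket + "/" + key, value);
}

// Returns the keys of `bucket` plus how many stored keys had to be examined.
function listKeys(store, bucket) {
  const prefix = bucket + "/";
  const keys = [];
  let examined = 0;
  for (const fullKey of store.keys()) {
    examined += 1; // every key is checked, not just this bucket's keys
    if (fullKey.startsWith(prefix)) {
      keys.push(fullKey.slice(prefix.length));
    }
  }
  return { keys, examined };
}

// A tiny per-user bucket living next to a large shared one.
const store = makeStore();
for (let i = 0; i < 10000; i++) put(store, "Messages", "msg" + i, {});
for (let i = 0; i < 10; i++) put(store, "Messages_Rohman", "msg" + i, {});

const result = listKeys(store, "Messages_Rohman");
console.log(result.keys.length, result.examined); // 10 10010
```

In this model the cost of listing the 10-key bucket is set by the 10,010 keys in the whole store, which is why splitting data into smaller buckets does not by itself make a bucket input cheaper.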

See here
http://wiki.basho.com/MapReduce.html#Inputs

You may also pass just the name of a bucket ({"inputs":"mybucket",...}), which is equivalent to passing all of the keys in that bucket as inputs (i.e. “a map/reduce across the whole bucket”). You should be aware that this triggers the somewhat expensive “list keys” operation, so you should use it sparingly.

Here, "somewhat expensive" is an understatement.
If you have more than 10,000 keys in the cluster, list keys could take several minutes, even if your bucket only has 10 keys.
A better solution right now is to use key filters or a search input to the map/reduce.

A bucket input may also be combined with Key Filters to limit the number of objects processed by the first query phase.
If you’re using Riak Search, the list of inputs can also reference a search query to be used as inputs.
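For comparison, here are the three kinds of `inputs` discussed above written out as job bodies for Riak's HTTP `/mapred` endpoint. The bucket name, the `starts_with` filter on a hypothetical `rohman_` key prefix, and the search query string are illustrative assumptions; check filter names and the Riak Search input format against the docs for your Riak version. Per the wiki text quoted above, key filters narrow the objects processed by the first phase, but they are still combined with a bucket input.

```javascript
// Sketch of MapReduce job bodies for Riak's HTTP /mapred endpoint.
// Bucket names, key prefixes and the search query are hypothetical.

// Map phase shared by every job: return each object's decoded JSON value.
const mapPhase = {
  map: { language: "javascript", name: "Riak.mapValuesJson", keep: true },
};

// 1. Whole-bucket input: triggers the expensive "list keys" operation.
function bucketJob(bucket) {
  return { inputs: bucket, query: [mapPhase] };
}

// 2. Bucket input narrowed by key filters, so fewer objects reach the
//    map phase (keys are assumed to look like "rohman_<id>").
function keyFilterJob(bucket, keyPrefix) {
  return {
    inputs: { bucket: bucket, key_filters: [["starts_with", keyPrefix]] },
    query: [mapPhase],
  };
}

// 3. Riak Search input: the search index supplies the bucket/key pairs.
function searchJob(bucket, searchQuery) {
  return {
    inputs: {
      module: "riak_search",
      function: "mapred_search",
      arg: [bucket, searchQuery],
    },
    query: [mapPhase],
  };
}

// Each body would be POSTed as JSON to http://<host>:8098/mapred
// with Content-Type: application/json.
console.log(JSON.stringify(keyFilterJob("Messages", "rohman_")));
```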

Hopefully Secondary Indexes will come with a new map/reduce input type that does something similar.

- Kev
c: +001 (650) 521-7791


On Tue, Jul 26, 2011 at 8:21 PM, Antonio Rohman Fernandez <[email protected]> wrote:

 

"The problem I see with riak-ql and Antonio's thing is that they're invariably going to be slow.
JavaScript Map/Reduce over an entire bucket is just not suitable for inline requests."

Yes, of course. I also think the MapReduce phases should be run in the background, with cron jobs or other methods; you don't want to execute this kind of query from your UI web app. But at least the functionality is there in case you need it (of course, you have to think carefully about how you distribute data across buckets). Take Facebook's "News Feed" as an example: if we ran that MapReduce query every time the user clicked the "Home" tab, it would be terribly expensive, so a background process that generates the "News Feed" for you and updates it every 5 minutes (for example) would be better. Still, it's good to have on-demand MapReduce available in case you want to grab some special data, even if the transaction is a bit costly.
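The background approach described above can be sketched as a job that periodically rebuilds each user's feed and stores it under one predictable key, so a web request only does a key lookup. This is a minimal sketch under stated assumptions: an in-memory `Map` stands in for a Riak bucket, and `buildFeed`, the `feed_<user>` key scheme and the 5-minute interval are hypothetical.

```javascript
// Minimal sketch: precompute feeds in the background, read them by key in the UI.
// An in-memory Map stands in for a Riak bucket; all names are hypothetical.
const feedStore = new Map();

// Hypothetical feed builder: newest 20 posts by the people a user follows.
// In a real system this is where the expensive MapReduce query would run.
function buildFeed(user, follows, posts) {
  const authors = new Set(follows[user] || []);
  return posts
    .filter((p) => authors.has(p.author))
    .sort((a, b) => b.time - a.time)
    .slice(0, 20);
}

// Background job: rebuild every user's feed and store it under one key.
function refreshFeeds(users, follows, posts) {
  for (const user of users) {
    feedStore.set("feed_" + user, {
      generatedAt: Date.now(),
      items: buildFeed(user, follows, posts),
    });
  }
}

// UI path: a single key lookup, no query at request time.
function getFeed(user) {
  const entry = feedStore.get("feed_" + user);
  return entry ? entry.items : [];
}

// Rebuild every 5 minutes; unref() lets Node exit if nothing else is running.
function startFeedRefresher(loadData, intervalMs = 5 * 60 * 1000) {
  const timer = setInterval(() => {
    const { users, follows, posts } = loadData();
    refreshFeeds(users, follows, posts);
  }, intervalMs);
  timer.unref();
  return timer;
}
```

The trade-off is staleness: a feed can be up to one interval old, in exchange for request-time cost that no longer depends on how much data the query would have scanned.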

It also depends on how you store your data. If you just have a "Messages" bucket storing everybody's messages, it will be very hard to query. If instead you split it up into "Messages_Rohman", "Messages_OtherUser", etc., you will have less data in each bucket, queries could be faster, and MapReduce could become an option for on-demand data.

Rohman

 

On Tue, 26 Jul 2011 20:12:08 -0700, Kev Burns wrote:

Sorry, I meant to post this to the list.

- Kev


On Tue, Jul 26, 2011 at 8:11 PM, Kev Burns <[email protected]> wrote:
Francisco - How's performance on riak-ql?

The problem I see with riak-ql and Antonio's thing is that they're invariably going to be slow.
JavaScript Map/Reduce over an entire bucket is just not suitable for inline requests.

Take PodCrazy
http://podcrazy.net/

It's backed entirely by RiakSearch and memcached.
That episode listing on the homepage is a map/reduce that calculates popularity based on votes over time.
But right now this simple JavaScript map/reduce over fewer than a thousand items takes about 2 seconds to run.
It totally makes sense as a map/reduce because it's calculating popularity based on several decaying attributes.
But it has to happen in a background process.
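The PodCrazy scoring code isn't shown in the thread, so the following is only a guess at the general shape of such a job: a JavaScript map phase that scores each episode by its votes, with each vote's weight halving every `halfLifeHours`, and a reduce phase that keeps the top N. The `votes` field (an array of vote timestamps in milliseconds), the `episodes` bucket, the 24-hour half-life and N = 20 are all assumptions. The map function reads `v.values[0].data` directly, which for an object without siblings matches `Riak.mapValuesJson(v)[0]`, so it can also be exercised outside Riak.

```javascript
// Hypothetical popularity job, not PodCrazy's real code.
// Riak ships each function's source text to the cluster, so everything a
// phase needs must live inside the function body or arrive through `arg`.
// Plain `var`/`function` syntax keeps it friendly to older JS engines.

// Map phase: one [key, score] pair per episode.
// arg = { now: <ms timestamp>, halfLifeHours: <number> }
function mapPopularity(v, keyData, arg) {
  // Same as Riak.mapValuesJson(v)[0] for an object without siblings.
  var episode = JSON.parse(v.values[0].data);
  var halfLifeMs = arg.halfLifeHours * 3600 * 1000;
  var score = 0;
  (episode.votes || []).forEach(function (votedAt) {
    var age = Math.max(0, arg.now - votedAt);
    score += Math.pow(0.5, age / halfLifeMs); // weight halves every half-life
  });
  return [[v.key, score]];
}

// Reduce phase: keep the `limit` highest scores. Output has the same shape
// as input, so Riak can safely re-run it on partial results.
function reduceTopN(values, limit) {
  return values
    .slice()
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit);
}

// Job body with the functions sent as inline source ("source" field).
var job = {
  inputs: "episodes",
  query: [
    { map: { language: "javascript", source: mapPopularity.toString(),
             arg: { now: Date.now(), halfLifeHours: 24 } } },
    { reduce: { language: "javascript", source: reduceTopN.toString(),
                arg: 20, keep: true } },
  ],
};
```

Passing `now` through `arg` instead of calling the clock inside the map function keeps every node scoring against the same instant, which also makes the scores reproducible.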

The site is powered by this port of Ripple to PHP
http://ripple-php.hackyhack.net/test/?test=document

Right now ripple-php has remained pretty basic, and for good reason.
I've put off creating something more full-featured until secondary indexes make it into master.
The shape of the native secondary index mechanism will heavily influence the design of any Riak ODM.

Lastly, in my experience Riak Search is not very memory efficient as a non-fulltext index mechanism.
It's also sort of useless without support for ranges on anything other than keys.
You wind up selecting limit 0,999 and doing the slice yourself.
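Without range support, the workaround described above amounts to fetching a fixed-size batch of search results and doing the range filtering, ordering and paging in the client. A minimal sketch, assuming the results are already fetched into a `rows` array; `pageByRange` and the field names are hypothetical.

```javascript
// Client-side range filter and pagination over an already-fetched batch of
// search results (e.g. the first 1000 hits). Names are hypothetical.
function pageByRange(rows, field, min, max, page, perPage) {
  // Range filtering that the index itself can't do.
  const inRange = rows.filter((r) => r[field] >= min && r[field] <= max);
  inRange.sort((a, b) => b[field] - a[field]); // highest values first
  const start = page * perPage;
  return {
    total: inRange.length,
    items: inRange.slice(start, start + perPage),
  };
}
```

The catch is the batch size: any matching row beyond the first batch never reaches the client, so this only stays correct while the result set fits in one fetch.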

I suppose I see the value of a tool like riak-ql for reporting.
And I imagine this sort of tool will continue to be useful to add the sort of features that Secondary Indexes will not support.
And I also imagine a tool like this would be faster if implemented in Erlang.

On Tue, Jul 26, 2011 at 7:35 PM, Antonio Rohman Fernandez <[email protected]> wrote:

Thanks for porting it to Google Docs, even though the text seems to have gotten a little compressed in there; it looks too cluttered.
Hope it can help somebody.

Rohman

On Tue, 26 Jul 2011 19:21:36 -0700, Kev Burns wrote:

Here's a virus-free version of Antonio's slide deck (Google Docs)
https://docs.google.com/present/view?id=dhpxng6q_51gdj6r9wn

- Kev


On Tue, Jul 26, 2011 at 6:23 PM, Antonio Rohman Fernandez <[email protected]> wrote:

For PHP, you can take a look at these slides I made. They're about "phpCloud Framework", a new PHP5 MVC framework I'm building with Riak integration built in :) It's based on CakePHP, which borrows heavily from Ruby on Rails.
You can download the slides from this address (it seems the file is too big for the distribution list, since my last mail couldn't be sent):

http://mahalostudio.com/Riak_phpCloud.pptx

Rohman

--

Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
[email protected]
Projects
MaruBatsu.es
PupCloud.com
Wedding Album

On Tue, 26 Jul 2011 20:00:27 -0400, Jonathan Langevin wrote:

Looks interesting, but doesn't appear very intuitive (at least, to a PHP dev)


Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - [email protected] - www.loomlearning.com - Skype: intel352




On Mon, Jul 25, 2011 at 9:40 AM, francisco treacy <[email protected]> wrote:
It's awesome for ad-hoc querying, at least. An example can better explain.

Consider this:

db.add('users').map('query', '.address .street where
.weight:expr(x !.expired').run()


as opposed to:

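db.add('users').map(function(v) {
  // Decode the stored JSON document (first sibling only).
  v = Riak.mapValuesJson(v)[0];
  var result = [];
  // Keep users under 180 (or exempt) whose ACL state is '1101'
  // and who are not expired.
  if ((v.weight < 180 || v.exempt) && v.acl && v.acl.state === '1101' &&
      !v.expired) {
    // Emit the street name only when the user has an address.
    if (v.address) {
      result.push(v.address.street);
    }
  }
  return result;
}).run()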


riak-ql basically adds some query sugar (where, &&) on top of JSONSelect, which you can try out here:
http://jsonselect.org/#tryit


2011/7/25 Mark Phillips <[email protected]>:
> Hey Francisco,
>
> I for one would be interested in learning some more specifics on how
> you're using it. I suspect others might be, too...
>
> Mark
>
> On Sat, Jul 23, 2011 at 4:40 PM, francisco treacy
> <[email protected]> wrote:
>> Hey all,
>>
>> Just wondering... is anyone using riak-ql, or has anyone tried it out?
>> https://github.com/frank06/riak-ql
>>
>> Not because I developed it -- but I'm regularly making use of it and I
>> think it kicks ass. Especially in the repl in combo with riak-js.
>>
>> What do you guys think?
>>
>> Francisco
>>
>> ps: really curious/excited about the upcoming Secondary Indices functionality
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
