Hi Marco,

I'm also new to Riak and in the middle of researching whether Riak makes sense
for a new project I'm working on.

Here are my thoughts based on what I've read so far:
 - Riak can be pretty fast, but its focus is more on rock-solid stability
than on raw performance
 - you're going to want a cluster of at least 3 nodes; you'd want multiple
nodes for any m/r setup anyway
 - Riak won't be super great for ad-hoc queries so I'm assuming that we're
talking about a canned daily report
 - your schema needs to match your query patterns: put a 2i on the date, query
by a specific date, then group by session; if you're doing this as described
w/o a 2i date filter first, you'll be going through all the past page view
data, which presumably won't change after you've processed it
 - m/r using protocol buffers can be significantly faster than using HTTP
so try again with a client that uses pb
 - once you have multiple nodes set up, do_prereduce will split the load of
reducing across the nodes; I think this would definitely be useful if
your data reduces well per node, like: for day d, x registered user page
views, y unregistered user page views
 - updating shouldn't be a problem as long as it isn't hard for you to
resolve collisions; if there's a collision you have to decide on the
strategy for resolution; since this is log data and not something that
requires transactions, that shouldn't be a problem
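
To make the date-filter and per-node reduce points concrete, here is a rough
sketch in plain Python. It does not call Riak at all: the session fields
("day", "registered", "page_views"), the helper names and the "nodes" split
are all made up for illustration, and sessions_for_day only stands in for
what a 2i lookup on the date would give you. The thing to notice is that the
reduce takes and returns the same shape, so it can run once per node and then
again on the combined results.

```python
from collections import defaultdict

# Hypothetical session documents; the field names are assumptions.
SESSIONS = [
    {"key": "s1", "day": "2012-02-11", "registered": True, "page_views": ["/", "/a"]},
    {"key": "s2", "day": "2012-02-11", "registered": False, "page_views": ["/"]},
    {"key": "s3", "day": "2012-02-12", "registered": True, "page_views": ["/", "/b", "/c"]},
    {"key": "s4", "day": "2012-02-12", "registered": False, "page_views": ["/x", "/y"]},
]


def sessions_for_day(sessions, day):
    """Stand-in for a 2i lookup on a date index: keep only that day's sessions."""
    return [s for s in sessions if s["day"] == day]


def map_session(session):
    """Map step: one partial result per session, keyed by day."""
    n = len(session["page_views"])
    counts = (n, 0) if session["registered"] else (0, n)
    return {session["day"]: counts}


def reduce_counts(partials):
    """Reduce step: sum (registered, unregistered) pairs per day.

    Input items and the output have the same shape, so the function can be
    applied again to a list of its own outputs.
    """
    totals = defaultdict(lambda: (0, 0))
    for partial in partials:
        for day, (reg, unreg) in partial.items():
            r, u = totals[day]
            totals[day] = (r + reg, u + unreg)
    return dict(totals)


def daily_report(sessions, day, nodes=2):
    """Filter by day first, map, reduce per pretend node, then reduce again."""
    mapped = [map_session(s) for s in sessions_for_day(sessions, day)]
    # Pretend the mapped results are spread over `nodes` machines.
    per_node = [reduce_counts(mapped[i::nodes]) for i in range(nodes)]
    return reduce_counts(per_node)
```

Calling daily_report(SESSIONS, "2012-02-11") only touches that day's two
sessions, which is the whole point of filtering by date before the m/r runs.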

Some links that I've found useful:
http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
(especially the comment)
http://devblog.seomoz.org/2011/10/using-riak-for-ranking-collection/
http://inakanetworks.com/blog/2011/08/25/when-to-use-riak/

The Basho Vimeo channel has a pile of informative videos where you'll find good
nuggets here and there: http://vimeo.com/17604126

Hope that helps,
Steve

On Sun, Feb 12, 2012 at 9:00 AM, <[email protected]> wrote:

> Date: Sun, 12 Feb 2012 11:27:22 +0000
> From: Marco Monteiro <[email protected]>
> To: [email protected]
> Subject: Is Riak a good solution for this problem?
>
> Hello!
>
> I'm considering Riak for the statistics of a site that is approaching a
> billion page views per month.
> The plan is to log a little information about each page view and then
> to query that data.
>
> I'm very new to Riak.  I've gone over the documentation on the wiki, and I
> know about map-reduce,
> secondary indexes and Riak search. I've installed Riak on a single node and
> made a test with the
> default configuration. The results were a little below what I expected.
> For the test I used the following
> requirement.
>
> We want the page view count by day for registered and unregistered users.
> We are storing session
> documents. Each document has a session identifier as its key and a list of
> page views as the value
> (and a few additional properties we can ignore). This document structure
> comes from CouchDB,
> where I organised things like this to be able to more easily query the
> database. I've done a basic
> javascript map-reduce query for this. I just map over each session (every
> k/v in a bucket) returning
> the length of the page views array for either the registered or
> unregistered field (the other is zero), and
> the day of the request. In the reduce I collect them by hashing on the day and
> summing the two page view
> counts. Then I have a second reduce to sort the list by day.
>
> This is very slow on a single machine setup with default Riak
> configuration. 1,000 sessions take
> 6 seconds. 10,000 sessions take more than 2 minutes (timeout). We want to
> handle 10,000,000
> sessions, at least. Is there a way, maybe with secondary indexes, to make
> this go faster using only Riak?
> Or must I use some kind of persistent cache to store this info as time goes
> by? Or can I make Riak
> run 100 times faster by tweaking the config? I don't want to have 1000
> machines for making this work.
>
> Also, will updating the session documents be a problem for Riak? Would it
> be better to store each
> page hit under a new key, so as not to update the session document? Because
> of the "multilevel" map
> reduce this can work on Riak, where it didn't work on CouchDB because of its
> view system limitations.
> Unfortunately, with the update of documents the CouchDB database was
> growing way too fast for it
> to be a feasible solution.
>
>
> Any advice to make Riak work for this problem is greatly appreciated.
>
> Thanks,
> Marco
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



-- 
facebook: http://facebook.com/picturebookApp
twitter: http://twitter.com/maplekey
blog: http://maplekeycompany.blogspot.com/
site: http://www.maplekeycompany.com/mobile/