Jan -
Which protocol (HTTP or protocol buffers) and client lib are you using?
--Kevin
On Nov 8, 2010, at 6:36 AM, Jan Buchholdt wrote:
> We are evaluating Riak for a project, but are having a hard time making it
> fast enough for our needs.
>
> Our model is very simple and looks like this:
>
> +----------+            * +------------+
> |  Person  |------------->|  Document  |
> +----------+              +------------+
>
> We have a set of persons and each person can have many documents.
>
> Our typical queries are:
>
> - Get an overview of all of a person's documents. This query returns the
>   person along with a subset of data from all of the person's documents.
> - Get a document by id.
>
> Our requirement is that these queries complete in under 100 ms at a load
> of 10 requests per second or less.
>
> The size of the data:
> - A document is approximately 1 KB.
> - There is no data for a person except the person identifier.
> - Around 6 million persons.
> - Each person has from 0 to a couple of thousand documents.
> - All in all we have 120 million documents.
> - Most persons don't have more than 1 to 10 documents, but we also have a
>   few "heavy" persons with 500 to 1000 documents.
>
> Riak setup:
> - 4 nodes
> - Hardware configuration for each node:
>   - HP ProLiant DL360 G7
>   - 18 GB RAM
>   - SAS disks
>   - Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Proc 1
>   - Solaris 10 update 9
>
> We use the default Bitcask storage engine.
> We replicate data to 3 machines when it is written.
> Reads are served from just one machine.
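A rough sizing from the figures above may help frame the disk-seek question. This is only a back-of-envelope sketch, assuming 1 KB means 1,024 bytes, that data spreads evenly over the 4 nodes, and ignoring Bitcask's per-entry overhead and the person objects:

```python
# Back-of-envelope sizing from the figures in the post.
# Assumptions (not from the post): 1 KB = 1024 bytes, perfectly even spread
# over nodes, no Bitcask/key overhead, person objects ignored.

GIB = 1024 ** 3

def per_node_gib(documents: int, doc_bytes: int, n_val: int, nodes: int) -> float:
    """Approximate document data stored on each node, in GiB."""
    total_bytes = documents * doc_bytes * n_val  # every replica is a full copy
    return total_bytes / nodes / GIB

if __name__ == "__main__":
    per_node = per_node_gib(documents=120_000_000, doc_bytes=1024, n_val=3, nodes=4)
    ram_gib = 18
    print(f"~{per_node:.0f} GiB of documents per node vs {ram_gib} GB RAM")
```

That comes to roughly 86 GiB of document data per node against 18 GB of RAM, so only a small fraction can sit in the OS page cache, and a cold document read will plausibly cost at least one disk seek. It is an estimate, not a measurement.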
>
> We tried implementing our data model using Riak links as described below:
>
> Persons are stored in a person bucket using their person identifier as key:
>   /person/{personid}
> Documents are saved in another bucket:
>   /document/{documentid}
> On each person we store links to the person's documents.
>
> We are having problems with the query fetching all the documents for a
> person. Reading all the documents for a person is done using a link walk.
> The link walk starts by reading all the document keys from the person's
> links, using the person id. It then fetches all the documents.
> For persons with 1 to 5 documents the response times are often over 100 ms,
> and for the "heavy" persons with many documents the response times are
> several seconds. But we are very new to Riak and are probably using the
> wrong approach.
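The read path described above can be modelled in memory to count the work involved. This is a sketch only: plain dicts stand in for the `person` and `document` buckets, and `Person`, `walk_documents`, `person_bucket` and `document_bucket` are hypothetical names, not part of any Riak client library:

```python
# In-memory model of the link-based read path. Dicts stand in for the
# "person" and "document" buckets; nothing here is a real Riak client API.
from dataclasses import dataclass, field

@dataclass
class Person:
    person_id: str
    document_links: list[str] = field(default_factory=list)  # keys in "document"

person_bucket: dict[str, Person] = {}
document_bucket: dict[str, bytes] = {}

def walk_documents(person_id: str) -> tuple[list[bytes], int]:
    """Follow the person's links and fetch every linked document.

    Returns the documents and the number of key/value fetches performed.
    """
    person = person_bucket[person_id]      # phase 1: read the person (its links)
    fetches = 1
    docs = []
    for key in person.document_links:      # phase 2: one fetch per linked document
        docs.append(document_bucket[key])
        fetches += 1
    return docs, fetches
```

The point is the count: a walk always costs 1 + (number of documents) object fetches, so a heavy person with 1000 documents means 1001 reads. Riak spreads those reads over the cluster, as noted above, so this measures total work rather than wall-clock time.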
>
> Below are our thoughts (having almost no experience with Riak):
>
> The chosen data model is good for writes. Writing a new document results in
> 3 operations against Riak: writing the document using its id as key,
> reading the person to get all of the person's document links, and appending
> the new document's key to the person's links and writing the person back.
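The three-step write can be sketched the same way. Again this is a hypothetical in-memory model (`store`, `ops`, `put`, `get`, `add_document` are illustrative names). For simplicity the links are kept as a JSON list inside the person value; real Riak links travel as object metadata (the `Link` header over HTTP):

```python
import json
from typing import Optional

# Hypothetical in-memory stand-in for Riak: keys are (bucket, key) pairs.
store: dict[tuple[str, str], bytes] = {}
ops: list[str] = []  # log of the requests a real client would make

def put(bucket: str, key: str, value: bytes) -> None:
    ops.append(f"PUT /{bucket}/{key}")
    store[(bucket, key)] = value

def get(bucket: str, key: str) -> Optional[bytes]:
    ops.append(f"GET /{bucket}/{key}")
    return store.get((bucket, key))

def add_document(person_id: str, doc_id: str, body: bytes) -> None:
    """Link-model write path: three requests per new document."""
    put("document", doc_id, body)                         # 1. store the document
    raw = get("person", person_id)                        # 2. read the person
    links = json.loads(raw) if raw is not None else []    #    (new person: no links)
    links.append(doc_id)                                  # 3. append the new key
    put("person", person_id, json.dumps(links).encode())  #    and write the person back

add_document("p1", "d1", b"first")
add_document("p1", "d2", b"second")
print(ops)
```

Note that steps 2 and 3 are a read-modify-write on the person object and are not atomic: two concurrent appends for the same person can conflict, and how that resolves depends on the bucket's `allow_mult` setting.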
>
> Reading with a link walk is slow because it is expensive to fetch many
> documents, even though the link walk can get their keys right away by
> reading the person's links. Even though we have 4 nodes and link walks are
> parallelized, many documents still need to be retrieved from one node. Having
> to fetch, for example, 100 documents on one node (one disk) is expensive. We
> do not know how the data is stored, but we are afraid Riak is doing a lot of
> disk seeks.
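The disk-seek worry can be turned into a crude lower bound. The 8 ms per read below is an assumed figure, not a measurement, and the model further assumes one seek per document, a perfectly even spread over nodes, no caching, and no network or coordination overhead:

```python
import math

def estimated_walk_ms(num_docs: int, nodes: int = 4, ms_per_read: float = 8.0) -> float:
    """Lower-bound latency if each document read costs one disk seek and each
    node serves its share one read after another from a single disk.

    ms_per_read = 8.0 is a hypothetical seek cost, not a measured value.
    """
    busiest_node = math.ceil(num_docs / nodes)  # best case: perfectly even spread
    return busiest_node * ms_per_read

for n in (5, 100, 1000):
    print(n, estimated_walk_ms(n))
```

Under these assumptions 100 documents already take about 200 ms and 1000 documents about 2 s, which is in line with the "several seconds" seen for heavy persons. For 1 to 5 documents the estimate stays well below 100 ms, so the slow small walks are possibly dominated by something other than seeks (for example the overhead of the walk itself), which would be worth measuring separately.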
>
> We are considering another, more denormalized approach where we write all
> the documents for a person in one "blob". But then we are afraid our writes
> will become slow, because when adding a new document the blob must be read,
> the new document inserted and the blob written back.
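The cost of the blob approach can be sketched as well. The JSON encoding and the helper names below are assumptions for illustration, and the byte count ignores encoding overhead:

```python
import json

def append_to_blob(blob: bytes, new_doc: dict) -> bytes:
    """Read-modify-write of a per-person blob: decode, append, re-encode."""
    docs = json.loads(blob) if blob else []
    docs.append(new_doc)
    return json.dumps(docs).encode()

def bytes_rewritten_per_append(existing_docs: int, doc_bytes: int = 1024,
                               n_val: int = 3) -> int:
    """Approximate bytes written across all replicas for one append."""
    return (existing_docs + 1) * doc_bytes * n_val

blob = b""
blob = append_to_blob(blob, {"id": "d1"})
blob = append_to_blob(blob, {"id": "d2"})
print(len(json.loads(blob)), bytes_rewritten_per_append(999))
```

For a typical person with 1 to 10 documents an append rewrites around 10 KB per replica, which is small; for a heavy person with 1000 documents it is roughly 1 MB per replica, or about 3 MB across the three replicas. Whether that is acceptable depends on the write rate, which is not given here.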
>
> We could really use some input. Are our assumptions wrong? (We have not yet
> dug into the problems.) Is there a good data model for our requirements?
> We haven't looked at Riak Search at all. Maybe it could solve some of our
> problems.
>
>
>
> --
> Jan Buchholdt
> Software Pilot
> Trifork A/S
> Cell +45 50761121
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com