Try distributing your reads in a round robin against all your nodes.
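A minimal sketch of what that could look like on the client side, assuming reads go over Riak's HTTP interface. The node hostnames are made up for illustration, and `round_robin_reader` is a hypothetical helper, not part of any Riak client library:

```python
from itertools import cycle

# Hypothetical addresses for the four nodes; 8098 is Riak's default HTTP port.
NODES = [
    "http://riak1.example.com:8098",
    "http://riak2.example.com:8098",
    "http://riak3.example.com:8098",
    "http://riak4.example.com:8098",
]


def round_robin_reader(nodes):
    """Return a function that builds read URLs, rotating through the nodes."""
    node_iter = cycle(nodes)

    def url_for(bucket, key):
        # Each call targets the next node in the rotation.
        return f"{next(node_iter)}/riak/{bucket}/{key}"

    return url_for
```

Each call to the returned function targets the next node in turn, so no single node receives all the reads.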
Best, Alexander
@siculars on twitter
http://siculars.posterous.com
On Nov 8, 2010, at 6:36, Jan Buchholdt <[email protected]> wrote:
We are evaluating Riak for a project, but are having a hard time making
it fast enough for our needs.
Our model is very simple and looks like this:
----------          *          ------------
| Person | ------------------> | Document |
----------                     ------------
We have a set of persons and each person can have many documents.
Our typical queries are:
- Get an overview of all of a person's documents. This query returns the
  person along with a subset of data from all of the person's documents.
- Get a document by id.
Our requirement is that these queries complete in under 100 ms when the
load is 10 requests per second or less.
The size of the data:
A document is approximately 1 KB.
No data is stored for a person except the person identifier.
Around 6 million persons.
Each person has from 0 to a couple of thousand documents.
All in all we have 120 million documents.
Most persons don't have more than 1 to 10 documents, but we have
a few "heavy" persons with 500 to 1000 documents.
Riak setup:
4 Nodes.
Hardware configuration for each node:
HP ProLiant DL360 G7
18 GB RAM
SAS disks
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Proc 1
Solaris 10 update 9
We use the default Bitcask storage engine.
We replicate data to 3 machines when it is written.
Each read is served from just one machine.
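A rough back-of-envelope calculation from the figures above may help frame the problem. It assumes data is spread evenly across the nodes, ignores per-object and key overhead, and uses decimal units. All three are simplifying assumptions, not measurements:

```python
# Figures taken from the description above.
DOCUMENTS = 120_000_000   # total number of documents
DOC_SIZE_KB = 1           # approximate size of one document
REPLICAS = 3              # each write is replicated to 3 machines
NODES = 4
RAM_PER_NODE_GB = 18

# Raw document data, converting KB to GB (decimal units).
raw_gb = DOCUMENTS * DOC_SIZE_KB / 1_000_000

# Total stored data once every document is written three times.
replicated_gb = raw_gb * REPLICAS

# Share held by each node, assuming an even spread (an assumption).
per_node_gb = replicated_gb / NODES

# How many times larger the per-node data is than the node's RAM.
data_to_ram_ratio = per_node_gb / RAM_PER_NODE_GB
```

On these assumptions each node would hold about 90 GB of document data against 18 GB of RAM, so a large share of reads would probably have to go to disk.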
We tried implementing our data model using Riak links as described
below:
Persons are stored in a person bucket using their person identifier
as key: /person/{personid}
Documents are saved in another bucket: /document/{documentid}
On each person we store links to that person's documents.
We are having problems with the query that fetches all the documents
for a person. Reading all the documents for a person is done using a
link walk. The link walk starts by reading all the document keys from
the person's links, using the personid. It then fetches all the documents.
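For reference, a sketch of the request such a link walk corresponds to, assuming Riak's HTTP interface and its link-walking URL form `/riak/{bucket}/{key}/{bucket},{tag},{keep}`. The bucket names follow the description above; the base URL and the helper name `link_walk_url` are only illustrative:

```python
def link_walk_url(base_url, person_id):
    """Build a link-walk URL from one person to all linked documents.

    The final path segment is bucket,tag,keep:
      "document" restricts the walk to links into the document bucket,
      "_"        matches any link tag,
      "1"        asks for the objects found at this step to be returned.
    """
    return f"{base_url}/riak/person/{person_id}/document,_,1"
```

For example, `link_walk_url("http://localhost:8098", "42")` would give `http://localhost:8098/riak/person/42/document,_,1`.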
For persons with 1 to 5 documents the response times are often over
100 ms. And for the "heavy" persons with many documents, response
times are several seconds. But we are very new to Riak and are
probably using the wrong approach.
Below are our thoughts (having almost no experience with Riak):
The chosen data model is good for writes. Writing a new document
results in 3 operations against Riak: writing the document using its
id as key, reading the person to get all of the person's document
links, and appending the new document's key to the person's links and
writing the person back.
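The three operations can be sketched as follows. Plain dictionaries stand in for the two Riak buckets, so this only illustrates the sequence of steps and is not real client code. The names `person_bucket`, `document_bucket` and `add_document` are made up for the example:

```python
# In-memory stand-ins for the two buckets (assumption: not real Riak calls).
person_bucket = {}    # personid -> list of linked document ids
document_bucket = {}  # documentid -> document body


def add_document(person_id, document_id, body):
    """Store a new document and link it from its person."""
    # 1. Write the document using its id as key.
    document_bucket[document_id] = body

    # 2. Read the person to get the current document links.
    links = list(person_bucket.get(person_id, []))

    # 3. Append the new key and write the person back.
    links.append(document_id)
    person_bucket[person_id] = links
```

Note that steps 2 and 3 form a read-modify-write on the person, so two concurrent writers for the same person could overwrite each other's link unless conflicts are handled.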
Reading with a link walk is slow because it is expensive to fetch
many documents, even though the link walk can get their keys right
away by reading the links on the person. Even though we have 4
nodes and link walks are parallelized, many documents need to be
retrieved from the same node. Having to fetch, for example, 100
documents from one node (one disk) is expensive. We do not know how
data is stored, but we are afraid Riak is doing a lot of disk seeks.
We are considering another, more denormalized approach where we write
all the documents for a person in one "blob". But then we are afraid
our writes will become slow, because when adding a new document the
blob must be read, the new document inserted, and the blob written back.
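A small sketch of the blob approach, again with a dictionary standing in for a Riak bucket. The JSON encoding and the names `blob_bucket`, `add_document` and `overview` are assumptions made for illustration:

```python
import json

# In-memory stand-in for a single bucket (assumption: not real Riak calls).
blob_bucket = {}  # personid -> JSON-encoded list of that person's documents


def add_document(person_id, document):
    """Read the blob, append the new document, and write the blob back."""
    documents = json.loads(blob_bucket.get(person_id, "[]"))  # read
    documents.append(document)                                # modify
    blob_bucket[person_id] = json.dumps(documents)            # write back


def overview(person_id, fields):
    """Return the chosen fields of every document using a single read."""
    documents = json.loads(blob_bucket.get(person_id, "[]"))
    return [{f: d[f] for f in fields if f in d} for d in documents]
```

The overview becomes a single read, but every added document rewrites the whole blob. For a "heavy" person with 1000 documents of about 1 KB each, that is roughly 1 MB read and written per added document.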
We could really use some input. Are our assumptions wrong? (We have
not yet dug into the problems.) Is there a good data model for our
requirements?
We haven't looked at Riak search at all. Maybe it could solve some
of our problems.
--
Jan Buchholdt
Software Pilot
Trifork A/S
Cell +45 50761121
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com