Re: Slow performance using linkwalk, help wanted

Kevin Smith Mon, 08 Nov 2010 05:20:32 -0800

Jan - 

Which protocol (HTTP or protocol buffers) and client lib are you using?


--Kevin
On Nov 8, 2010, at 6:36 AM, Jan Buchholdt wrote:

> We are evaluating Riak for a project, but are having a hard time making it
> fast enough for our needs.
> 
> Our model is very simple and looks like this:
> 
> ---------------------                         * ---------------------
> |       Person      | ------------------------> |   Document        |
> ---------------------                           ---------------------
> 
> We have a set of persons and each person can have many documents. 
> 
> Our typical queries are:
> 
> - Get an overview of all of a person's documents. This query returns the
>   person along with a subset of the data from each of the person's documents.
> - Get a document by id.
> 
> Our requirement is that these queries complete in under 100 ms at a load of
> 10 requests per second or less.
> 
> The size of the data:
> - A document is approximately 1 KB.
> - There is no data for a person except the person identifier.
> - Around 6 million persons.
> - Each person has from 0 to a couple of thousand documents.
> - All in all we have 120 million documents.
> - Most persons have only 1 to 10 documents, but a few "heavy" persons have
>   500 to 1000 documents.
> 
> Riak setup:
> - 4 nodes.
> - Hardware configuration for each node:
>   - HP ProLiant DL360 G7
>   - 18 GB RAM
>   - SAS disks
>   - Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Proc 1
>   - Solaris 10 update 9
> - We use the default Bitcask storage engine.
> - Data is replicated to 3 machines when it is written.
> - Reads are served from just one machine.
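A rough sizing check using only the figures above shows how much of each node's data could stay in memory. This is a back-of-envelope sketch, assuming "1 KB" means 1,000 bytes, ignoring per-key and file overhead, and assuming data is spread evenly over the 4 nodes; the variable names are illustrative only.

```python
# Back-of-envelope sizing from the figures in the message (decimal units).
# Assumptions: ~1 KB per document = 1,000 bytes; per-key overhead ignored;
# replicas spread evenly over the nodes.
DOCS = 120_000_000        # total documents
DOC_BYTES = 1_000         # ~1 KB per document
REPLICAS = 3              # copies written per document
NODES = 4
RAM_PER_NODE_GB = 18

raw_gb = DOCS * DOC_BYTES / 1e9            # logical data size
replicated_gb = raw_gb * REPLICAS          # total stored across the cluster
per_node_gb = replicated_gb / NODES        # data each node holds on disk
ram_fraction = RAM_PER_NODE_GB / per_node_gb

print(f"raw data:         {raw_gb:.0f} GB")
print(f"with 3 replicas:  {replicated_gb:.0f} GB")
print(f"per node:         {per_node_gb:.0f} GB")
print(f"RAM covers at most {ram_fraction:.0%} of a node's data")
```

Under these assumptions each node holds roughly 90 GB against 18 GB of RAM, so most document reads cannot be served from the OS page cache.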
> 
> We tried implementing our data model using Riak links as described below:
> 
> - Persons are stored in a person bucket using their person identifier as
>   key: /person/{personid}
> - Documents are saved in another bucket: /document/{documentid}
> - On each person we store links to that person's documents.
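In Riak's HTTP interface of that period, links travel in a `Link` header on the stored object, one entry per target in the form `</riak/bucket/key>; riaktag="tag"`. A minimal sketch of the header a person object would carry, assuming the `/riak` URL prefix; the helper `person_link_header` and the tag name `document` are hypothetical, and nothing is sent to a server.

```python
def person_link_header(doc_ids, tag="document"):
    """Build a Link header value pointing a /person/{personid} object at its
    documents in the /document bucket. Builds a string only; sends nothing.
    The tag name "document" is a hypothetical choice."""
    links = [f'</riak/document/{doc_id}>; riaktag="{tag}"' for doc_id in doc_ids]
    return ", ".join(links)

header = person_link_header(["d1", "d2"])
print("Link:", header)
```

For a heavy person this header holds one entry per document, so the person object itself carries hundreds or thousands of links.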
> 
> We are having problems with the query that fetches all the documents for a
> person. Reading all the documents for a person is done using a link walk.
> The link walk starts by reading the document keys from the person's links,
> then fetches all the documents.
> For persons with 1 to 5 documents the response times are often over 100 ms,
> and for the "heavy" persons with many documents the response times are
> several seconds. But we are very new to Riak and are probably using the
> wrong approach.
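For reference, a link walk over HTTP is a single GET whose path appends one `bucket,tag,keep` step per hop to the starting object's URL, with `_` as a wildcard. A small sketch assuming the `/riak` prefix of Riak's HTTP interface at the time; `link_walk_path` and the example key `p42` are hypothetical.

```python
def link_walk_path(person_id, bucket="document", tag="_", keep=1):
    """Path for a one-step link walk from a person to its documents.
    Each step is "bucket,tag,keep"; "_" matches any bucket or tag, and
    keep=1 asks for the objects reached at this step to be returned."""
    return f"/riak/person/{person_id}/{bucket},{tag},{keep}"

print(link_walk_path("p42"))
```

Although this is one request from the client's point of view, the cluster still fetches every linked document individually, so the cost grows with the number of documents per person.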
> 
> Below are our thoughts (having almost no experience with Riak):
> 
> The chosen data model is good for writes. Writing a new document results in
> 3 operations against Riak: writing the document using its id as key, reading
> the person to get all of the person's document links, and writing back the
> person with the new document's key appended to its links.
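The three-operation write path described above can be sketched as a read-modify-write against an in-memory stand-in store. `CountingStore` and `add_document` are hypothetical and make no Riak calls; the store only counts gets and puts so that the number of operations per new document is visible.

```python
import copy

class CountingStore:
    """Hypothetical in-memory stand-in for a key/value store.
    Counts every get and put; it does not talk to Riak."""

    def __init__(self):
        self.data = {}
        self.ops = 0

    def get(self, bucket, key):
        self.ops += 1
        # Return a copy, as a fetch over the network would.
        return copy.deepcopy(self.data.get((bucket, key)))

    def put(self, bucket, key, value):
        self.ops += 1
        self.data[(bucket, key)] = copy.deepcopy(value)


def add_document(store, person_id, doc_id, doc):
    store.put("document", doc_id, doc)                        # 1. write the document
    person = store.get("person", person_id) or {"links": []}  # 2. read the person
    person["links"].append(["document", doc_id])              # append the new link...
    store.put("person", person_id, person)                    # 3. ...and write the person back
    return person


store = CountingStore()
add_document(store, "p1", "d1", {"title": "first"})
print(store.ops)
```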
> 
> Reading using a link walk is slow because it is expensive to fetch many
> documents, even though the link walk can read their keys right away from the
> person's links. Even though we have 4 nodes and link walks are parallelized,
> many documents still need to be retrieved from a single node. Fetching, for
> example, 100 documents from one node (one disk) is expensive. We do not know
> how the data is stored, but we are afraid Riak is doing a lot of disk seeks.
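The disk-seek concern can be checked with rough arithmetic. Bitcask keeps its key directory in memory, so a read that is not already in the OS page cache costs roughly one disk seek. A minimal estimate, assuming about 6 ms per random read on a SAS disk (an assumed figure, not a measurement), documents spread evenly over the 4 nodes, and reads on one disk served one after another; `estimated_fetch_ms` is a hypothetical name.

```python
import math

def estimated_fetch_ms(num_docs, nodes=4, seek_ms=6.0):
    """Disk time on the busiest node when each document costs one random read.
    Assumes reads spread evenly over the nodes and one read at a time per disk.
    seek_ms is an assumed per-read cost, not a measured value."""
    per_node = math.ceil(num_docs / nodes)   # documents the busiest node must read
    return per_node * seek_ms

for n in (5, 100, 1000):
    print(f"{n:>5} documents -> ~{estimated_fetch_ms(n):.0f} ms of disk time")
```

Under these assumptions, seeks alone could account for much of the multi-second walks on heavy persons, but not for persons with 1 to 5 documents taking over 100 ms, so per-request overhead is worth measuring separately.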
> 
> We are considering another, more denormalized approach where we write all
> the documents for a person in one "blob". But then we are afraid our writes
> will become slow, because when adding a new document the blob must be read,
> the new document inserted and the blob written back.
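A minimal sketch of the denormalized write path, assuming the blob is a JSON array holding every document for one person; `append_to_blob` and the document fields `id` and `body` are hypothetical.

```python
import json

def append_to_blob(blob, doc):
    """Read-modify-write of a hypothetical per-person blob: a JSON array of
    all the person's documents. Returns the new blob to write back."""
    docs = json.loads(blob) if blob else []   # read and decode the current blob
    docs.append(doc)                          # insert the new document
    return json.dumps(docs).encode("utf-8")   # encode the blob to be written back

# Simulate one person accumulating 100 documents of ~1 KB each.
blob = b""
for i in range(100):
    blob = append_to_blob(blob, {"id": f"d{i}", "body": "x" * 1000})
print(len(json.loads(blob)), "documents,", len(blob), "bytes rewritten on the last append")
```

The work per append grows with the blob: with ~1 KB documents, a person with 10 documents has a blob of roughly 10 KB, while a heavy person with 1000 documents has one of roughly 1 MB that is read and rewritten for every new document. In exchange, the overview query becomes a single fetch.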
> 
> We could really use some input. Are our assumptions wrong? (We have not yet
> dug into the problems.) Is there a good data model for our requirements?
> We haven't looked at Riak Search at all. Maybe it could solve some of our
> problems.
> 
> 
> 
> -- 
> --
> Jan Buchholdt
> Software Pilot
> Trifork A/S
> Cell +45 50761121
> 
> 
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


