So the issue we’re having is only with bucket listing.

alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion
                       DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/

real 2m0.747s
user 0m0.076s
sys 0m0.030s

whereas…

alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
                       DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/

real 0m10.262s
user 0m0.075s
sys 0m0.028s

This bucket contains a lot of very small files (basically, for each PDF we 
receive I split it into one .JPG per page and store them here). Based on my 
latest counts it looks like we have around 170,000 .JPG files in that bucket.
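
For reference, the paginated listing s3cmd does under the hood looks roughly 
like the sketch below (boto3 pointed at a Riak CS endpoint; the endpoint URL 
and credentials are placeholders, and I'd assume each page request can still 
trigger an expensive fold on the Riak side, so pagination alone may not help):

import boto3

# Placeholders -- not our real endpoint or credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://riak-cs.example.internal:8080",
    aws_access_key_id="ADMIN_KEY",
    aws_secret_access_key="ADMIN_SECRET",
)

bucket = "bonfirehub-resources-can-east-doc-conversion"
marker = ""
total = 0
while True:
    # The S3 API caps each response at 1000 keys; IsTruncated plus a
    # marker drive the next page.
    resp = s3.list_objects(Bucket=bucket, Marker=marker, MaxKeys=1000)
    contents = resp.get("Contents", [])
    total += len(contents)
    if not resp.get("IsTruncated") or not contents:
        break
    # With no Delimiter, NextMarker is absent, so the last key seen
    # becomes the marker for the next request.
    marker = contents[-1]["Key"]
print("listed %d keys" % total)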

Now I’ve had a hunch that this is just a fundamentally expensive operation 
whose response time exceeds the 5000ms threshold set in our HAProxy config 
(which I raised during the video to illustrate what’s going on). After reading 
http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why
 and http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html I’m 
feeling like this is just a fundamental issue with how Riak structures its data. 

Based on this, I’m thinking the cost of this type of query is only going to 
get worse over time as we add more keys to this bucket (unless secondary 
indexes can be added). Or am I totally out to lunch here, and there’s some 
other underlying problem?
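
In case it clarifies what I mean by working around the listing: the standard 
advice for avoiding list-keys in Riak is to maintain your own index, so that 
reads never have to list the bucket at all. A rough sketch of how that could 
look for us -- the index.json convention and helper names here are invented 
for illustration, not something we run today:

import json

def store_pages(s3, bucket, doc_prefix, page_bodies):
    # Upload one .JPG per page, remembering each key as we go.
    page_keys = []
    for i, body in enumerate(page_bodies, start=1):
        key = "%s/page-%05d.jpg" % (doc_prefix, i)
        s3.put_object(Bucket=bucket, Key=key, Body=body)
        page_keys.append(key)
    # A small index object next to the pages: reading it back is a single
    # GET whose cost doesn't grow with the total key count in the bucket.
    s3.put_object(
        Bucket=bucket,
        Key=doc_prefix + "/index.json",
        Body=json.dumps(page_keys).encode("utf-8"),
    )
    return page_keys

def load_page_keys(s3, bucket, doc_prefix):
    resp = s3.get_object(Bucket=bucket, Key=doc_prefix + "/index.json")
    return json.loads(resp["Body"].read())

That way the only full listings left would be ad-hoc admin ones, which can 
tolerate being slow.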

                 Alex Millar, CTO  
Office: 1-800-354-8010 ext. 704  
Mobile: 519-729-2539  
GoBonfire.com