Alex,
Could you share your Riak CS app.config file with me? I'd like to look
over what you have for a few settings that could affect the bucket
listing performance.
Kelly
On 08/18/2014 11:22 AM, Alex Millar wrote:
Hey Kelly,
Thanks for reaching out! We’re using the following versions for RiakCS
& Riak
# Download RiakCS
# Version: 1.4.5
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.4/1.4.5/ubuntu/precise/riak-cs_1.4.5-1_amd64.deb
# Download Riak
# Version: 1.4.8
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak/1.4/1.4.8/ubuntu/precise/riak_1.4.8-1_amd64.deb
The intent behind wanting performant ls operations was so that we could
connect via Transmit <http://panic.com/transmit/> to view and navigate
the contents of the bucket, similar to how you can access the contents
of your S3 buckets in their web UI. That being said, our keys are akin
to a folder structure, for example...
/organizations/OrganizationID-[OrganizationID]/documents/proposals/ProposalID-[ProposalID]/DocumentSlotID-[DocumentSlotID]
S3 must be doing some sort of secondary indexing to allow for fast
lookups here, because the bucket in question that has the performance
issues only has 2 “folders” under
s3://bonfirehub-resources-can-east-doc-conversion, yet it takes the
longest for s3cmd ls to complete, since Riak is clearly traversing all
the keys to fulfill this request.
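(My rough understanding of how S3 answers these "folder" listings quickly:
keys are kept in sorted order, and a prefix+delimiter query just rolls keys
up to the first delimiter after the prefix. A purely illustrative Python
sketch of that grouping — names and data are made up, not from our bucket:)

```python
def list_common_prefixes(keys, prefix="", delimiter="/"):
    """Group flat object keys into S3-style "folders" (CommonPrefixes).

    Mimics how a GET Bucket request with ?prefix=...&delimiter=/
    rolls keys up to the first delimiter after the prefix.
    """
    prefixes = set()
    contents = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Roll the key up into a common prefix ("folder").
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(prefixes), contents

# Example: a flat keyspace that presents as two top-level "folders".
keys = [
    "organizations/OrganizationID-1/documents/a.jpg",
    "organizations/OrganizationID-2/documents/b.jpg",
    "logs/2014-08-18.txt",
]
print(list_common_prefixes(keys))  # (['logs/', 'organizations/'], [])
```

On a sorted keyspace this kind of listing can skip straight to the prefix
and stop at the first key past it, instead of scanning everything.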
Short story: this is not a requirement for us in order to use RiakCS.
However, going forward it would be desirable if RiakCS could maintain
this form of secondary index (and potentially offer a web UI) to better
match some use cases that exist for clients who are used to using S3.
*Alex Millar*, CTO
Office: 1-800-354-8010 ext. 704
Mobile: 519-729-2539
*GoBonfire*.com <http://GoBonfire.com>
From: Kelly McLaughlin <ke...@basho.com>
Reply: Kelly McLaughlin <ke...@basho.com>
Date: August 15, 2014 at 7:03:47 PM
To: Alex Millar <a...@gobonfire.com>, riak-users@lists.basho.com
Subject: Re: Slow s3cmd ls queries + HAProxy 504 timeouts
Hello Alex. Would you mind sharing what version of Riak and Riak CS you
are using? Also, if you can post the contents of your Riak CS
app.config file, it might help give a better idea of what might be
going on.
Generally, listing the contents of a bucket is more expensive than a
normal download or upload request, but there have been performance
improvements in recent versions of Riak CS, and there are settings that
can be adjusted depending on the version you are using. The time
required to list the contents of an entire bucket is definitely related
to the number of objects in that bucket, so the time will continue to
increase as the number of objects grows, but we do continue to work to
make the process as efficient as possible.
Depending on why you need to list the contents of the bucket, the
max-keys query parameter available on the bucket listing operation may
be useful. By default the limit is 1000 keys, but as far as I'm aware
s3cmd does not expose this parameter and instead buffers all the
results until the end of the contents is reached. If you need to list
the contents for the purpose of some processing step, though, it may
work better for you to break up this process into smaller chunks using
max-keys.
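To sketch what that chunked listing looks like (marker-based pagination,
the way an S3 GET Bucket response reports Contents and IsTruncated;
fetch_page here is a hypothetical stand-in for a real listing request,
not part of any particular client library):

```python
def iter_keys_chunked(fetch_page, max_keys=1000):
    """Iterate over all keys in a bucket using max-keys + marker pagination.

    fetch_page(marker, max_keys) must return (keys, is_truncated),
    mirroring the Contents and IsTruncated fields of a GET Bucket response.
    """
    marker = ""
    while True:
        keys, truncated = fetch_page(marker, max_keys)
        for key in keys:
            yield key
        if not truncated or not keys:
            break
        marker = keys[-1]  # resume after the last key we saw

# Fake backend standing in for the server side: 7 keys, pages of 3.
all_keys = ["key-%02d" % i for i in range(7)]

def fake_fetch(marker, max_keys):
    start = all_keys.index(marker) + 1 if marker else 0
    page = all_keys[start:start + max_keys]
    return page, start + max_keys < len(all_keys)

print(list(iter_keys_chunked(fake_fetch, max_keys=3)))
# ['key-00', 'key-01', 'key-02', 'key-03', 'key-04', 'key-05', 'key-06']
```

Each request then stays small and fast enough to finish inside a proxy
timeout, even when the full listing would not.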
Kelly
On 08/15/2014 06:39 AM, Alex Millar wrote:
So the issue we’re having is only with bucket listing.
alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls
s3://bonfirehub-resources-can-east-doc-conversion
DIR
s3://bonfirehub-resources-can-east-doc-conversion/organizations/
real 2m0.747s
user 0m0.076s
sys 0m0.030s
whereas…
alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls
s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
DIR
s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/
real 0m10.262s
user 0m0.075s
sys 0m0.028s
The contents of this bucket include a lot of very small files
(basically, for each PDF we receive I split it into one .JPG per page
and store them here). Based on my latest counts it looks like we have
around *170,000* .JPG files in that bucket.
Now I’ve had a hunch this is just a fundamentally expensive operation
which exceeds the 5000 ms response-time threshold set in our HAProxy
config (which I raised during the video to illustrate what’s going on).
After reading
http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why and
http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html I’m
feeling like this is just a fundamental issue with the data
structure in Riak.
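(For reference, the HAProxy setting involved is the server timeout; a
minimal excerpt with illustrative values, not copied from our actual
config:)

```
# haproxy.cfg (excerpt, illustrative values)
defaults
    timeout server 5000ms  # HAProxy returns a 504 when the backend takes longer
```

Raising this only hides the cost of the full-bucket listing, of course;
it doesn't make the listing itself any cheaper.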
Based on this, I’m thinking the cost of this type of query is only
going to get worse over time as we add more keys to this bucket
(unless secondary indexes can be added). Or am I totally out to lunch
here and there’s some other underlying problem?
*Alex Millar*, CTO
Office: 1-800-354-8010 ext. 704
Mobile: 519-729-2539
*GoBonfire*.com <http://GoBonfire.com>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com