Alex,

The fold_objects_for_list_keys setting is the one I was most interested to see, and I highly recommend setting it to true for your cluster. Setting it to true makes bucket listing operations generally more efficient, and there should be no detrimental effects. There are also optimizations for bucket listing queries that use the prefix request parameter, so I would expect queries that list specific subdirectories in a bucket to show improved performance as well. Changing the app.config and restarting the CS node is the correct way to have it take effect.
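For reference, the entry would look something like this; this is an illustrative fragment only (keep all your existing riak_cs entries and just add or flip this one):

```erlang
%% app.config -- riak_cs section (fragment, not a complete file)
{riak_cs, [
    %% use the more efficient fold-based object listing
    {fold_objects_for_list_keys, true}
]}.
```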


As for GC performance, I would recommend adding an entry to your app.config file to set gc_paginated_indexes to true. This option causes the GC process to use a more efficient method for determining which data is eligible for collection, and generally results in far fewer timeouts and better success for users.
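Again as an illustrative fragment only (your other riak_cs entries stay as they are):

```erlang
%% app.config -- riak_cs section (fragment, not a complete file)
{riak_cs, [
    %% paginate GC index queries instead of one large fold
    {gc_paginated_indexes, true}
]}.
```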


Kelly

On 08/19/2014 07:32 AM, Alex Millar wrote:
Hey Kota,

We’re currently using the following versions:

# Download RiakCS
# Version: 1.4.5
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.4/1.4.5/ubuntu/precise/riak-cs_1.4.5-1_amd64.deb

# Download Riak
# Version: 1.4.8
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O http://s3.amazonaws.com/downloads.basho.com/riak/1.4/1.4.8/ubuntu/precise/riak_1.4.8-1_amd64.deb

I checked our RiakCS app.config and fold_objects_for_list_keys is set to false. What impact would it have on my cluster if I flip that to true? Would I simply update the app.config and restart RiakCS?

As for the consideration on garbage collection, the slow performance has been happening consistently over the span of a week (since we noticed it, as we don’t often list buckets). I suspect it’s not a case of large amounts of objects being deleted, as generally all data going into that bucket is write-once (we process PDF pages to .JPG and PUT them in that bucket; the only time overwrites occur is if we manually re-trigger the processing script on a specific document).

*Adding ke...@basho.com* as we have another thread going on about this same topic, I figured we could merge the discussion to reduce duplicate effort here.

*Alex Millar*, CTO
Office: 1-800-354-8010 ext. 704
Mobile: 519-729-2539
*GoBonfire*.com <http://GoBonfire.com>


From: Kota Uenishi <k...@basho.com>
Reply: Kota Uenishi <k...@basho.com>
Date: August 18, 2014 at 10:03:40 PM
To: Alex Millar <a...@gobonfire.com>
Cc: Charlie Voiselle <cvoise...@basho.com>, Tad Bickford <tbickf...@basho.com>, Riak-Users <riak-users@lists.basho.com>, Brandon Noad <bran...@gobonfire.com>
Subject: Re: Fwd: RiakCS 504 Timeout on s3cmd for certain keys

Alex,

Riak CS 1.4.5 and 1.5.0 include a lot of improvements made after those articles you linked; it is no longer using Riak's bucket listing, but instead Riak's internal API for more efficient listing. What version of Riak CS are you using? I want you to make sure you're using one of those versions and have the line `{fold_objects_for_list_keys, true},` in the riak_cs section of app.config (assuming all other Riak parts are correctly configured).

>Based on this I’m thinking that cost of this type of query is only going to get worse over time as we add more keys to this bucket (unless secondary indexes can be added). Or am I totally out to lunch here and there’s some other underlying problem?

The strange part is s3cmd. Riak CS has an incremental bucket listing API that requires clients to iterate in pages of 1000 objects (or common prefixes), but s3cmd iterates over the entire specified bucket before printing anything. You can observe how s3cmd and Riak CS interact if you specify the '-d' option like this:

```
s3cmd -d -c yours.s3cfg ls -r s3://yourbucket/yourdir/
```

I would not expect Riak CS's listing API to be so slow as to need 5 seconds (or, say, >10 seconds) per request, because each request returns only 1000 objects.
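To illustrate why each round trip should stay cheap regardless of bucket size, here is a small local sketch (plain Python, not Riak CS or s3cmd code) of the marker-based pagination the S3 listing API uses; the key names and counts are illustrative:

```python
# Local sketch of S3-style marker pagination: each "request" returns
# at most max_keys keys sorting after `marker`, plus truncation info.

def list_page(all_keys, marker="", max_keys=1000):
    """Simulate one ListObjects round trip."""
    candidates = sorted(k for k in all_keys if k > marker)
    page = candidates[:max_keys]
    truncated = len(candidates) > max_keys
    next_marker = page[-1] if truncated else None
    return page, truncated, next_marker

def list_all(all_keys, max_keys=1000):
    """What s3cmd effectively does: fetch every page before printing."""
    marker, results, round_trips = "", [], 0
    while True:
        page, truncated, next_marker = list_page(all_keys, marker, max_keys)
        results.extend(page)
        round_trips += 1
        if not truncated:
            return results, round_trips
        marker = next_marker

keys = ["page-%06d.jpg" % i for i in range(2500)]
listing, trips = list_all(keys, max_keys=1000)
assert listing == sorted(keys) and trips == 3  # 2500 keys -> 3 round trips
```

So a full recursive listing of a large bucket is many sequential round trips, but no single request should take long on its own.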

There is another possible cause of the slow query: if you had many (say, more than 10 thousand) deleted objects in the same bucket, it could affect each 1000-object listing. This will eventually resolve itself as Riak CS's garbage collection removes the deleted manifests, which are currently just marked as deleted (and correctly ignored).

[1] http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why

On Thu, Aug 14, 2014 at 6:05 AM, Alex Millar <a...@gobonfire.com> wrote:

    Good afternoon Charlie,

    So the issue we’re having is only with bucket listing.

    alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion
           DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/

    real 2m0.747s
    user 0m0.076s
    sys 0m0.030s

    whereas…

    alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
           DIR   s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/

    real 0m10.262s
    user 0m0.075s
    sys 0m0.028s

    This bucket contains a lot of very small files (basically, for
    each PDF we receive, I split it into a .JPG for each page and
    store them here). Based on my latest counts, it looks like we
    have around *170,000* .JPG files in that bucket.

    Here’s a snippet from the HAProxy log for the 504 timeouts…

    Aug 12 16:01:34 localhost.localdomain haproxy[4718]:
    192.0.223.236:48457 [12/Aug/2014:16:01:24.454] riak_cs~
    riak_cs_backend/riak3 161/0/0/-1/10162 504 194 - - sH-- 0/0/0/0/0 0/0
    {bonfirehub-resources-can-east-doc-conversion.bf-riakcs.com}
    "GET /?delimiter=/ HTTP/1.1"

    I’ve put together a video showing the top results on each of the
    5 Riak nodes while performing $ time s3cmd -c .s3cfg-riakcs-admin
    ls s3://bonfirehub-resources-can-east-doc-conversion

    https://dl.dropboxusercontent.com/u/5723659/RiakCS%20ls%20monitoring%20results.mov

    Now I’ve had a hunch this is just a fundamentally expensive
    operation which exceeds the 5000ms response time threshold set in
    our HAProxy config (which I raised during the video to illustrate
    what’s going on). After reading
    http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why
    and
    http://www.paperplanes.de/2011/12/13/list-all-of-the-riak-keys.html
    I’m feeling like this is just a fundamental issue with the data
    structure in Riak.
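    The kind of change involved is a sketch like the following; the
    directive name is real HAProxy config, but the value here is
    illustrative, not what our config actually uses:

    ```
    # haproxy.cfg -- illustrative fragment only
    defaults
        timeout server 60s   # raised from 5s so long bucket listings don't 504
    ```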

    Based on this, I’m thinking the cost of this type of query is
    only going to get worse over time as we add more keys to this
    bucket (unless secondary indexes can be added). Or am I totally
    out to lunch here, and there’s some other underlying problem?

    I’ve cc’d the mailing list on this as suggested.



    From: Charlie Voiselle <cvoise...@basho.com>
    Reply: Charlie Voiselle <cvoise...@basho.com>
    Date: August 13, 2014 at 10:36:51 AM
    To: Alex Millar <a...@gobonfire.com>
    Cc: Tad Bickford <tbickf...@basho.com>
    Subject: Fwd: RiakCS 504 Timeout on s3cmd for certain keys



    _______________________________________________
    riak-users mailing list
    riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




--
Kota UENISHI / @kuenishi
Basho Japan KK
