Toby,

Although I haven't traced a root cause yet, I've had several reports in IRC of odd behavior when using nginx as a load balancer for Riak CS. In almost every case, removing nginx from the equation, either by switching to haproxy or by pointing the application directly at the nodes, has resolved the issues. This is anecdotal until I can determine a root cause, but it's worth testing if you're experiencing trouble.

-- 
Seth Thomas

On September 18, 2013 at 8:48:27 PM, Toby Corkindale (toby.corkind...@strategicdata.com.au) wrote:

On 19/09/13 11:17, Luke Bakken wrote:
> Hi Toby,
>
> Invalid hint files won't cause Riak to fail requests - there must have
> been something else happening. Hint files are used by Riak to speed up
> start time when loading a large key set.
>
> You mentioned a "load-balancer pool" - are you using something like
> HAProxy to load-balance requests to your Riak CS cluster?
>
> The "error: disconnected" message is a good clue. If you can provide
> log files, that may point to the cause.

Hi Luke,
I'm still seeing quite a few failed requests. I've been chasing the
hintfiles but I guess that was a red herring.

We're using nginx to load balance requests to Riak CS.
I tried going directly to each node in turn, and it didn't show that any
one node was reliably failing every request.

Hitting one server just now came up with OK/403/403/OK/OK.
Trying another was OK/OK/OK/OK/403 though.
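
To repeat this per-node check more systematically, something like the
following rough Python sketch could tally the status codes each node
returns. It is only an illustration: tally_statuses, failure_rate and the
fetch callable are hypothetical names, and fetch is assumed to wrap
whatever signed S3 request you already use (for example an s3cmd call
pointed at a single node) and return the HTTP status code.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def tally_statuses(
    nodes: Iterable[str],
    fetch: Callable[[str], int],
    attempts: int = 5,
) -> Dict[str, Counter]:
    """Call fetch(node) `attempts` times per node and count the status codes.

    `fetch` is a placeholder (an assumption, not part of Riak CS): it should
    issue one authenticated request to the given node and return the status.
    """
    results: Dict[str, Counter] = {}
    for node in nodes:
        counts: Counter = Counter()
        for _ in range(attempts):
            counts[fetch(node)] += 1
        results[node] = counts
    return results


def failure_rate(counts: Counter, ok_status: int = 200) -> float:
    """Fraction of requests that did not return ok_status."""
    total = sum(counts.values())
    return 0.0 if total == 0 else 1.0 - counts[ok_status] / total
```

If every node shows a similar, non-zero failure rate when hit directly,
that points away from a single misconfigured node.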

Here are some logs from riak-cs:

error.log:2013-09-19 11:37:02.242 [error]
<0.5105.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user
record for s3 failed. Reason: disconnected

There wasn't anything immediately either side of that. The riak logs for
the same minute on that server likewise do not have anything.

There's quite a lot of free memory on the servers; they have 32000 file
handles available.

Toby


> On Wed, Sep 18, 2013 at 4:29 PM, Toby Corkindale
> <toby.corkind...@strategicdata.com.au> wrote:
>> I found one Riak server was reporting a lot of errors like
>> [error] <0.808.0> Hintfile
>> '/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/3.bitcask.hint'
>> invalid
>>
>> And the Riak CS logs contained a lot of messages about being unable to
>> retrieve s3 user details because "error: disconnected"
>>
>> I think I've blown away the bad hintfiles and have had them repaired from
>> other replicas now, and I haven't seen any more errors for a little while.
>>
>> I'm not sure what caused those to become invalid.
>> Just a thought, but it would be good if Riak could automatically repair
>> them rather than failing requests.
>>
>> Cheers,
>> Toby
>>
>> On 19/09/13 08:42, Toby Corkindale wrote:
>>>
>>> Ah, hold on.. have just discovered that rather than it being deletion
>>> calls, it seems to just be every X calls of any sort.. sounds like one
>>> of the servers in the load-balancer pool must be misconfigured somehow,
>>> but the rest are OK.
>>>
>>> On 19/09/13 08:34, Toby Corkindale wrote:
>>>>
>>>> I've just upgraded from Riak CS 1.3.1 to 1.4.1
>>>>
>>>> Using s3cmd to test a few things, I've found some odd behaviour.
>>>> Creating a bucket and putting a file works just fine, eg:
>>>>
>>>> s3cmd mb s3://test
>>>> s3cmd put README s3://test
>>>> s3cmd get s3://test/README
>>>>
>>>> However if I try to delete a file or bucket, it throws an error:
>>>>
>>>> s3cmd del s3://test/README
>>>> ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
>>>> provided does not exist in our records.
>>>>
>>>> s3cmd rb s3://test
>>>> ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
>>>> provided does not exist in our records.
>>>>
>>>>
>>>> Have I messed something up during the upgrade, or is this a bug in 1.4.1?


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com