On 19/09/13 11:17, Luke Bakken wrote:
Hi Toby,

Invalid hint files won't cause Riak to fail requests - there must have
been something else happening. Hint files are used by Riak to speed up
start time when loading a large key set.

You mentioned a "load-balancer pool" - are you using something like
HAProxy to load-balance requests to your Riak CS cluster?

The "error: disconnected" message is a good clue. If you can provide
log files, they may point to the cause.

Hi Luke,
I'm still seeing quite a few failed requests. I've been chasing the hintfiles but I guess that was a red herring.

We're using nginx to load balance requests to Riak CS.
I tried going directly to each node in turn, and it didn't show that any one node was reliably failing every request.

Hitting one server just now came up with OK/403/403/OK/OK.
Trying another was OK/OK/OK/OK/403 though.
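
For comparing runs like the two above across nodes, the results can be tallied into a per-node failure rate. This is only a minimal sketch: it assumes each run is recorded as a slash-separated string in the same form as above, and the node names "node-a" and "node-b" are placeholders, not the real hostnames.

```python
from collections import Counter


def failure_rates(runs):
    """Map node name -> fraction of non-OK results in an 'OK/403/...' string."""
    rates = {}
    for node, outcome in runs.items():
        # Split "OK/403/403/OK/OK" into individual results, ignoring blanks.
        results = [r.strip() for r in outcome.split("/") if r.strip()]
        counts = Counter(results)
        failed = len(results) - counts["OK"]
        rates[node] = failed / len(results) if results else 0.0
    return rates


# Placeholder node names; the outcome strings are the two runs quoted above.
runs = {"node-a": "OK/403/403/OK/OK", "node-b": "OK/OK/OK/OK/403"}

if __name__ == "__main__":
    for node, rate in failure_rates(runs).items():
        print(f"{node}: {rate:.0%} failed")
```

With only five requests per node the rates are very noisy, so a longer run per node would be needed before concluding that any one node is worse than the others.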

Here are some logs from riak-cs:

error.log:2013-09-19 11:37:02.242 [error] <0.5105.0>@riak_cs_wm_common:maybe_create_user:223 Retrieval of user record for s3 failed. Reason: disconnected

There wasn't anything immediately either side of that. The riak logs for the same minute on that server likewise do not have anything.
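
To line up these errors against the Riak logs by time, the log lines can be parsed and counted per minute. The following is a rough sketch, not a tool shipped with Riak CS: the regular expression is inferred from the single line quoted above (including the "error.log:" prefix, which looks like grep output), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted above; other lines may differ.
LOG_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"  # optional "error.log:" prefix from grep
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>\w+)\] "
    r"<(?P<pid>[\d.]+)>"
    r"(?:@(?P<module>\w+):(?P<function>\w+):(?P<line>\d+))? "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a matching log line, or None otherwise."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by_minute(lines, needle="disconnected"):
    """Count parsed lines whose message contains `needle`, keyed by minute."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and needle in rec["message"]:
            # "11:37:02.242" -> "11:37"
            counts[f"{rec['date']} {rec['time'][:5]}"] += 1
    return counts
```

Running count_by_minute over the error log of each node gives a list of minutes in which "disconnected" appeared, which can then be checked against the Riak and nginx logs for the same minutes.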

There's quite a lot of free memory on the servers; they have 32000 file handles available.

Toby


On Wed, Sep 18, 2013 at 4:29 PM, Toby Corkindale
<toby.corkind...@strategicdata.com.au> wrote:
I found one Riak server was reporting a lot of errors like
[error] <0.808.0> Hintfile
'/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/3.bitcask.hint'
invalid

And the Riak CS logs contained a lot of messages about being unable to
retrieve s3 user details because "error: disconnected"

I think I've blown away the bad hintfiles and have had them repaired from
other replicas now, and I haven't seen any more errors for a little while.

I'm not sure what caused those to become invalid.
Just a thought, but would be good if Riak could automatically repair them
rather than failing requests.

Cheers,
Toby

On 19/09/13 08:42, Toby Corkindale wrote:

Ah, hold on.. have just discovered that rather than it being deletion
calls, it seems to just be every X calls of any sort.. sounds like one
of the servers in the load-balancer pool must be misconfigured somehow,
but the rest are OK.

On 19/09/13 08:34, Toby Corkindale wrote:

I've just upgraded from Riak CS 1.3.1 to 1.4.1

Using s3cmd to test a few things, I've found some odd behaviour.
Creating a bucket and putting a file works just fine, e.g.:

s3cmd mb s3://test
s3cmd put README s3://test
s3cmd get s3://test/README

However if I try to delete a file or bucket, it throws an error:

s3cmd del s3://test/README
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
provided does not exist in our records.

s3cmd rb s3://test
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you
provided does not exist in our records.


Have I messed something up during the upgrade, or is this a bug in 1.4.1?


