Hi Denis,

This doesn't sound like a known bug to me. Your hypothesis is reasonable,
since WALs use a surrogate ID, which is mapped back to table ID/tablet
information when the log is read. It is possible that recovery misinterprets
this mapping and replays data into the wrong table. Given the amount of
testing we do, my instinct is that this is unlikely, but if we can confirm
this bug, it would definitely be a critical one.

To rule out some scenarios, is it possible that your clients are writing to
the wrong tables? Have you ever seen a failure affecting a table which does
not exist (like what might happen if there's an off-by-one error in the WAL
code)? Or affecting the metadata tables?

Can you reproduce this error reliably, or can you share the relevant ingest
code which can reproduce this failure? Also, what kind of tablet server
failures are you experiencing when this happens?

If you could file a bug report at https://issues.apache.org/browse/ACCUMULO
with any details and/or attachments to help us address the issue, we would
greatly appreciate it. This seems like something we'd want to fix pretty
quickly.

Thanks!


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Wed, Feb 18, 2015 at 6:26 PM, Denis <de...@camfex.cz> wrote:

> Hello.
>
> A few times I have noticed that some tables contain values they cannot
> have, and those entries have timestamps close to a tabletserver failure time.
> (I mean wrong format, one table has msgpack values at least 10 bytes
> long and another table has 1-byte values and after a failure I read
> one or two 1-byte values in the table where I expect to read msgpack).
>
> I suspect that during the recovery process, when the WAL is being read, some
> entries are inserted into the wrong table.
>
> Maybe it is a known bug, as I am still using Accumulo 1.6.1.
>
