On Mon, Aug 24, 2015 at 10:30 PM, Tom Hood <[email protected]> wrote:
> Hi,
>
> There appears to be a bug where two rows are merging into one as a result
> of doing separate calls to the Iface.mutate method using
> RowMutationType.UPDATE_ROW and RecordMutationType.REPLACE_ENTIRE_RECORD.
> (I can also see the problem using REPLACE_ROW and REPLACE_ENTIRE_RECORD
> instead).
>
> For example, if the index has 2 rows with 1 record each that has a copy of
> the rowId in cf.key:
> row A: cf.key=A
> row B: cf.key=B
>
> After an attempt to Iface.mutate row A with exactly the same data,
> sometimes the result is:
> row A: cf.key=A
> row B: cf.key=B cf.key=A
>
> instead of the expected result of a no-op. The corruption is visible with
> "blur get" and "blur query cf.key:B" and an Iface.fetchRow from java.
>
> For the above, the recordId is always "0" and the rowId is a UUID generated
> from java UUID.randomUUID (although for my test I'm also using the same
> UUIDs).
>
> I'm not setting a schema at all in my test program, so all the defaults for
> analyzers, fieldless=true, etc.
>
> I do notice the following show up in the shard server log:
> INFO ... [thrift-processors1] search.PrimeDocCache: PrimeDoc for reader
> [_k(4.3):C19/4] not stored, because count [13] and freq [16] do not match.
>
> Restarting blur doesn't seem to help.
>
> Blur version is 0.2.4. Hadoop stack is CDH 5.1.0
>
> Cluster configuration is running 1 shard, 1 controller, 1 namenode all on
> the same machine (redhat 6.3 Santiago).
>
> I have a fairly small test case that if I run repeatedly sometimes fails,
> sometimes doesn't. I run it after using blur shell to remove the old table
> and create a new one with 1 shard.
>
> Although it isn't 100% reproducible, it seems to fail pretty often for me.
> As I've typed the code in on a different network, I don't have the code for
> you yet.
>
> Have you seen this kind of issue before?
>

I have not.

>
> Any suggestions for how to track it down?
>

Not sure yet. Maybe we could reproduce it in the IndexManagerTest. That's
where most of the mutation tests are located.

>
> Are there any commands you want me to run on the resulting table that might
> yield some clues?
>

I don't know enough yet to suggest anything. I have opened a JIRA ticket
where we can track the issue:

https://issues.apache.org/jira/browse/BLUR-441

I will try to investigate ASAP.

Aaron

>
> Thanks,
> -- Tom
>
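
A reproduction attempt needs a way to tell a clean table from a corrupted one. The report relies on one invariant: every cf.key value stored in a row equals that row's rowId. Below is a minimal sketch of a check for it. It is not part of Blur: RowKeyCheck and findForeignKeys are hypothetical names, and the sketch assumes the cf.key values have already been read out of the table (for example via Iface.fetchRow) into a plain map keyed by rowId.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Blur API.
final class RowKeyCheck {

    // Returns one message for every cf.key value found in a row whose
    // rowId differs from that value. An empty list means the invariant holds.
    static List<String> findForeignKeys(Map<String, List<String>> keysByRowId) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : keysByRowId.entrySet()) {
            for (String key : row.getValue()) {
                if (!row.getKey().equals(key)) {
                    problems.add("row " + row.getKey() + " contains cf.key=" + key);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The corrupted state from the report: row B also carries cf.key=A.
        Map<String, List<String>> fetched = new LinkedHashMap<>();
        fetched.put("A", Arrays.asList("A"));
        fetched.put("B", Arrays.asList("B", "A"));
        findForeignKeys(fetched).forEach(System.out::println);
        // prints: row B contains cf.key=A
    }
}
```

Running the check after each mutate call in a loop would show which call first breaks the invariant, which may help narrow down an intermittent failure.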
