Thanks Tom, I will take a look.

On Wed, Aug 26, 2015 at 2:12 PM, Tom Hood <[email protected]> wrote:

> I uploaded a test program to the jira issue that demonstrates the problem
> I'm seeing.
>
> Please let me know if you are able to reproduce the problem and whether you
> think there's a workaround for it that doesn't involve a patch.
>
> Thanks,
> -- Tom
>
>
> On Tue, Aug 25, 2015 at 12:51 PM, Aaron McCurry <[email protected]>
> wrote:
>
> > On Mon, Aug 24, 2015 at 10:30 PM, Tom Hood <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > There appears to be a bug where two rows are merging into one as a result
> > > of doing separate calls to the Iface.mutate method using
> > > RowMutationType.UPDATE_ROW and RecordMutationType.REPLACE_ENTIRE_RECORD.
> > > (I can also see the problem using REPLACE_ROW and REPLACE_ENTIRE_RECORD
> > > instead).
> > >
> > > For example, if the index has 2 rows with 1 record each that has a copy of
> > > the rowId in cf.key:
> > > row A: cf.key=A
> > > row B: cf.key=B
> > >
> > > After an attempt to Iface.mutate row A with exactly the same data,
> > > sometimes the result is:
> > > row A: cf.key=A
> > > row B: cf.key=B cf.key=A
> > >
> > > instead of the expected result of a no-op. The corruption is visible with
> > > "blur get" and "blur query cf.key:B" and an Iface.fetchRow from java.
> > >
> > > For the above, the recordId is always "0" and the rowId is a UUID generated
> > > from java UUID.randomUUID (although for my test I'm also using the same
> > > UUIDs).
> > >
> > > I'm not setting a schema at all in my test program, so all the defaults for
> > > analyzers, fieldless=true, etc.
> > >
> > > I do notice the following show up in the shard server log: INFO ...
> > > [thrift-processors1] search.PrimeDocCache: PrimeDoc for reader
> > > [_k(4.3):C19/4] not stored, because count [13] and freq [16] do not match.
> > >
> > > Restarting blur doesn't seem to help.
> > >
> > > Blur version is 0.2.4.
> > > Hadoop stack is CDH 5.1.0
> > >
> > > Cluster configuration is running 1 shard, 1 controller, 1 namenode all on
> > > the same machine (redhat 6.3 Santiago).
> > >
> > > I have a fairly small test case that if I run repeatedly sometimes fails,
> > > sometimes doesn't. I run it after using blur shell to remove the old table
> > > and create a new one with 1 shard.
> > >
> > > Although it isn't 100% reproducible, it seems to fail pretty often for me.
> > > As I've typed the code in on a different network, I don't have the code for
> > > you yet.
> > >
> > > Have you seen this kind of issue before?
> > >
> >
> > I have not.
> >
> > >
> > > Any suggestions for how to track it down?
> > >
> >
> > Not sure yet, maybe we could reproduce it in the IndexManagerTest. That's
> > where most of the mutation tests are located.
> >
> > >
> > > Are there any commands you want me to run on the resulting table that might
> > > yield some clues?
> > >
> >
> > I don't know enough yet to suggest anything. I have opened a jira ticket
> > where we can track the issue.
> >
> > https://issues.apache.org/jira/browse/BLUR-441
> >
> > I will try to investigate ASAP.
> >
> > Aaron
> >
> > >
> > > Thanks,
> > > -- Tom
> > >
> >
