[ 
https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028857#comment-13028857
 ] 

Jonathan Ellis edited comment on CASSANDRA-2401 at 5/4/11 5:50 PM:
-------------------------------------------------------------------

I found *a* bug that could cause this: Cassandra will re-create a deleted index 
entry if it gets a write with an obsolete timestamp, but the data row tombstone 
will correctly suppress an update there. (So when you do an index query for 
value=X, and the index says "row K has that value," then you get an error 
trying to read row K that doesn't exist.)
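
A toy model of the mismatch described above, as a rough sketch only. This is not Cassandra code: the class, fields, and the guardIndex flag are hypothetical names made up for illustration, and the guard shown is just one plausible way to keep the index consistent, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model with hypothetical names; it ignores overwrites with a different value.
class StaleIndexSketch {
    final Map<String, Long> rowTombstones = new HashMap<>();   // row key -> deletion timestamp
    final Map<String, String> rows = new HashMap<>();          // row key -> live indexed value
    final Map<String, Set<String>> index = new HashMap<>();    // value -> row keys the index claims
    final boolean guardIndex;                                   // false models the bug

    StaleIndexSketch(boolean guardIndex) { this.guardIndex = guardIndex; }

    void delete(String key, long timestamp) {
        rowTombstones.merge(key, timestamp, Math::max);
        String old = rows.remove(key);
        if (old != null) index.getOrDefault(old, new HashSet<>()).remove(key);
    }

    void write(String key, String value, long timestamp) {
        // A write no newer than the row tombstone is obsolete for the data row.
        boolean obsolete = timestamp <= rowTombstones.getOrDefault(key, Long.MIN_VALUE);
        if (!obsolete) rows.put(key, value);
        // Without the guard, the index entry is re-created even for an obsolete write.
        if (!obsolete || !guardIndex) index.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }

    // Keys the index returns for this value whose data row does not exist.
    Set<String> danglingKeys(String value) {
        Set<String> dangling = new HashSet<>();
        for (String key : index.getOrDefault(value, new HashSet<>())) {
            if (!rows.containsKey(key)) dangling.add(key);
        }
        return dangling;
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            StaleIndexSketch s = new StaleIndexSketch(guard);
            s.write("K", "X", 5);   // normal write
            s.delete("K", 10);      // row deleted later
            s.write("K", "X", 7);   // late-arriving write with an obsolete timestamp
            System.out.println("guardIndex=" + guard + " dangling=" + s.danglingKeys("X"));
        }
    }
}
```

With guardIndex set to false, the index query for value X still returns K although row K has no data, which is the situation an index scan would trip over.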

I don't think this is the bug Tey Kar is hitting, though, because unless I'm 
mistaken you won't get this NPE until after the data row tombstone is removed 
by compaction after gc_grace_seconds.  4 days isn't enough to see that unless 
you've tweaked gc_g_s.
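
To make the timing argument concrete, here is a small sketch of the eligibility check. It assumes the default gc_grace_seconds of 864000 (10 days); the reporter's actual setting is not stated, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper, not Cassandra code.
class GcGraceSketch {
    static final long DEFAULT_GC_GRACE_SECONDS = 864_000L; // 10 days (assumed default)

    // A tombstone becomes eligible for removal by compaction only once it is
    // older than gc_grace_seconds.
    static boolean purgeable(long deletedAtSeconds, long nowSeconds, long gcGraceSeconds) {
        return deletedAtSeconds < nowSeconds - gcGraceSeconds;
    }

    public static void main(String[] args) {
        long fourDays = Duration.ofDays(4).getSeconds();
        long elevenDays = Duration.ofDays(11).getSeconds();
        System.out.println(purgeable(0, fourDays, DEFAULT_GC_GRACE_SECONDS));   // false
        System.out.println(purgeable(0, elevenDays, DEFAULT_GC_GRACE_SECONDS)); // true
    }
}
```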

Still, it's worth fixing.  Patch attached.  (Also adds an assert w/ more 
information if/when another way of triggering this is found.)

      was (Author: jbellis):
    I found *a* bug that could cause this: Cassandra will re-create a deleted 
index entry if it gets a write with an obsolete timestamp, but the data row 
tombstone will correctly suppress an update there.

I don't think this is the bug Tey Kar is hitting, though, because unless I'm 
mistaken you won't get this NPE until after the data row tombstone is removed 
by compaction after gc_grace_seconds.  4 days isn't enough to see that unless 
you've tweaked gc_g_s.

Still, it's worth fixing.  Patch attached.  (Also adds an assert w/ more 
information if/when another way of triggering this is found.)
  
> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java 
> scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401.txt
>
>
> In ColumnFamilyStore.java, near line 1680, the call "ColumnFamily data = 
> getColumnFamily(new QueryFilter(dk, path, firstFilter))" returns null, 
> which causes an uncaught null pointer exception in "satisfies(data, clause, 
> primary)". The callback then times out and a Timeout exception is returned 
> to Hector.
> The data is empty: as far as I traced, the column count is 0 in 
> removeDeletedCF(), which returns null there. (I am new and still trying to 
> understand the logic.) Instead of crashing on the null, could we bypass the 
> missing data?
> About my test:
> A stress-test program adds, modifies, and deletes data in a keyspace. 30 
> threads simulate concurrent users performing those actions and periodically 
> query all rows. The Column Family has rows (as File) with indexed columns 
> (e.g. userID, fileType).
> There were no issues on the first day of the test, which was then stopped 
> for 3 days. When I restarted the test on the 4th day, 1 of the users failed 
> to query the files (timeout exception received). Most of the users could 
> still query without problems.
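
The reporter's suggestion of bypassing a missing row could look roughly like the sketch below. It is a simplified stand-in, not the actual ColumnFamilyStore.scan() code: rows are modelled as plain maps, and scan, store, and satisfies here are hypothetical names chosen to mirror the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a null-tolerant index scan loop.
class NullSafeScanSketch {
    static List<Map<String, String>> scan(List<String> candidateKeys,
                                          Map<String, Map<String, String>> store,
                                          Predicate<Map<String, String>> satisfies) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String key : candidateKeys) {
            Map<String, String> data = store.get(key); // null if the row no longer exists
            if (data == null) {
                continue; // skip the stale index hit instead of dereferencing null
            }
            if (satisfies.test(data)) {
                result.add(data);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> store = new HashMap<>();
        Map<String, String> row = new HashMap<>();
        row.put("userID", "u1");
        store.put("file1", row);

        List<String> keysFromIndex = new ArrayList<>();
        keysFromIndex.add("file1");
        keysFromIndex.add("file2"); // index entry whose row is gone

        System.out.println(scan(keysFromIndex, store, r -> "u1".equals(r.get("userID"))).size());
    }
}
```

Skipping the row avoids the exception but leaves the stale index entry in place, so it complements rather than replaces a fix on the write path.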

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
