[ 
https://issues.apache.org/jira/browse/CASSANDRA-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488949#comment-15488949
 ] 

Fei Fang commented on CASSANDRA-12431:
--------------------------------------

The guid means id (sorry I renamed the schema but not the query).

I understand that I can insert null into the table. The issue is that when I run 
{code}select * from email_histogram where id='1';{code}, I get 20k rows back 
and the log shows that score is null for email '2'. When I then run {code}select * from 
email_histogram where id='1' and email='2' {code}, I do get a float number 
back.

In week 1, I run {code} insert into email_histogram (id, email,score) values 
('1','8', 2.1);  insert into email_histogram (id, email,score) values ('1','3', 
3.1); {code}
In week 3, I might run {code} insert into email_histogram (id, email,score) 
values ('1','8', 2.3);  insert into email_histogram (id, email,score) values 
('1','3', 3.3); {code}

The emails in week 1 and week 3 mostly overlap, but some emails might appear 
only in week 3 or only in week 1.

So we don't tombstone the entire partition, only the columns of some 
clustering keys.

What do you mean by "Is the partition only written once and never used again"?

Once a bad partition is found, it doesn't continue displaying the odd behavior. 
It seems random.

I can try reading at QUORUM and writing at ALL. Do you recommend tracing on a 
staging server? I have tried tracing locally, but I am not sure the server can 
handle that much logging.
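
A minimal sketch of how I could try this from cqlsh, assuming the staging node 
accepts cqlsh connections. CONSISTENCY and TRACING are cqlsh shell commands, 
and the id, email and score values are the placeholders used above, not real 
data:

{code}
-- the consistency level applies to every statement that follows it
CONSISTENCY QUORUM;
TRACING ON;
-- the partition-wide read that sometimes returns a null score
select * from email_histogram where id='1';
TRACING OFF;
-- switch the level before issuing writes
CONSISTENCY ALL;
insert into email_histogram (id, email, score) values ('1','8', 2.3);
{code}

Tracing only this one query in a cqlsh session would keep the trace output 
limited to that session instead of covering all traffic on the server.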

I don't think the un-repaired partition could be the cause unless one write 
could *partially* succeed. We insert values for ALL columns in each write, so 
if the email is there, I expect the score to have some value. We never insert 
a null value for score. Is it possible for Cassandra to write email but not 
score from one write query?
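
One way I could check this, as a sketch only: writetime() and ttl() are 
standard CQL functions for regular columns, and the values are the 
placeholders from above. A null score together with a null write time would 
suggest that the cell is missing or expired rather than written as null:

{code}
select email, score, writetime(score), ttl(score)
from email_histogram where id='1' and email='2';
{code}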

> Getting null value for the field that has value when query result has many 
> rows
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12431
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12431
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Fei Fang
>            Assignee: Edward Capriolo
>             Fix For: 2.2.x
>
>
> Hi,
> We get null value (not an older value, but null) for a float column (score) 
> from a 20k result row query. However, when we fetch data for that specific 
> row, the column actually has value.
> The table schema is like this:
> {code}
> CREATE TABLE IF NOT EXISTS email_histogram (
> id text,
> email text,
> score float,
> PRIMARY KEY (id, email)
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = 'KEYS_ONLY'
> AND comment = ''
> AND compaction =
> {'tombstone_threshold': '0.1', 'tombstone_compaction_interval': '300', 
> 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression =
> {'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 864000
> AND gc_grace_seconds = 86400
> AND memtable_flush_period_in_ms = 0
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> {code}
> This is my read query: SELECT * FROM " + TABLE_NAME + " WHERE guid = ?
> I'm using consistency ONE when querying it and QUORUM when updating it. When I 
> insert data, I insert all the columns, never only some of them. I 
> understand that I might get an out-of-date value since I'm using ONE to read, 
> but here I'm not getting an out-of-date value, just "null". 
> This is happening on our staging server, which serves 20k users, and we see 
> this error 10+ times every day. I don't have an exact number for how 
> many times we run the query, but nodetool cfstats shows a local read count of 
> 85314 for this table over the last 18 hours, and we have 6 Cassandra nodes in 
> this cluster, so about 500k queries in 18 hours.
> We update the table every 3 weeks. The table has 20k rows for each key (guid) 
> I'm querying for. Out of the 20k rows, only a couple at most are null and 
> they are not the same every time we query the same key.
> We are using C# driver version 3.0.1 and Cassandra version 2.2.6.44.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
