Discrepancy while paging through table, and static column updated in between

2016-04-19 Thread Siddharth Verma
Hi,

We are using Cassandra (DSC 3.0.3) in production.

For one of our jobs, we were doing a full table scan in a Java program: we saved the paging state of each ResultSet with getPagingState and resumed from it with setPagingState. When we ran the same job multiple times, the results were inconsistent. Compared with the previous run, each run's output contained some rows the previous one lacked and was missing some that it had.
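A rough picture of the paging loop described above: fetch a page, save its paging state, and resume the next request from it. This is a hypothetical in-memory stand-in, not the DataStax driver API (PagedScanSketch, fetchPage, scanKeys and updateStatic are illustrative names). It also assumes the paging state can be reduced to the primary key (col1, col2) of the last row returned, which is a simplification. Under that assumption, updating the static col6 between pages does not change which rows the scan returns, and that is the behaviour we expected from the real scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a paged full-table scan.
// NOT the DataStax driver API; it only mimics the save-state / resume-from-state loop.
class PagedScanSketch {

    // One CQL row: partition key col1, clustering key col2, static column col6.
    record Row(int col1, String col2, String col6) {}

    // Simplified paging state (assumption): primary key of the last row already returned.
    record PagingState(int col1, String col2) {}

    // One page of rows, plus the state to resume from (null once the scan is done).
    record Page(List<Row> rows, PagingState next) {}

    // True if row r sorts strictly after the saved position.
    static boolean isAfter(Row r, PagingState s) {
        if (s == null) return true;
        if (r.col1() != s.col1()) return r.col1() > s.col1();
        return r.col2().compareTo(s.col2()) > 0;
    }

    // Returns up to pageSize (> 0) rows after 'state'; 'table' must be sorted by (col1, col2).
    static Page fetchPage(List<Row> table, PagingState state, int pageSize) {
        List<Row> rows = new ArrayList<>();
        for (Row r : table) {
            if (isAfter(r, state)) {
                rows.add(r);
                if (rows.size() == pageSize) break;
            }
        }
        // A short page means there is nothing left to read.
        if (rows.size() < pageSize) return new Page(rows, null);
        Row last = rows.get(rows.size() - 1);
        return new Page(rows, new PagingState(last.col1(), last.col2()));
    }

    // Full scan: fetch a page, keep its state, resume from it until no state is returned.
    // 'betweenPages' lets the caller simulate concurrent writes while the job is running.
    static List<String> scanKeys(List<Row> table, int pageSize, Consumer<List<Row>> betweenPages) {
        List<String> keys = new ArrayList<>();
        PagingState state = null;
        do {
            Page page = fetchPage(table, state, pageSize);
            for (Row r : page.rows()) keys.add(r.col1() + ":" + r.col2());
            state = page.next();
            betweenPages.accept(table);
        } while (state != null);
        return keys;
    }

    // Overwrites the static column col6 for every row of one partition.
    static void updateStatic(List<Row> table, int col1, String newCol6) {
        table.replaceAll(r -> r.col1() == col1 ? new Row(r.col1(), r.col2(), newCol6) : r);
    }
}
```

Because the resume position here depends only on the primary key, a static-column write between pages cannot shift it; whether the real paging state behaves that way in 3.0.3 is exactly what the runs called into question.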

Side Note 1:
Table structure
col1, col2, col3, col4, col5, col6
Primary key (col1, col2)
col5 is a static column
col6 is a static column, used to explicitly store the time col5 was last updated


Sample Data
1,A,AA,AAA,STATIC,T1
1,B,BB,BBB,STATIC,T1
1,C,CC,CCC,STATIC,T1
1,D,DD,DDD,STATIC,T1
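The sample data repeats STATIC,T1 on every row because a static column is stored once per partition and shared by all of that partition's clustering rows. A minimal model of this, using hypothetical names (StaticPartitionSketch, Partition, updateStatic, toRows) and making no claim about Cassandra's actual storage code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one partition of the table in Side Note 1.
// Static columns (col5, col6) are held once per partition, not once per row.
class StaticPartitionSketch {

    // Regular (non-static) columns of one clustering row.
    record RowData(String col3, String col4) {}

    static final class Partition {
        final String col1;                                 // partition key
        String col5;                                       // static column
        String col6;                                       // static column: time col5 last changed
        final Map<String, RowData> rows = new TreeMap<>(); // clustering key col2 -> row, sorted

        Partition(String col1, String col5, String col6) {
            this.col1 = col1;
            this.col5 = col5;
            this.col6 = col6;
        }

        // One write to the static cells; every row of the partition then reads the new values.
        void updateStatic(String newCol5, String newCol6) {
            col5 = newCol5;
            col6 = newCol6;
        }

        // What a full SELECT shows: one line per clustering row, static values repeated.
        List<String> toRows() {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, RowData> e : rows.entrySet()) {
                RowData d = e.getValue();
                out.add(String.join(",", col1, e.getKey(), d.col3(), d.col4(), col5, col6));
            }
            return out;
        }
    }
}
```

So a single update to col6 changes what every row under that partition key reports, which is why such an update while the job was running touches all of the partition's rows at once.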

For some partition keys, col6 was updated while the job was running, and some rows for that partition key were not printed.

Side Note 2:
For the rows that were missed, we ran select col6, writetime(col6) from ... where col1=... and col2=...
to make sure those entries weren't added later.
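A small sketch of the check in Side Note 2. It assumes writetime() values follow the usual convention of microseconds since the Unix epoch and that the job's start time was recorded; WritetimeCheck and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: was a cell written before the scan started?
// Assumes writetime() values are microseconds since the Unix epoch.
class WritetimeCheck {

    // Converts an Instant to microseconds since the epoch.
    static long toMicros(Instant t) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, t);
    }

    // True if the cell's write timestamp is earlier than the job start,
    // i.e. the value already existed when the scan began.
    static boolean writtenBeforeScan(long writetimeMicros, Instant scanStart) {
        return writetimeMicros < toMicros(scanStart);
    }
}
```

Note that writetime(col6) is the timestamp of the static cell, which a later col6 update replaces; the regular columns of the row (for example col3) carry their own timestamps.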


Side Note 3:
That these rows were skipped because col6 was updated for their partition while the job was running is only an assumption on our part.
We can't understand why some entries were not printed in the table scan.
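To see exactly which entries differ between two runs, one option is to collect the primary keys (col1, col2) printed by each run and compare them. This is a hypothetical helper (ScanDiff, compare and key are illustrative names), assuming each run's output has already been reduced to a list of "col1:col2" strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing the primary keys seen in two runs of the scan.
class ScanDiff {

    // Keys seen only in the first run, and keys seen only in the second run.
    record Diff(Set<String> onlyInFirst, Set<String> onlyInSecond) {
        boolean identical() { return onlyInFirst.isEmpty() && onlyInSecond.isEmpty(); }
    }

    // Builds a primary-key string from partition key col1 and clustering key col2.
    static String key(Object col1, Object col2) {
        return col1 + ":" + col2;
    }

    // Set differences in both directions, preserving each run's output order.
    static Diff compare(List<String> firstRun, List<String> secondRun) {
        Set<String> first = new LinkedHashSet<>(firstRun);
        Set<String> second = new LinkedHashSet<>(secondRun);
        Set<String> onlyFirst = new LinkedHashSet<>(first);
        onlyFirst.removeAll(second);
        Set<String> onlySecond = new LinkedHashSet<>(second);
        onlySecond.removeAll(first);
        return new Diff(onlyFirst, onlySecond);
    }
}
```

The keys in onlyInFirst and onlyInSecond are the ones worth checking with writetime, and listing them alongside the table schema would also fit a Jira report.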


Re: Discrepancy while paging through table, and static column updated in between

2016-04-19 Thread Tyler Hobbs
This sounds similar to https://issues.apache.org/jira/browse/CASSANDRA-10010,
but that only affected 2.x.  Can you open a Jira ticket with your table
schema, the problematic query, and the details you posted here?

On Tue, Apr 19, 2016 at 10:25 AM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:



-- 
Tyler Hobbs
DataStax 


Re: Discrepancy while paging through table, and static column updated in between

2016-04-28 Thread Siddharth Verma
Hi Tyler,
I have created a Jira ticket for another issue we have encountered. It is not limited to our speculation about the static column update:
https://issues.apache.org/jira/browse/CASSANDRA-11680

Thanks


On Tue, Apr 19, 2016 at 10:37 PM, Tyler Hobbs  wrote:
