CQL 3.x Update ...USING TIMESTAMP...

2015-03-10 Thread Sachin Nikam
I am planning to use the Update...USING TIMESTAMP... statement to make sure
that I do not overwrite fresh data with stale data, while avoiding having to
do at least LOCAL_QUORUM writes.

Here is my table structure.

Table=DocumentStore
DocumentID (primaryKey, bigint)
Document(text)
Version(int)
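
In CQL, that structure corresponds to roughly the following table definition
(a sketch inferred from the description above, not taken from the original
mail):

   CREATE TABLE DocumentStore (
       DocumentID bigint PRIMARY KEY,
       Document   text,
       Version    int
   );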

If the service receives 2 write requests with Version=1 and Version=2,
regardless of the order of arrival, the business requirement is that we end
up with Version=2 in the database.

Can I use the following CQL Statement?

Update DocumentStore USING TIMESTAMP <timestamp>
SET Document = <document>,
    Version = <version>
WHERE DocumentID = <documentId>;
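
For illustration, a concrete form of this approach could look like the sketch
below (the literal values are placeholders, and deriving the write timestamp
from the version number is an assumption about the intent, not something
stated above):

   -- The update carrying the higher timestamp wins, regardless of arrival order.
   UPDATE DocumentStore USING TIMESTAMP 2
   SET Document = 'document content, version 2', Version = 2
   WHERE DocumentID = 12345;

   -- Arrives later but carries the lower timestamp, so it does not overwrite
   -- the row written above.
   UPDATE DocumentStore USING TIMESTAMP 1
   SET Document = 'document content, version 1', Version = 1
   WHERE DocumentID = 12345;

In practice the write timestamp is normally microseconds since the epoch, so a
real implementation would more likely encode the version into a monotonically
increasing timestamp than use the raw version number.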

Has anybody used something like this? If so, was the behavior as expected?

Regards
Sachin


Re: C* 2.0.9 Compaction Error

2015-03-10 Thread 曹志富
Has anyone seen this before? How should I deal with this error? Should I
remove this node first, then add it back to the cluster?

--
Ranger Tsao

2015-03-10 10:57 GMT+08:00 曹志富 :

> Hi, everyone:
>
> I have a 12-node C* 2.0.9 cluster for Titan. I found an error when doing
> compaction; the exception stack is:
>
> java.lang.AssertionError: Added column does not sort as the last column
>
> at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:115)
> at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:150)
> at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
> at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154)
> at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
> I found an issue, CASSANDRA-7470
> (https://issues.apache.org/jira/browse/CASSANDRA-7470), but it's about CQL.
>
> So why does this error occur?
>
>
> --
> Ranger Tsao
>


Re: how to clear data from disk

2015-03-10 Thread Patrick McFadin
Or just manually delete the files. The directories are broken down by
keyspace and table.

Patrick

On Mon, Mar 9, 2015 at 7:50 PM, 曹志富  wrote:

> nodetool clearsnapshot
>
> --
> Ranger Tsao
>
> 2015-03-10 10:47 GMT+08:00 鄢来琼 :
>
>>  Hi ALL,
>>
>>
>>
>> After dropping a table, I found that the data is not removed from disk; it
>> seems I should have reduced gc_grace_seconds before the drop operation.
>>
>> I would have to wait for 10 days, but there is not enough disk space.
>>
>> Could you tell me whether there is a method to clear the data from disk
>> quickly?
>>
>> Thank you very much!
>>
>>
>>
>> Peter
>>
>
>


Re: timeout when using secondary index

2015-03-10 Thread Patrick McFadin
Jimmy,

The secondary index is getting scanned since you put the column in your
query. The behavior you are looking for is a coming feature called Global
Indexes slated for 3.0. https://issues.apache.org/jira/browse/CASSANDRA-6477

In the meantime, you could build your own lookup table, even with cardinality
this low. If the point is to find everyone of a certain gender in a company,
give this a try.

create table company_gender (
   company_id uuid,
   gender text,
   person_id uuid,
   PRIMARY KEY (company_id, gender, person_id)
);

Each company would be a partition and you could find all males or females
with a single query. The bonus is that you would get paging which will be
much more efficient.
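
For example, a query against that lookup table might look like the following
sketch (the literal uuid and gender value are placeholders, not taken from the
original mail):

   SELECT person_id
   FROM company_gender
   WHERE company_id = 123e4567-e89b-12d3-a456-426655440000
     AND gender = 'male';

Because the restriction is on the partition key plus a leading clustering
column, this reads a single contiguous slice of one partition rather than
scanning a secondary index.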

Patrick




On Fri, Mar 6, 2015 at 2:56 PM, Jimmy Lin  wrote:

> Hi,
> I ran into an RPC timeout exception when executing a query that involves a
> secondary index on a Boolean column when, for example, the company has more
> than 1k people.
>
> select * from company where company_id = <company_id> and isMale = true;
>
> Such an extremely low-cardinality secondary index will, as the docs state,
> result in basically 2 large index rows for those values. However, I thought
> that since I also bounded the query with my partition key, wouldn't that be
> consulted first to further narrow down the result and make the query
> efficient?
>
> Also, if I simply do
> select * from company where company_id = <company_id>;
> (without the AND clause on the secondary index), it returns right away.
>
> Or maybe the Cassandra server internally always evaluates the secondary
> index result first?
>
> thanks
>
>
>
> I have a simple table
>
> create table company (
> company_id uuid,
> person_id uuid,
> isMale boolean,
> PRIMARY KEY (company_id, person_id)
> );
>
>
>
>
>


Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-10 Thread Rumph, Frens Jan
Thanks for the suggestion DuyHai. I assume you mean CL=QUORUM (as in
consistency level, not replication factor). As expected, setting the
consistency level to quorum or all yields equally inconsistent results for
the select count and select distinct queries.

Which is good in a way, because with RF=1 and CL=ONE I would expect an error
if one of the nodes weren't able to answer a query.

Note that conceptually there is no such thing as a quorum or majority when
RF=1. Since quorum in C* is defined as floor((RF / 2) + 1), with RF=1 it is in
practice the same as CL=ONE.
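
For reference, this kind of check can be run from cqlsh roughly as follows (a
sketch; the table and partition key columns are taken from the schema quoted
further down in this thread):

   CONSISTENCY QUORUM;
   SELECT count(*) FROM tbl;
   SELECT DISTINCT id, bucket FROM tbl;

With RF=1 the higher consistency level does not change which replica is read,
which matches the observation above.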

On 10 March 2015 at 18:10, DuyHai Doan  wrote:

> First idea to eliminate any issue with regard to stale data: issue the
> same count query with RF=QUORUM and check whether there are still
> inconsistencies
>
> On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan 
> wrote:
>
>> Hi Jens, Mikhail, Daemeon,
>>
>> Thanks for your replies. Sorry for my reply being late ... mails from the
>> user-list were moved to the wrong inbox on my side.
>>
>> I'm in a development environment and thus using replication factor = 1
>> and consistency = ONE with three nodes. So the 'results from different
>> nodes between queries' hypothesis seems unlikely to me. I would expect a
>> timeout if some node wouldn't be able to answer.
>>
>> I tried tracing, but I couldn't really make anything of it.
>>
>> For example I performed two select distinct ... from ... queries: Traces
>> for both of them contained more than one line like 'Submitting range
>> requests on ... ranges ...' and 'Submitted ... concurrent range requests
>> covering ... ranges'. These lines occur with varying numbers, e.g. :
>>
>> Submitting range requests on 593 ranges with a concurrency of 75 (1.35
>> rows per range expected)
>> Submitting range requests on 769 ranges with a concurrency of 75 (1.35
>> rows per range expected)
>>
>>
>> Also, when looking at lines like 'Executing seq scan across ...
>> sstables for ...', I saw that in one case, which yielded far fewer partition
>> keys, only the tokens from -922337203685477  to -594461978511041000
>> were included. In a case which yielded far more partition keys, the entire
>> token range did seem to be queried.
>>
>> To reiterate my initial questions: is this behavior to be expected? Am I
>> doing something wrong? Is there a workaround?
>>
>> Best regards,
>> Frens Jan
>>
>> On 4 March 2015 at 22:59, daemeon reiydelle  wrote:
>>
>>> What is the replication? Could you be serving stale data from a node
>>> that was not properly replicated (hints timeout exceeded by a node being
>>> down?)
>>>
>>>
>>>
>>> On Wed, Mar 4, 2015 at 11:03 AM, Jens Rantil 
>>> wrote:
>>>
 Frens,

 What consistency are you querying with? Could be you are simply
 receiving result from different nodes each time.

 Jens

 –
 Skickat från Mailbox 


 On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov 
 wrote:

> We have observed the same issue in our production Cassandra cluster (5
> nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to
> realize we shouldn’t use 2.1.x yet) on Amazon machines (created from
> community AMI).
>
> In addition to count variations of 5 to 10%, we observe variations
> for the query “select * from table1 where time > '$fromDate' and time <
> '$toDate' allow filtering” results. We iterated through the results
> multiple times using the official Java driver. We used that query for a huge
> data migration and were unpleasantly surprised that it is unreliable. In
> our case “nodetool repair” didn’t fix the issue.
>
> So I echo Frens questions.
>
> Thanks,
> Mikhail
>
>
>
>
> On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan 
> wrote:
>
>> Hi,
>>
>> Is it to be expected that select count(*) from ... and select
>> distinct partition-key-columns from ... yield inconsistent results
>> between executions, even though the table at hand isn't written to?
>>
>> I have a table in a keyspace with replication_factor = 1 which is
>> something like:
>>
>>  CREATE TABLE tbl (
>> id frozen<id_type>,
>> bucket bigint,
>> offset int,
>> value double,
>> PRIMARY KEY ((id, bucket), offset)
>> )
>>
>> The frozen udt is:
>>
>>  CREATE TYPE id_type (
>> tags map
>> );
>>
>> When I do select count(*) from tbl several times, the actual count
>> varies by 5 to 10%. Also when performing select distinct id, bucket from
>> from
>> tbl the results aren't consistent over several query executions. The 
>> table
>> is not being written to at the time I performed the queries.
>>
>> Is this to be expected? Or is this a bug? Is there an alternative
>> method / workaround?
>>
>> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64bit fedora 21 with

Re: cassandra node jvm stall intermittently

2015-03-10 Thread Jason Wee
heh, in the midst of upgrading, Rob ;-)

Jason

On Tue, Mar 10, 2015 at 2:04 AM, Robert Coli  wrote:
> On Sat, Mar 7, 2015 at 1:44 AM, Jason Wee  wrote:
>>
>> hey Ali, 1.0.8
>>
>> On Sat, Mar 7, 2015 at 5:20 PM, Ali Akhtar  wrote:
>>>
>>> What version are you running?
>
>
> Upgrade your very old version to at least 1.2.x (via 1.1.x) ASAP.
>
> =Rob
>