Re: Ordering by multiple columns?

2016-10-10 Thread Nicolas Douillet
If I understand the answers correctly, the solution to your ordering
question is to use clustering keys.
I agree, but I just wanted to warn you about one limitation: the values
of clustering key columns can't be updated, except by doing a delete and then an insert.
(In the case of your song example, putting the rating in the key can be
tricky if the value has to be updated frequently.)


On Mon, Oct 10, 2016 at 22:15, Mikhail Krupitskiy <mikhail.krupits...@jetbrains.com> wrote:

> Looks like ordering by multiple columns in Cassandra has a few aspects that
> are not obvious.
> I wasn’t able to find this information in the official documentation but
> it’s quite well described here:
>
> http://stackoverflow.com/questions/35708118/where-and-order-by-clauses-in-cassandra-cql
>
> Thanks,
> Mikhail
>
> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
>
> No, we didn't record the talk this time unfortunately :(
>
> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar  wrote:
>
> Really helpful slides. Is there a video to go with them?
>
> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan  wrote:
>
> Yes it is possible, read this:
> http://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>
> and the following slides
>
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>
> Is it possible to have multiple clustering keys in Cassandra, or some
> other way to order by multiple columns?
>
> For example, say I have a table of songs, and each song has a rating and a
> date.
>
> I want to sort songs by rating first, and then with newer songs on top.
>
> So if two songs have a 5 rating, and one's date is 1st Feb and the other's is 2nd
> Feb, then I want the 2nd Feb one to be sorted above the 1st Feb one.
>
> Like this:
>
> Select * from songs order by rating, createdAt
>
> Is this possible?
>
>
>
>
>
>


Re: Understanding cassandra data directory contents

2016-10-10 Thread Nicolas Douillet
Hi Jason,

I'm not familiar enough with Cassandra 3, but those might be snapshots.
Snapshots are usually hard links to the SSTables.

Try this:
nodetool clearsnapshot

Does it change anything?
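
If it isn't snapshots, a rough way to tell which directory belongs to the live
table is to compare the directory suffix with the table id (the keyspace name
below is just a placeholder):

-- in cqlsh
SELECT keyspace_name, table_name, id
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace' AND table_name = 'periodicReading';

-- If I remember correctly, in 3.x the data directory is named
-- <table_name>-<id with the dashes stripped>, so an id of
-- 76eb7510-0968-11e6-8a74-21c8b9466352 corresponds to the
-- periodicReading-76eb7510096811e68a7421c8b9466352 directory; other
-- periodicReading-* directories would belong to dropped/recreated tables.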

--
Nicolas

On Sat, Oct 8, 2016 at 21:26, Jason Kania  wrote:

> Hi Vladimir,
>
> Thanks for the response. I assume then that it is safe to remove the
> directories that are not current as per the system_schema.tables table. I
> have dozens of the same table and haven't dropped and added nearly that
> many times. Do any of the nodetool or other commands clean up these unused
> directories?
>
> Thanks,
>
> Jason Kania
>
> --
> *From:* Vladimir Yudovin 
> *To:* user@cassandra.apache.org; Jason Kania 
> *Sent:* Saturday, October 8, 2016 2:05 PM
> *Subject:* Re: Understanding cassandra data directory contents
>
> Each table has a unique id (the directory suffix). If you drop and then recreate a
> table with the same name, it gets a new id.
>
> Try
> *SELECT keyspace_name, table_name, id FROM system_schema.tables;*
> to determine the actual ID.
>
> You can limit request to specific keyspace or table.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone  - Hosted Cloud Cassandra on
> Azure and SoftLayer. Launch your cluster in minutes.*
>
>
>  On Sat, 08 Oct 2016 13:42:19 -0400, *Jason Kania* wrote: 
>
> Hello,
>
> I am using Cassandra 3.0.9 and I have encountered an issue where the nodes
> in my 3 node cluster have vastly different amounts of data even though they
> should be roughly the same. When I looked through the data directory for my
> database on two of the nodes, I see a number of directories with the same
> prefix, eg:
>
> periodicReading-76eb7510096811e68a7421c8b9466352,
> periodicReading-453d55a0501d11e68623a9d2b6f96e86
> ...
>
> Only one directory with a specific table name prefix has a current date
> and the rest are older.
>
> In contrast, on the node with the least space used, each directory has a
> unique prefix (not shared).
>
> I am wondering what the contents of a Cassandra database directory should
> look like. Are there supposed to be multiple entries for a given table or
> just one?
>
> If just one, what would be a procedure to determine whether the other
> directories for the same table are junk that can be removed?
>
> Thanks,
>
> Jason
>
>
>
>
>
>


Re: How does Local quorum consistency work ?? response from fastest node?

2016-09-22 Thread Nicolas Douillet
Oki, have fun :)

On Thu, Sep 22, 2016 at 18:39, Joseph Tech  wrote:

> Thanks! I was just trying with the default cqlsh; now I realize it was only at
> CL ONE.
>
> On 22 Sep 2016 16:16, "Nicolas Douillet" 
> wrote:
>
>> Hi Joseph,
>>
>> The coordinator itself could be one of the replicas that hold the data,
>> but it looks like that's not your case.
>> So you're right, the coordinator should have sent a digest request to
>> another node.
>> Did you execute this request with cqlsh? Are you sure you correctly
>> set the consistency before executing the request?
>>
>> --
>> Nicolas
>>
>>
>>
>> On Thu, Sep 22, 2016 at 11:52, Joseph Tech  wrote:
>>
>>> Hi,
>>>
>>> Below is a sample trace for a LOCAL_QUORUM query. I've changed the
>>> query table/col names and the actual node IP addresses to IP.1 and IP.coord
>>> (for the coordinator node). RF=3 and we have 2 DCs. Don't we expect to see
>>> an "IP.2", since LOCAL_QUORUM requires the coordinator to receive at least
>>> 2 responses? What am I missing here?
>>>
>>> activity | timestamp | source | source_elapsed
>>> ---------+-----------+--------+---------------
>>> Execute CQL3 query | 2016-09-15 04:17:55.401000 | IP.coord | 0
>>> Parsing SELECT A,B,C from T WHERE key1='K1' and key2='K2' and key3='K3' and key4='K4'; [SharedPool-Worker-2] | 2016-09-15 04:17:55.402000 | IP.coord | 57
>>> Preparing statement [SharedPool-Worker-2] | 2016-09-15 04:17:55.403000 | IP.coord | 140
>>> reading data from /IP.1 [SharedPool-Worker-2] | 2016-09-15 04:17:55.403000 | IP.coord | 1343
>>> Sending READ message to /IP.1 [MessagingService-Outgoing-/IP.1] | 2016-09-15 04:17:55.404000 | IP.coord | 1388
>>> REQUEST_RESPONSE message received from /IP.1 [MessagingService-Incoming-/IP.1] | 2016-09-15 04:17:55.404000 | IP.coord | 2953
>>> Processing response from /IP.1 [SharedPool-Worker-3] | 2016-09-15 04:17:55.404000 | IP.coord | 3001
>>> READ message received from /IP.coord [MessagingService-Incoming-/IP.coord] | 2016-09-15 04:17:55.405000 | IP.1 | 117
>>> Executing single-partition query on user_carts [SharedPool-Worker-1] | 2016-09-15 04:17:55.405000 | IP.1 | 253
>>> Acquiring sstable references [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 262
>>> Merging memtable tombstones [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 295
>>> Bloom filter allows skipping sstable 729 [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 341
>>> Partition index with 0 entries found for sstable 713 [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 411
>>> Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 414
>>> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 854

Re: How does Local quorum consistency work ?? response from fastest node?

2016-09-22 Thread Nicolas Douillet
Hi Joseph,

The coordinator itself could be one of the replicas that hold the data, but
it looks like that's not your case.
So you're right, the coordinator should have sent a digest request to
another node.
Did you execute this request with cqlsh? Are you sure you correctly set
the consistency before executing the request?
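
In case it helps, a minimal cqlsh sequence to rule that out (reusing the
placeholder names from your trace):

CONSISTENCY LOCAL_QUORUM;   -- cqlsh defaults to ONE
CONSISTENCY;                -- prints the level currently in use
TRACING ON;
SELECT A, B, C FROM T
WHERE key1='K1' AND key2='K2' AND key3='K3' AND key4='K4';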

--
Nicolas



On Thu, Sep 22, 2016 at 11:52, Joseph Tech  wrote:

> Hi,
>
> Below is a sample trace for a LOCAL_QUORUM query. I've changed the query
> table/col names and the actual node IP addresses to IP.1 and IP.coord (for the
> coordinator node). RF=3 and we have 2 DCs. Don't we expect to see an
> "IP.2", since LOCAL_QUORUM requires the coordinator to receive at least 2
> responses? What am I missing here?
>
> activity | timestamp | source | source_elapsed
> ---------+-----------+--------+---------------
> Execute CQL3 query | 2016-09-15 04:17:55.401000 | IP.coord | 0
> Parsing SELECT A,B,C from T WHERE key1='K1' and key2='K2' and key3='K3' and key4='K4'; [SharedPool-Worker-2] | 2016-09-15 04:17:55.402000 | IP.coord | 57
> Preparing statement [SharedPool-Worker-2] | 2016-09-15 04:17:55.403000 | IP.coord | 140
> reading data from /IP.1 [SharedPool-Worker-2] | 2016-09-15 04:17:55.403000 | IP.coord | 1343
> Sending READ message to /IP.1 [MessagingService-Outgoing-/IP.1] | 2016-09-15 04:17:55.404000 | IP.coord | 1388
> REQUEST_RESPONSE message received from /IP.1 [MessagingService-Incoming-/IP.1] | 2016-09-15 04:17:55.404000 | IP.coord | 2953
> Processing response from /IP.1 [SharedPool-Worker-3] | 2016-09-15 04:17:55.404000 | IP.coord | 3001
> READ message received from /IP.coord [MessagingService-Incoming-/IP.coord] | 2016-09-15 04:17:55.405000 | IP.1 | 117
> Executing single-partition query on user_carts [SharedPool-Worker-1] | 2016-09-15 04:17:55.405000 | IP.1 | 253
> Acquiring sstable references [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 262
> Merging memtable tombstones [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 295
> Bloom filter allows skipping sstable 729 [SharedPool-Worker-1] | 2016-09-15 04:17:55.406000 | IP.1 | 341
> Partition index with 0 entries found for sstable 713 [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 411
> Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 414
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2016-09-15 04:17:55.407000 | IP.1 | 854
> Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2016-09-15 04:17:55.408000 | IP.1 | 860
> Read 1 live and 1 tombstone cells [SharedPool-Worker-1] | 2016-09-15 04:17:55.408000 | IP.1 | 910
> Enqueuing response to /IP.coord [SharedPool-Worker-1] | 2016-09-15 04:17:55.408000 | IP.1 | 1051
> Sending REQUEST_RESPONSE message to /IP.coord [MessagingService-Outgoing-/IP.coord] | 2016-09-15 04:17:55.409000 | IP.1 | 1110
> Request complete | 2016-09-15 04:17:55.404067 | IP.coord | 3067
>
> Thanks,
> Joseph
>
>
>
>
> On Tue, Sep 20, 2016 at 3:07 AM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>> Hi Pranay,
>>
>> I'll try to answer as precisely as I can.

Re: How does Local quorum consistency work ?? response from fastest node?

2016-09-19 Thread Nicolas Douillet
Hi Pranay,

I'll try to answer as precisely as I can.

Note that what I'm going to explain is valid only for reads; write requests
work differently.
I'm assuming you have only one DC.

   1. The coordinator gets a list of the live replicas, sorted by proximity.
   (I'm not sure enough how the sorting works to explain it here; by the snitch, I guess.)

   2. By default *the coordinator keeps the exact list of nodes necessary*
   to ensure the desired consistency (2 nodes for RF=3),
   but, according to the read repair chance configured on each column family
   (10% of the requests by default), *it might keep all the replicas* (if
   there is only one DC). (See the sketch of this setting below.)

   3. The coordinator checks that enough nodes are alive before trying any
   request. If not, there is no need to go further.
   You'll get a slightly different error message:

*Live nodes  do not satisfy ConsistencyLevel (2 required) *
   4. And in substance, the coordinator waits for the exact number of
   responses needed to achieve the consistency.
   To be more specific, the coordinator does not request the same thing from
   each involved replica (from one or two of them, the closest, a full data
   read, and from the others only a digest), and waits for the exact number
   of responses needed to achieve the consistency, with at least one full
   data response present.
   (There is of course more to explain, if the digests do not match for
   example ...)

   So you're right when you talk about the fastest responses, but only
   under certain conditions and if additional replicas are requested.


I'm certainly missing some points.
Is that clear enough?
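
As a side note on point 2, the read repair chance is a per-table setting; here
is a quick sketch of how to inspect or change it (the keyspace and table names
are placeholders, and system_schema assumes Cassandra 3.x):

SELECT table_name, read_repair_chance, dclocal_read_repair_chance
FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace';

ALTER TABLE mykeyspace.mytable
WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1;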

--
Nicolas



On Mon, Sep 19, 2016 at 22:16, Pranay akula  wrote:

>
>
> I always have this doubt: when a Cassandra node gets a read request at
> LOCAL_QUORUM consistency, does the coordinator node ask all nodes with
> replicas in that DC for a response, or just the fastest-responding nodes,
> whose count satisfies the local quorum?
>
> In this case RF is 3 and I got: Cassandra timeout during read query at
> consistency LOCAL_QUORUM (2 responses were required but only 1 replica
> responded). Does this mean the coordinator asked only the two
> fastest-responding replicas for data and 1 out of 2 timed out, or did the
> coordinator ask all nodes with replicas, meaning all three (3), and 2 out
> of 3 timed out, since I only got a single response back?
>
>
>
> Thanks
>
> Pranay
>


Re: Inconsistent results with Quorum at different times

2016-09-16 Thread Nicolas Douillet
Hi Jaydeep,

Yes, dealing with tombstones in Cassandra is very tricky.

Cassandra keeps tombstones to mark deleted columns and to distribute them
(via hinted handoff, full repair, read repair ...) to the other nodes that
missed the initial delete request. But Cassandra can't afford to keep those
tombstones forever and has to wipe them. The tradeoff is that after a time,
GCGraceSeconds, configured on each column family, tombstones are fully
dropped during compactions and are no longer distributed to the other
nodes.
If one node didn't get the chance to receive this tombstone during that
period and kept an old column value, then the deleted column will
reappear.

So I guess that in your case the time T2 is older than this GCGraceSeconds?

The best way to prevent all those phantom columns from coming back from the
dead is to run a full repair on your cluster at least once every GCGraceSeconds.
Did you try this?
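
A minimal sketch of that routine (names are placeholders; system_schema
assumes Cassandra 3.x, and 864000 seconds is the default 10-day grace period):

-- check the grace period configured on the table
SELECT gc_grace_seconds FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';

-- widen it if your repair cadence can't keep up with the default
ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 1728000;

-- and, on every node, within each gc_grace_seconds window:
--   nodetool repair -pr mykeyspace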

--
Nicolas


On Sat, Sep 17, 2016 at 00:05, Jaydeep Chovatia  wrote:

> Hi,
>
> We have three node (N1, N2, N3) cluster (RF=3) and data in SSTable as
> following:
>
> N1:
> SSTable: Partition key K1 is marked as tombstone at time T2
>
> N2:
> SSTable: Partition key K1 is marked as tombstone at time T2
>
> N3:
> SSTable: Partition key K1 is valid and has data D1 with lower time-stamp
> T1 (T1 < T2)
>
>
> Now when I read using quorum then sometimes it returns data D1 and
> sometimes it returns empty results. After tracing I found that when N1 and
> N2 are chosen then we get empty data, when (N1/N2) and N3 are chosen then
> D1 data is returned.
>
> My point is that when we read at QUORUM our results have to be
> consistent; here the same query gives different results at different times.
>
> Isn't this a big problem with Cassandra @QUORUM (with tombstone)?
>
>
> Thanks,
> Jaydeep
>


Re: large system hint partition

2016-09-16 Thread Nicolas Douillet
Hi Ezra,

Do you have a dead node in your cluster?
The coordinator stores a hint for a dead replica in its local
system.hints table when a node is down or didn't respond to a write request.
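
A quick way to check (this assumes 2.2.x, where hints still live in a regular
table and the partition key of system.hints is the host id of the target
replica):

-- in cqlsh on the node that logged the warning
SELECT DISTINCT target_id FROM system.hints;

-- Compare the ids with the Host ID column of `nodetool status`: the partition
-- from the warning (7ce838aa-f30f-494a-8caa-d44d1440e48b) should match a node
-- that is down or dropping writes.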

--
Nicolas



On Sat, Sep 17, 2016 at 00:12, Ezra Stuetzel  wrote:

> What would be the likely causes of large system hint partitions? Normally
> large partition warnings are for user-defined tables that clients are writing
> large partitions to. In this case, it appears C* is writing large
> partitions to the system.hints table. Gossip is not backed up.
>
> version: C* 2.2.7
>
> WARN  [MemtableFlushWriter:134] 2016-09-16 04:27:39,220
> BigTableWriter.java:184 - Writing large partition
> system/hints:7ce838aa-f30f-494a-8caa-d44d1440e48b (128181097 bytes)
>
>
> Thanks,
>
> Ezra
>


Re: race condition for quorum consistency

2016-09-14 Thread Nicolas Douillet
Hi,

In my opinion the guarantee provided by Cassandra is:
  if your write request at QUORUM *succeeds*, then the subsequent (after the
write response) read requests at QUORUM (that succeed too) will be
consistent
  (actually, whenever CL.Write + CL.Read > RF)
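
To put numbers on that inequality for the RF=3 case discussed below (just the
standard quorum arithmetic):

  QUORUM = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2
  CL.Write + CL.Read = 2 + 2 = 4 > RF = 3

so any read quorum and any *successful* write quorum overlap on at least one
replica, and that replica holds the newest value.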

Of course, while you haven't received a valid response to your write request
at QUORUM the cluster is in an inconsistent state, and you have *to retry
your write request.*

That said, Cassandra provides some other important behaviors that tend
to reduce the duration of this inconsistent state:

   - the coordinator will not send the request only to the nodes that
   should answer to satisfy the CL, but to all nodes that should have the data
   (of course with RF=3, only A, B & C are involved)

   - during read requests, Cassandra will ask one node for the data and the
   others involved in the CL for a digest; if the digests do not all match, it
   will ask them for the entire data, handle the merge and finally trigger a
   background repair on those nodes. Your write may have succeeded during this
   time.

   - according to a chance ratio (the read repair chance), Cassandra will
   *sometimes* send the read to all nodes holding the data, not only the ones
   involved in the CL, and execute background repairs

   - you have to schedule repairs regularly


I'd add that if some nodes do not succeed in handling write requests in time,
they may be under pressure, and there is only a small chance that they will
succeed on a read request :)

And finally, what is time? From where/when? You may issue one read after
another but receive the results in the opposite order. Writing at QUORUM is not
writing within a transaction; you'll certainly have to make some tradeoffs.

Regards,

--
Nicolas




On Wed, Sep 14, 2016 at 21:14, Alexander Dejanovski  wrote:

> My understanding of the described scenario is that the write hasn't
> succeeded when reads are fired, as B and C haven't processed the mutation
> yet.
>
> There would be 3 clients here and not 2 : C1 writes, C2 and C3 read.
>
> So the race condition could still happen in this particular case.
>
> On Wed, Sep 14, 2016 at 21:07, Work  wrote:
>
>> Hi Alex:
>>
>> Hmmm ... Assuming clock skew is eliminated, and assuming nodes are up
>> and available ... and assuming quorum writes and quorum reads with everyone
>> waiting for success (which is NOT the OP's scenario), two different clients
>> will be guaranteed to see all successful writes, or be told that the read
>> failed.
>>
>> C1 writes at quorum to A,B
>> C2 reads at quorum.
>> So it tries to read from ALL nodes, A,B, C.
>> If A,B respond --> success
>> If A,C respond --> conflict
>> If B, C respond --> conflict
>> Because a quorum (2 nodes) responded, the coordinator will return the
>> latest time stamp and may issue read repair depending on YAML settings.
>>
>> So where do you see only one client having this guarantee?
>>
>> Regards,
>>
>> James
>>
>> On Sep 14, 2016, at 4:00 AM, Alexander DEJANOVSKI 
>> wrote:
>>
>> Hi,
>>
>> the analysis is valid, and strong consistency the Cassandra way means
>> that one client writing at quorum, then reading at quorum will always see
>> his previous write.
>> Two different clients have no guarantee to see the same data when using
>> quorum, as illustrated in your example.
>>
>> The only options here are to route requests to specific clients based on some
>> id, to guarantee the sequence of operations outside of Cassandra (the same
>> client will always be responsible for a set of ids), or to raise the CL to ALL
>> at the expense of availability (you should not do that).
>>
>>
>> Cheers,
>>
>> Alex
>>
>> On Wed, Sep 14, 2016 at 11:47, Qi Li  wrote:
>>
>>> hi all,
>>>
>>> we are using quorum consistency, and we *suspect* there may be a race
>>> condition during the write. Let's say RF is 3, so a write will wait for at
>>> least 2 nodes to ack. Suppose only 1 node has acked (node A) and the other
>>> 2 nodes (B, C) are still waiting to apply the update. Two read requests come in:
>>> one read gets its data from nodes B and C, so version 1 is returned;
>>> the other read gets its data from nodes A and B, so the latest
>>> version 2 is returned.
>>>
>>> So clients are getting different data at the same time. Is this a valid
>>> analysis? If so, are there any options we can set to deal with this issue?
>>>
>>> thanks
>>> Ken
>>>
>> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>