Re: who does generate timestamp during the write?

2015-09-08 Thread ibrahim El-sanosi
Yes, thank you a lot

On Tue, Sep 8, 2015 at 5:25 PM, Tyler Hobbs  wrote:

>
> On Sat, Sep 5, 2015 at 8:32 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> So in this scenario, the latest data written to the replicas is [K1,
>> V2], which should be the correct one, but the read returns [K1, V1]
>> because of the divergent clock.
>>
>> Can such scenario occur?
>>
>
> Yes, it most certainly can.  There are a couple of pieces of advice for
> this.  First, run NTP on all of your servers.  Second, if clock drift of a
> second or so would cause problems for your data model (like your example),
> change your data model.  Usually this means creating separate rows for each
> version of the value (by adding a timeuuid to the primary key, for example),
> but in some cases lightweight transactions may also be suitable.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Is Cassandra really Strong consistency?

2015-09-07 Thread ibrahim El-sanosi
Yes, you are right, I should not have linked my scenario to strong
consistency. Thank you


Ibrahim


On Mon, Sep 7, 2015 at 2:01 PM, Ryan Svihla  wrote:

> The condition you bring up is a misconfigured cluster, period, and no
> matter how you look at it, that's the case. In other words, the scenario
> you're bringing up does not get to the heart of whether Cassandra has
> "Strong Consistency" or not; your example, I'm sorry to say, fails in
> this regard.
>
> However, let's get at what I believe you're actually attempting to talk
> about, i.e. race-condition protection when you desire a set order; this,
> by definition, is the type of guarantee provided by linearizability. So
> without SERIAL or LOCAL_SERIAL consistency, when using a data model that
> depends on _order_ (which your example does), you're going to be unhappy;
> the ALL and ONE consistency levels do nothing to address your example,
> with or without clock skew.
>
> In theory, the "last timestamp" of a given table could probably be
> satisfied well enough for most problem domains by just keeping the
> servers pointed at the same NTP server; in practice, this is a very rare
> valid use case, as clusters doing several hundred thousand transactions
> per second (not uncommon) would find that "last timestamp" is hopelessly
> wrong every time and at best an approximation, no matter the database
> technology.
>
>
>
> On Mon, Sep 7, 2015 at 6:20 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> ""It you need strong consistency and don't mind lower transaction rate,
>> you're better off with base""
>> I wish you can explain more how this statment relate to the my post?
>> Regards,
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>
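
A sketch of the linearizable pattern Ryan points to: LWT conditional writes
plus SERIAL reads enforce an order without relying on wall clocks (the kv
table, keyspace, and contact point are assumptions):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo')

    # Conditional write: applied only if the current value still matches,
    # regardless of the timestamps that competing writes carry.
    row = session.execute(
        "UPDATE kv SET v = 'V2' WHERE k = 'K1' IF v = 'V1'").one()
    applied = row[0]  # first column of an LWT result is the [applied] flag

    # A SERIAL read observes the outcome of any in-flight Paxos round.
    stmt = SimpleStatement("SELECT v FROM kv WHERE k = 'K1'",
                           consistency_level=ConsistencyLevel.SERIAL)
    print(applied, session.execute(stmt).one())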


Re: Is Cassandra really Strong consistency?

2015-09-07 Thread ibrahim El-sanosi
""It you need strong consistency and don't mind lower transaction rate,
you're better off with base""
I wish you can explain more how this statment relate to the my post?
Regards,


Re: Is Cassandra really Strong consistency?

2015-09-07 Thread ibrahim El-sanosi
Ok,



With LWT, I completely understand that it will achieve total
order/linearizability; therefore, the above scenario cannot occur.


However, when you said "the scenario will occur if your clocks are not
sync’d", this is an ambiguous statement, because both the client and server
sides are likely to have different wall-clocks for many reasons:

1. Clients and the Cassandra servers may be located in different regions,
resulting in different timestamps.

2. Among Cassandra servers (or replicas) it is also possible to have
different timestamps, so the above scenario can occur. Even if we use NTP
to synchronize the clocks, the scenario can still happen, with differences
of at least a few milliseconds.


That means that the timestamps for writes are derived either from a single
Cassandra server clock, or a single app server clock. These clocks can flow
backwards.


What do you think?



Ibrahim

On Mon, Sep 7, 2015 at 1:26 AM, Jeff Jirsa 
wrote:

>
> Yes, your scenario can occur, and will occur if your clocks are not sync’d.
>
> Either you sync your clocks to appropriate tolerances, or you don’t write
> without checking the existing value (with LWT). There is no other
> resolution in Cassandra – there are no vector clocks to allow you to manage
> the conflict on your own at this point.
>
>
> From: ibrahim El-sanosi
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, September 6, 2015 at 11:28 AM
>
> To: "user@cassandra.apache.org"
> Subject: Re: Is Cassandra really Strong consistency?
>
>
>
>
>
> Yes, LWT is another case, different compared to what my scenario is
> about. I am not talking about LWT and CAS; it is true that LWT uses a
> logical clock by utilising Paxos. But my scenario is talking about using
> timestamps and Last-Write-Wins.
>
>
>
>
>
>> If anyone can read the above scenario and confirm whether it can occur
>> or not: if it is possible, how can current Cassandra solve it?
>
>
>
> Regards,
>
>
>
> Ibrahim
>
>
>
>
>
>
> On Sun, Sep 6, 2015 at 5:57 PM, Jeff Jirsa 
> wrote:
>
>> In the cases where NTP and client timestamps with microsecond resolution
>> are insufficient, LWT “IF EXISTS, IF NOT EXISTS” is generally used.
>>
>>
>> From: ibrahim El-sanosi
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, September 6, 2015 at 7:40 AM
>> To: "user@cassandra.apache.org"
>>
>> Subject: Re: Is Cassandra really Strong consistency?
>>
>>
>>
>> I have done some research about “timestamps could jump back and forth
>> arbitrarily if you talk to different nodes”.
>>
>> To summarise, it is possible in Cassandra for the following scenario to
>> happen in sequence:
>>
>>
>>
>>1. Process A writes w1 with timestamp t=2
>>2. Process B reads w1
>>3. Process B writes w2 with timestamp t=1
>>4. Process B reads w1, but expected w2
>>
>> If the system clock goes backwards for any reason, Cassandra’s session
>> consistency guarantees no longer hold, even if the consistency level is
>> write/read CL = QUORUM, or write CL = ALL and read CL = ONE.
>>
>>
>>
>> Moreover, even if we use NTP, the problem above can occur. That means that
>> the timestamps for writes are derived either from a single Cassandra server
>> clock, or a single app server clock. These clocks can flow backwards, for a
>> number of “reasons”:
>>
>>- *Hardware wonkiness can push clocks days or centuries into the
>>future or past.*
>>- *Virtualization can wreak havoc on kernel timekeeping.*
>>- *Misconfigured nodes may not have NTP enabled, or may not be able
>>to reach upstream sources.*
>>- *Upstream NTP servers can lie.*
>>    - *When the problem is identified and fixed, NTP corrects large time
>>    differentials by jumping the clock discontinuously to the correct time.*
>>- *Even when perfectly synchronized, POSIX time itself is not
>>    monotonic*.
>>
>>
>>
>> If you want to read more, this link can give you a lot of hints.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Ibrahim
>>
>> On Sun, Sep 6, 2015 at 2:01 PM, Edouard COLE 
>> wrote:
>>
>>> @ibrahim: When saying "clocks should be synchronized", it includes
>>> Cassandra nodes AND clients
>>>
>>> NTP is the way to go
>>>
>>> On 6 Sept 2015 at 14:56, Laing, Michael
>>> wrote:
>
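
One mitigation for clocks that flow backwards, sketched with the DataStax
Python driver: let the client generate the write timestamps and keep them
monotonic per process, so a single writer at least never reorders its own
writes (MonotonicTimestampGenerator assumes a reasonably recent driver; the
contact point and keyspace are assumptions):

    from cassandra.cluster import Cluster
    from cassandra.timestamps import MonotonicTimestampGenerator

    # Every write sent through this Cluster carries a client-generated,
    # strictly increasing timestamp, even if the OS clock steps backwards.
    cluster = Cluster(['127.0.0.1'],
                      timestamp_generator=MonotonicTimestampGenerator())
    session = cluster.connect('demo')

    session.execute("INSERT INTO kv (k, v) VALUES ('K1', 'V2')")

This does not remove skew between different clients, which is why the
data-model and LWT advice elsewhere in the thread still applies.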

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread ibrahim El-sanosi
Yes, LWT is another case, different compared to what my scenario is
about. I am not talking about LWT and CAS; it is true that LWT uses a
logical clock by utilising Paxos. But my scenario is talking about using
timestamps and Last-Write-Wins.





If anyone can read the above scenario and confirm whether it can occur or
not: if it is possible, how can current Cassandra solve it?



Regards,



Ibrahim






On Sun, Sep 6, 2015 at 5:57 PM, Jeff Jirsa 
wrote:

> In the cases where NTP and client timestamps with microsecond resolution
> are insufficient, LWT “IF EXISTS, IF NOT EXISTS” is generally used.
>
>
> From: ibrahim El-sanosi
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, September 6, 2015 at 7:40 AM
> To: "user@cassandra.apache.org"
>
> Subject: Re: Is Cassandra really Strong consistency?
>
>
>
> I have done some research about “timestamps could jump back and forth
> arbitrarily if you talk to different nodes”.
>
> To summarise, it is possible in Cassandra for the following scenario to
> happen in sequence:
>
>
>
>1. Process A writes w1 with timestamp t=2
>2. Process B reads w1
>3. Process B writes w2 with timestamp t=1
>4. Process B reads w1, but expected w2
>
> If the system clock goes backwards for any reason, Cassandra’s session
> consistency guarantees no longer hold, even if the consistency level is
> write/read CL = QUORUM, or write CL = ALL and read CL = ONE.
>
>
>
> Moreover, even if we use NTP, the problem above can occur. That means that
> the timestamps for writes are derived either from a single Cassandra server
> clock, or a single app server clock. These clocks can flow backwards, for a
> number of “reasons”:
>
>- *Hardware wonkiness can push clocks days or centuries into the
>future or past.*
>- *Virtualization can wreak havoc on kernel timekeeping.*
>- *Misconfigured nodes may not have NTP enabled, or may not be able to
>reach upstream sources.*
>- *Upstream NTP servers can lie.*
>    - *When the problem is identified and fixed, NTP corrects large time
>    differentials by jumping the clock discontinuously to the correct time.*
>- *Even when perfectly synchronized, POSIX time itself is not
>monotonic*.
>
>
>
> If you want to read more, this link can give you a lot of hints.
>
>
>
> Regards,
>
>
>
> Ibrahim
>
> On Sun, Sep 6, 2015 at 2:01 PM, Edouard COLE 
> wrote:
>
>> @ibrahim: When saying "clocks should be synchronized", it includes
>> Cassandra nodes AND clients
>>
>> NTP is the way to go
>>
>> On 6 Sept 2015 at 14:56, Laing, Michael
>> wrote:
>>
>> https://en.wikipedia.org/wiki/Network_Time_Protocol
>>
>> On Sun, Sep 6, 2015 at 8:23 AM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> Assume the Cassandra cluster is located somewhere in the US. Clients that
>>> connect from different parts of the world will have different timestamps
>>> (if we rely on client timestamps to store writes), or, if a coordinator is
>>> responsible for generating the timestamp during the write, it may also
>>> have a different time from the replicas, resulting in write conflicts
>>> that can occur and be impossible to resolve.
>>>
>>>
>>>
>>> When you say “clocks should be synchronized”, does Cassandra synchronize
>>> the clocks? If so, how? Can you refer me to any related article?
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Ibrahim
>>>
>>> On Sun, Sep 6, 2015 at 1:23 PM, Daniel Schulz <
>>> danielschulz2...@hotmail.com> wrote:
>>>
>>>> Cassandra is not changing clock settings; it does use the clock, e.g. to
>>>> omit TTL'ed rows in compaction phases. So make sure your nodes agree on
>>>> the very same time using e.g. NTP. It is crucial for data integrity in
>>>> most distributed systems.
>>>>
>>>> --
>>>> Date: Sun, 6 Sep 2015 13:10:14 +0100
>>>> Subject: Re: Is Cassandra really Strong consistency?
>>>> From: ibrahimsaba...@gmail.com
>>>> To: user@cassandra.apache.org
>>>>
>>>>
>>>> Do you mean that Cassandra synchronizes the clocks across the whole
>>>> cluster? If yes, how does it do so? Or could you refer me to any related
>>>> article?
>>>>
>>>> Thank you
>>>>
>>>

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread ibrahim El-sanosi
I have done some research about “timestamps could jump back and forth
arbitrarily if you talk to different nodes”.

To summarise, it is possible in Cassandra for the following scenario to
happen in sequence:



   1. Process A writes w1 with timestamp t=2
   2. Process B reads w1
   3. Process B writes w2 with timestamp t=1
   4. Process B reads w1, but expected w2

If the system clock goes backwards for any reason, Cassandra’s session
consistency guarantees no longer hold, even if the consistency level is
write/read CL = QUORUM, or write CL = ALL and read CL = ONE.



Moreover, even if we use NTP, the problem above can occur. That means that the
timestamps for writes are derived either from a single Cassandra server
clock, or a single app server clock. These clocks can flow backwards, for a
number of “reasons”:

   - *Hardware wonkiness can push clocks days or centuries into the future
   or past.*
   - *Virtualization can wreak havoc on kernel timekeeping.*
   - *Misconfigured nodes may not have NTP enabled, or may not be able to
   reach upstream sources.*
   - *Upstream NTP servers can lie.*
   - *When the problem is identified and fixed, NTP corrects large time
   differentials by jumping the clock discontinuously to the correct time.*
   - *Even when perfectly synchronized, POSIX time itself is not monotonic*.



If you want to read more, this link can give you a lot of hints.



Regards,



Ibrahim

On Sun, Sep 6, 2015 at 2:01 PM, Edouard COLE 
wrote:

> @ibrahim: When saying "clocks should be synchronized", it includes
> Cassandra nodes AND clients
>
> NTP is the way to go
>
> On 6 Sept 2015 at 14:56, Laing, Michael
> wrote:
>
> https://en.wikipedia.org/wiki/Network_Time_Protocol
>
> On Sun, Sep 6, 2015 at 8:23 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Assume the Cassandra cluster is located somewhere in the US. Clients that
>> connect from different parts of the world will have different timestamps
>> (if we rely on client timestamps to store writes), or, if a coordinator is
>> responsible for generating the timestamp during the write, it may also
>> have a different time from the replicas, resulting in write conflicts
>> that can occur and be impossible to resolve.
>>
>>
>>
>> When you say “clocks should be synchronized”, does Cassandra synchronize
>> the clocks? If so, how? Can you refer me to any related article?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Ibrahim
>>
>> On Sun, Sep 6, 2015 at 1:23 PM, Daniel Schulz <
>> danielschulz2...@hotmail.com> wrote:
>>
>>> Cassandra is not changing clock settings; it does use the clock, e.g. to
>>> omit TTL'ed rows in compaction phases. So make sure your nodes agree on
>>> the very same time using e.g. NTP. It is crucial for data integrity in
>>> most distributed systems.
>>>
>>> --
>>> Date: Sun, 6 Sep 2015 13:10:14 +0100
>>> Subject: Re: Is Cassandra really Strong consistency?
>>> From: ibrahimsaba...@gmail.com
>>> To: user@cassandra.apache.org
>>>
>>>
>>> Do you mean that Cassandra synchronizes the clocks across the whole
>>> cluster? If yes, how does it do so? Or could you refer me to any related
>>> article?
>>>
>>> Thank you
>>>
>>>
>>> Ibrahim
>>>
>>> On Sun, Sep 6, 2015 at 1:00 PM, Laing, Michael <
>>> michael.la...@nytimes.com> wrote:
>>>
>>> I think I saw this before.
>>>
>>> Clocks must be synchronized.
>>>
>>> On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
>>> Hi folks,
>>>
>>> Assume we have a 4-node cluster N1, N2, N3, and N4, and the replication
>>> factor is 3.  When write CL = ALL and read CL = ONE:
>>>
>>> Client c1 sends W1 = [K1, V1] to N1 (the coordinator).  The coordinator
>>> (N1) generates timestamp Mon 05-09-2015 11:30:40,200 (according to its
>>> local clock), assigns it to W1, and sends W1 to N2, N3, and N4. After a
>>> few seconds, client c2 sends W2 = [K1, V2] to N4 (the coordinator). The
>>> coordinator (N4) generates timestamp Mon 05-09-2015 11:30:38,200
>>> (according to its local clock, but assume N4's clock is a bit behind,
>>> by nearly 2 seconds), assigns it to W2, and sends W2 to N2, N3, and N4
>>> (itself).
>>>
>>> As we have write CL = ALL and read CL = ONE: now client c2 wants to read
>>> K1 and connects to coordinator N1; the coordinator sends the read for K1
>>> to N2, picking the latest timestamp, which is [K1, V1]: Mon 05-09-2015
>>> 11:30:40,200.
>>>
>>> So in this scenario, the latest data written to the replicas is [K1,
>>> V2], which should be the correct one, but the read returns [K1, V1]
>>> because of the divergent clock.
>>>
>>> Can such a scenario occur?
>>>
>>> Thank you
>>>
>>>
>>>
>>>
>>
>
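
The last-write-wins rule driving the quoted scenario can be reproduced
directly by pinning write timestamps with USING TIMESTAMP, standing in for
the two skewed coordinators (the kv table and contact point are assumptions):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # w1 arrives first but carries the higher timestamp (N1's fast clock).
    session.execute(
        "INSERT INTO kv (k, v) VALUES ('K1', 'V1') USING TIMESTAMP 2")
    # w2 arrives later but carries the lower timestamp (N4's slow clock).
    session.execute(
        "INSERT INTO kv (k, v) VALUES ('K1', 'V2') USING TIMESTAMP 1")

    # Prints V1: the cell with the higher timestamp wins, regardless of
    # arrival order.
    print(session.execute("SELECT v FROM kv WHERE k = 'K1'").one())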


Re: Is Cassandra really Strong consistency?

2015-09-06 Thread ibrahim El-sanosi
Assume the Cassandra cluster is located somewhere in the US. Clients that
connect from different parts of the world will have different timestamps
(if we rely on client timestamps to store writes), or, if a coordinator is
responsible for generating the timestamp during the write, it may also
have a different time from the replicas, resulting in write conflicts that
can occur and be impossible to resolve.



When you say “clocks should be synchronized”, does Cassandra synchronize
the clocks? If so, how? Can you refer me to any related article?



Regards,



Ibrahim

On Sun, Sep 6, 2015 at 1:23 PM, Daniel Schulz 
wrote:

> Cassandra is not changing clock settings; it does use the clock, e.g. to
> omit TTL'ed rows in compaction phases. So make sure your nodes agree on
> the very same time using e.g. NTP. It is crucial for data integrity in
> most distributed systems.
>
> --
> Date: Sun, 6 Sep 2015 13:10:14 +0100
> Subject: Re: Is Cassandra really Strong consistency?
> From: ibrahimsaba...@gmail.com
> To: user@cassandra.apache.org
>
>
> Do you mean that Cassandra synchronizes the clocks across the whole
> cluster? If yes, how does it do so? Or could you refer me to any related
> article?
>
> Thank you
>
>
> Ibrahim
>
> On Sun, Sep 6, 2015 at 1:00 PM, Laing, Michael 
> wrote:
>
> I think I saw this before.
>
> Clocks must be synchronized.
>
> On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
> Hi folks,
>
> Assume we have a 4-node cluster N1, N2, N3, and N4, and the replication
> factor is 3.  When write CL = ALL and read CL = ONE:
>
> Client c1 sends W1 = [K1, V1] to N1 (the coordinator).  The coordinator
> (N1) generates timestamp Mon 05-09-2015 11:30:40,200 (according to its
> local clock), assigns it to W1, and sends W1 to N2, N3, and N4. After a
> few seconds, client c2 sends W2 = [K1, V2] to N4 (the coordinator). The
> coordinator (N4) generates timestamp Mon 05-09-2015 11:30:38,200
> (according to its local clock, but assume N4's clock is a bit behind,
> by nearly 2 seconds), assigns it to W2, and sends W2 to N2, N3, and N4
> (itself).
>
> As we have write CL = ALL and read CL = ONE: now client c2 wants to read
> K1 and connects to coordinator N1; the coordinator sends the read for K1
> to N2, picking the latest timestamp, which is [K1, V1]: Mon 05-09-2015
> 11:30:40,200.
>
> So in this scenario, the latest data written to the replicas is [K1,
> V2], which should be the correct one, but the read returns [K1, V1]
> because of the divergent clock.
>
> Can such a scenario occur?
>
> Thank you
>
>
>
>


Re: Is Cassandra really Strong consistency?

2015-09-06 Thread ibrahim El-sanosi
Do you mean that Cassandra synchronizes the clocks across the whole
cluster? If yes, how does it do so? Or could you refer me to any related
article?

Thank you


Ibrahim

On Sun, Sep 6, 2015 at 1:00 PM, Laing, Michael 
wrote:

> I think I saw this before.
>
> Clocks must be synchronized.
>
> On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Hi folks,
>>
>> Assume we have a 4-node cluster N1, N2, N3, and N4, and the replication
>> factor is 3.  When write CL = ALL and read CL = ONE:
>>
>> Client c1 sends W1 = [K1, V1] to N1 (the coordinator).  The coordinator
>> (N1) generates timestamp Mon 05-09-2015 11:30:40,200 (according to its
>> local clock), assigns it to W1, and sends W1 to N2, N3, and N4. After a
>> few seconds, client c2 sends W2 = [K1, V2] to N4 (the coordinator). The
>> coordinator (N4) generates timestamp Mon 05-09-2015 11:30:38,200
>> (according to its local clock, but assume N4's clock is a bit behind,
>> by nearly 2 seconds), assigns it to W2, and sends W2 to N2, N3, and N4
>> (itself).
>>
>> As we have write CL = ALL and read CL = ONE: now client c2 wants to read
>> K1 and connects to coordinator N1; the coordinator sends the read for K1
>> to N2, picking the latest timestamp, which is [K1, V1]: Mon 05-09-2015
>> 11:30:40,200.
>>
>> So in this scenario, the latest data written to the replicas is [K1,
>> V2], which should be the correct one, but the read returns [K1, V1]
>> because of the divergent clock.
>>
>> Can such a scenario occur?
>>
>> Thank you
>>
>
>


Is Cassandra really Strong consistency?

2015-09-06 Thread ibrahim El-sanosi
Hi folks,

Assume we have a 4-node cluster N1, N2, N3, and N4, and the replication
factor is 3.  When write CL = ALL and read CL = ONE:

Client c1 sends W1 = [K1, V1] to N1 (the coordinator).  The coordinator (N1)
generates timestamp Mon 05-09-2015 11:30:40,200 (according to its local
clock), assigns it to W1, and sends W1 to N2, N3, and N4. After a few
seconds, client c2 sends W2 = [K1, V2] to N4 (the coordinator). The
coordinator (N4) generates timestamp Mon 05-09-2015 11:30:38,200 (according
to its local clock, but assume N4's clock is a bit behind, by nearly 2
seconds), assigns it to W2, and sends W2 to N2, N3, and N4 (itself).

As we have write CL = ALL and read CL = ONE: now client c2 wants to read
K1 and connects to coordinator N1; the coordinator sends the read for K1 to
N2, picking the latest timestamp, which is [K1, V1]: Mon 05-09-2015
11:30:40,200.

So in this scenario, the latest data written to the replicas is [K1, V2],
which should be the correct one, but the read returns [K1, V1] because of
the divergent clock.

Can such a scenario occur?

Thank you


Re: who does generate timestamp during the write?

2015-09-05 Thread ibrahim El-sanosi
In this case, assume we have a 4-node cluster N1, N2, N3, and N4, and the
replication factor is 3. Client c1 sends W1 = [K1, V1] to N1 (the
coordinator).  The coordinator (N1) generates timestamp Mon 05-09-2015
11:30:40,200 (according to its local clock), assigns it to W1, and sends W1
to N2, N3, and N4. After a few seconds, client c2 sends W2 = [K1, V2] to N4
(the coordinator). The coordinator (N4) generates timestamp Mon 05-09-2015
11:30:38,200 (according to its local clock, but assume N4's clock is a bit
behind, by nearly 2 seconds), assigns it to W2, and sends W2 to N2, N3, and
N4 (itself).

Assume we have write CL = ALL and read CL = ONE. Now client c2 wants to
read K1 and connects to coordinator N1; the coordinator sends the read for
K1 to N1, picking the latest timestamp, which is [K1, V1]: Mon 05-09-2015
11:30:40,200.

So in this scenario, the latest data written to the replicas is [K1, V2],
which should be the correct one, but the read returns [K1, V1] because of
the divergent clock.

Can such a scenario occur?

Thank you


Re: who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
Hi Andrey,

I just came across this article:

"Each cell in a CQL table has a corresponding timestamp
which is taken from the clock on *the Cassandra node* *that orchestrates the
write.* When you are reading from a Cassandra cluster, the node that
coordinates the read will compare the timestamps of the values it fetches.
The last write (= highest timestamp) wins and will be returned to the client."

What do you think?

On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh  wrote:

> Coordinator doesn't generate timestamp, it is generated by client.
>
> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> OK, why doesn't the coordinator generate the timestamp, as the write is
>> part of the Cassandra process after the client submits the request to
>> Cassandra?
>>
>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh 
>> wrote:
>>
>>> Your application.
>>>
>>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
>>>> Dear folks,
>>>>
>>>> When we hear about the notion of Last-Write-Wins in Cassandra according
>>>> to timestamp, *who generates this timestamp during the write: the
>>>> coordinator, or each individual replica on which the write is going to
>>>> be stored?*
>>>>
>>>>
>>>> *Regards,*
>>>>
>>>>
>>>>
>>>> *Ibrahim*
>>>>
>>>
>>>
>>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
OK, why doesn't the coordinator generate the timestamp, as the write is
part of the Cassandra process after the client submits the request to
Cassandra?

On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh  wrote:

> Your application.
>
> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Dear folks,
>>
>> When we hear about the notion of Last-Write-Wins in Cassandra according
>> to timestamp, *who generates this timestamp during the write: the
>> coordinator, or each individual replica on which the write is going to
>> be stored?*
>>
>>
>> *Regards,*
>>
>>
>>
>> *Ibrahim*
>>
>
>


who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
Dear folks,

When we hear about the notion of Last-Write-Wins in Cassandra according to
timestamp, *who generates this timestamp during the write: the coordinator,
or each individual replica on which the write is going to be stored?*


*Regards,*



*Ibrahim*
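
For completeness, a sketch showing that the application itself can supply
the write timestamp, taking both the coordinator's and the driver's clocks
out of the picture (the kv table and contact point are assumptions):

    import time
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # Cassandra write timestamps are microseconds since the epoch.
    micros = int(time.time() * 1_000_000)
    session.execute(
        f"INSERT INTO kv (k, v) VALUES ('K1', 'V1') USING TIMESTAMP {micros}")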


Re: lightweight transactions with potential problem?

2015-08-27 Thread ibrahim El-sanosi
Hi Sylvain and all folks,

I have another scenario in my mind where *linearizable consistency (CAS,
Compare-and-Set)* can fail, given *the following round-trips:*

*1.*  *Prepare/promise*

*2.*  *Read/result*

*3.*  *Propose/accept*

4.  *Commit/acknowledgment *

Assume we have an application for registering new accounts; I want to make
sure I only allow exactly one user to claim a given account. For example,
we do not allow two users to have the same username.

Assuming we have a cluster consisting of 5 nodes N1, N2, N3, N4, and N5, we
have two concurrent clients C1 and C2. We have replication factor 3, and the
partitioner has determined that the primary and replica nodes for the INSERT
example are N3, N4, and N5.



The scenario happens in following order:

1.  C1 connects to coordinator N1 and sends INSERT V1 (assume V1 is a
username, not registered before).

2.  N1 sends a PREPARE message with ballot 1 (the highest ballot seen)
to N3, N4, and N5. Note that this prepare is for C1 and V1.

3.  Now C2 connects to coordinator N2 and sends INSERT V1.

4.  N2 sends a PREPARE message with ballot 2 (the highest ballot after
re-preparing: the first time, N2 does not know about ballot 1, but
eventually it resolves this and uses ballot 2) to N3, N4, and N5. Note that
this prepare is for C2 and V1.

*5.*  *N1 sends a READ message to N3, N4, and N5 to read V1.*

*6.*  *N3, N4, and N5 send a RESULT message to N1, informing it that V1
does not exist, so N1 will go forward to the next round.*

*7.*  *N2 sends a READ message to N3, N4, and N5 to read V1.*

*8.*  *N3, N4, and N5 send a RESULT message to N2, informing it that V1
does not exist, so N2 will go forward to the next round.*

9.  Now N1 sends a PROPOSE message to N3, N4, and N5 (ballot 1, V1).

10.  N3, N4, and N5 send an ACCEPT message to N1.

11.  N2 sends a PROPOSE message to N3, N4, and N5 (ballot 2, V1).

12.  N3, N4, and N5 send an ACCEPT message to N2.

13.  N1 sends a COMMIT message to N3, N4, and N5 (ballot 1).

14.  N3, N4, and N5 send an ACK message to N1.

15.  N2 sends a COMMIT message to N3, N4, and N5 (ballot 2).

16.  N3, N4, and N5 send an ACK message to N2.



As a result, both V1 from client C1 and V1 from client C2 have been written
to replicas N3, N4, and N5, which I think does not achieve the goal of
*linearizable consistency and CAS.*



*Is that true, and could such a scenario occur?*



I look forward to hearing from you.



Regards,


Ibrahim

On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi <
ibrahimsaba...@gmail.com> wrote:

> Thank you a lot
>
> Ibrahim
>
> On Wed, Aug 26, 2015 at 12:15 PM, Sylvain Lebresne 
> wrote:
>
>> Yes
>>
>> On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> OK. I see what the purpose of the acknowledgment round is here. So the
>>> acknowledgment is optional here, depending on the CL setting, as we
>>> normally have in Cassandra.
>>> So can we say that the acknowledgment is not really related to the Paxos
>>> phases; it depends on the CL in Cassandra?
>>>
>>> Ibrahim
>>>
>>> On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne wrote:
>>>
>>>> On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi <
>>>> ibrahimsaba...@gmail.com> wrote:
>>>>
>>>>> Yes, Sylvain, your answer makes more sense. This phase in the Paxos
>>>>> protocol is sometimes called the learning or decide phase, BUT this
>>>>> phase does not have an acknowledgment round, just a learning or decide
>>>>> message from the proposer to the learners. So why do we need an
>>>>> acknowledgment round with the commit phase in lightweight transactions?
>>>>>
>>>>
>>>> It's not _needed_ as far as Paxos is concerned. But it's useful in the
>>>> context of Cassandra. The commit phase is about actually persisting to
>>>> replicas the update decided by the Paxos algorithm, and thus making that
>>>> update visible to non-Paxos reads. Being able to apply normal consistencies
>>>> to this phase is thus useful, since it allows the user to get visibility
>>>> guarantees even for non-Paxos reads if they so wish, and that's exactly
>>>> what we do and why we optionally wait on acknowledgments (and I say
>>>> optionally because how many acks we wait on depends on the user-provided
>>>> consistency level, and if that's CL.ANY then the whole Paxos operation
>>>> actually returns without waiting on any of those acks).
>>>>
>>>>
>>>>
>>>
>>
>
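
The register-a-username example maps onto an LWT with IF NOT EXISTS, where
at most one of two concurrent inserts can come back as applied, whatever
interleaving the ballots take (the users table and contact point are
assumptions):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    rs1 = session.execute(
        "INSERT INTO users (username) VALUES ('V1') IF NOT EXISTS")
    rs2 = session.execute(
        "INSERT INTO users (username) VALUES ('V1') IF NOT EXISTS")

    # The first column of an LWT result is the [applied] flag; only one of
    # the two inserts for the same username can report True.
    print(rs1.one()[0], rs2.one()[0])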


Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
Thank you a lot

Ibrahim

On Wed, Aug 26, 2015 at 12:15 PM, Sylvain Lebresne 
wrote:

> Yes
>
> On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> OK. I see what the purpose of the acknowledgment round is here. So the
>> acknowledgment is optional here, depending on the CL setting, as we
>> normally have in Cassandra.
>> So can we say that the acknowledgment is not really related to the Paxos
>> phases; it depends on the CL in Cassandra?
>>
>> Ibrahim
>>
>> On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne 
>> wrote:
>>
>>> On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
>>>> Yes, Sylvain, your answer makes more sense. This phase in the Paxos
>>>> protocol is sometimes called the learning or decide phase, BUT this
>>>> phase does not have an acknowledgment round, just a learning or decide
>>>> message from the proposer to the learners. So why do we need an
>>>> acknowledgment round with the commit phase in lightweight transactions?
>>>>
>>>
>>> It's not _needed_ as far as Paxos is concerned. But it's useful in the
>>> context of Cassandra. The commit phase is about actually persisting to
>>> replicas the update decided by the Paxos algorithm, and thus making that
>>> update visible to non-Paxos reads. Being able to apply normal consistencies
>>> to this phase is thus useful, since it allows the user to get visibility
>>> guarantees even for non-Paxos reads if they so wish, and that's exactly
>>> what we do and why we optionally wait on acknowledgments (and I say
>>> optionally because how many acks we wait on depends on the user-provided
>>> consistency level, and if that's CL.ANY then the whole Paxos operation
>>> actually returns without waiting on any of those acks).
>>>
>>>
>>>
>>
>


Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
OK. I see what the purpose of the acknowledgment round is here. So the
acknowledgment is optional here, depending on the CL setting, as we normally
have in Cassandra.
So can we say that the acknowledgment is not really related to the Paxos
phases; it depends on the CL in Cassandra?

Ibrahim

On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne 
wrote:

> On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Yes, Sylvain, your answer makes more sense. This phase in the Paxos
>> protocol is sometimes called the learning or decide phase, BUT this
>> phase does not have an acknowledgment round, just a learning or decide
>> message from the proposer to the learners. So why do we need an
>> acknowledgment round with the commit phase in lightweight transactions?
>>
>
> It's not _needed_ as far as Paxos is concerned. But it's useful in the
> context of Cassandra. The commit phase is about actually persisting to
> replicas the update decided by the Paxos algorithm, and thus making that
> update visible to non-Paxos reads. Being able to apply normal consistencies
> to this phase is thus useful, since it allows the user to get visibility
> guarantees even for non-Paxos reads if they so wish, and that's exactly
> what we do and why we optionally wait on acknowledgments (and I say
> optionally because how many acks we wait on depends on the user-provided
> consistency level, and if that's CL.ANY then the whole Paxos operation
> actually returns without waiting on any of those acks).
>
>
>
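
A sketch of the two knobs Sylvain distinguishes: the Paxos rounds run at the
statement's serial consistency level, while the final commit round waits for
acknowledgments at the ordinary consistency level (the users table and
contact point are assumptions):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo')

    stmt = SimpleStatement(
        "INSERT INTO users (username) VALUES ('V1') IF NOT EXISTS",
        # governs the commit/acknowledgment phase discussed above
        consistency_level=ConsistencyLevel.QUORUM,
        # governs the prepare/promise and propose/accept rounds
        serial_consistency_level=ConsistencyLevel.SERIAL)
    session.execute(stmt)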


Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
Yes, Sylvain, your answer makes more sense. This phase in the Paxos protocol
is sometimes called the learning or decide phase, BUT this phase does not
have an acknowledgment round, just a learning or decide message from the
proposer to the learners. So why do we need an acknowledgment round with the
commit phase in lightweight transactions?


Commit/acknowledgment phase in CAS?

2015-08-25 Thread ibrahim El-sanosi
Hi folks,


To achieve linearizable consistency in Cassandra, there are four
round-trips that must be performed:

1.   Prepare/promise

2.   Read/result

3.   Propose/accept

*4.   **Commit/acknowledgment *



In the last phase of the Paxos protocol (in the white paper), there is a
decide phase only, no commit/acknowledgment. DECIDE means telling the
learners to apply the accepted value.

If the commit/acknowledgment phase in CAS has a similar purpose to DECIDE,
then why do we have an acknowledgment round?


In fact, I want to know the purpose of the commit/acknowledgment phase in
linearizable consistency in Cassandra. I have read
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0,
but it does not explain the whole picture.



I look forward to hearing from you

Ibrahim


Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
What an excellent explanation! Thank you a lot.

By the way, I do not understand why lightweight transactions in
Cassandra have a commit/acknowledgment round-trip.

For me, I think we could commit the value within the propose/accept phase.
Do you agree? If not, can you explain why we need commit/acknowledgment?



Regards,



ibrahim


Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
OK, I see.

So you meant that an older ballot will not only be rejected in round-trip 1
(prepare/promise), it can also be rejected in round-trip 2 (propose/accept).
Is that correct?

You said: "Or more precisely, you got step 8 wrong: when a replica PROMISEs,
the promise is not that it won't 'promise' a ballot older than 2, it's
that it won't 'accept' a ballot older than 2."

Why is step 8 wrong? I think replicas can accept any highest ballot, and
ballot 2 is the highest in step 8. What do you think?
Do you also mean a replica can promise an older ballot?

I wish you could make it clearer.

Thank you a lot, Sylvain

Ibrahim


On Tue, Aug 25, 2015 at 1:40 PM, Sylvain Lebresne 
wrote:

> That scenario cannot happen. More specifically, your step 12 cannot happen
> if step 8 has happened. Or more precisely, you got step 8 wrong: when a
> replica PROMISEs, the promise is not that it won't "promise" a ballot older
> than 2, it's that it won't "accept" a ballot older than 2. Therefore, after
> step 8, the accept from N1 will be rejected in step 12 and the insert from
> N1 will be rejected (that is, N1 will restart the whole algorithm with a
> new ballot).
>
>
> On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Hi folks,
>>
>>
>> Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
>> using Paxos with 4 round-trips, as follows:*
>>
>> *1.  **Prepare/promise*
>>
>> *2.  **Read/result*
>>
>> *3.  **Propose/accept*
>>
>> *4.  **Commit/acknowledgment *
>>
>> Assume we have an application for registering new accounts; I want to make
>> sure I only allow exactly one user to claim a given account. For example,
>> we do not allow two users to have the same username.
>>
>> Assuming we have a cluster consisting of 5 nodes N1, N2, N3, N4, and N5,
>> we have two concurrent clients C1 and C2. We have replication factor 3,
>> and the partitioner has determined that the primary and replica nodes for
>> the INSERT example are N3, N4, and N5.
>>
>>
>> The scenario happens in following order:
>>
>> 1.  C1 connects to coordinator N1 and sends INSERT V1 (assume V1 is a
>> username, not registered before).
>>
>> 2.  N1 sends a PREPARE message with ballot 1 (the highest ballot seen)
>> to N3, N4, and N5. Note that this prepare is for C1 and V1.
>>
>> 3.  N3, N4, and N5 send a PROMISE message to N1, promising not to promise
>> anything with a ballot older than ballot 1.
>>
>> 4.  N1 sends a READ message to N3, N4, and N5 to read V1.
>>
>> 5.  N3, N4, and N5 send a RESULT message to N1, informing it that V1 does
>> not exist, so N1 will go forward to the next round.
>>
>> 6.  Now C2 connects to coordinator N2 and sends INSERT V1.
>>
>> 7.  N2 sends a PREPARE message with ballot 2 (the highest ballot after
>> re-preparing: the first time, N2 does not know about ballot 1, but
>> eventually it resolves this and uses ballot 2) to N3, N4, and N5. Note
>> that this prepare is for C2 and V1.
>>
>> 8.  N3, N4, and N5 send a PROMISE message to N2, promising not to promise
>> anything with a ballot older than ballot 2.
>>
>> 9.  N2 sends a READ message to N3, N4, and N5 to read V1.
>>
>> 10.  N3, N4, and N5 send a RESULT message to N2, informing it that V1
>> does not exist, so N2 will go forward to the next round.
>>
>> 11.  Now N1 sends a PROPOSE message to N3, N4, and N5 (ballot 1, V1).
>>
>> 12.  N3, N4, and N5 send an ACCEPT message to N1.
>>
>> 13.  N2 sends a PROPOSE message to N3, N4, and N5 (ballot 2, V1).
>>
>> 14.  N3, N4, and N5 send an ACCEPT message to N2.
>>
>> 15.  N1 sends a COMMIT message to N3, N4, and N5 (ballot 1).
>>
>> 16.  N3, N4, and N5 send an ACK message to N1.
>>
>> 17.  N2 sends a COMMIT message to N3, N4, and N5 (ballot 2).
>>
>> 18.  N3, N4, and N5 send an ACK message to N2.
>>
>>
>> As a result, both V1 from client C1 and V1 from client C2 have been
>> written to replicas N3, N4, and N5, which I think does not achieve the
>> goal of *linearizable consistency and CAS.*
>>
>>
>>
>> *Is that true, and could such a scenario occur?*
>>
>>
>>
>> I look forward to hearing from you.
>>
>>
>> Regards,
>>
>
>


lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
Hi folks,


Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
using Paxos with 4 round-trips, as follows:*

*1.  **Prepare/promise*

*2.  **Read/result*

*3.  **Propose/accept*

*4.  **Commit/acknowledgment *

Assume we have an application for registering new accounts; I want to make
sure I only allow exactly one user to claim a given account. For example,
we do not allow two users to have the same username.

Assuming we have a cluster consisting of 5 nodes N1, N2, N3, N4, and N5, we
have two concurrent clients C1 and C2. We have replication factor 3, and the
partitioner has determined that the primary and replica nodes for the INSERT
example are N3, N4, and N5.


The scenario happens in following order:

1.  C1 connects to coordinator N1 and sends INSERT V1 (assume V1 is a
username, not registered before).

2.  N1 sends a PREPARE message with ballot 1 (the highest ballot seen)
to N3, N4, and N5. Note that this prepare is for C1 and V1.

3.  N3, N4, and N5 send a PROMISE message to N1, promising not to promise
anything with a ballot older than ballot 1.

4.  N1 sends a READ message to N3, N4, and N5 to read V1.

5.  N3, N4, and N5 send a RESULT message to N1, informing it that V1 does
not exist, so N1 will go forward to the next round.

6.  Now C2 connects to coordinator N2 and sends INSERT V1.

7.  N2 sends a PREPARE message with ballot 2 (the highest ballot after
re-preparing: the first time, N2 does not know about ballot 1, but
eventually it resolves this and uses ballot 2) to N3, N4, and N5. Note that
this prepare is for C2 and V1.

8.  N3, N4, and N5 send a PROMISE message to N2, promising not to promise
anything with a ballot older than ballot 2.

9.  N2 sends a READ message to N3, N4, and N5 to read V1.

10.  N3, N4, and N5 send a RESULT message to N2, informing it that V1 does
not exist, so N2 will go forward to the next round.

11.  Now N1 sends a PROPOSE message to N3, N4, and N5 (ballot 1, V1).

12.  N3, N4, and N5 send an ACCEPT message to N1.

13.  N2 sends a PROPOSE message to N3, N4, and N5 (ballot 2, V1).

14.  N3, N4, and N5 send an ACCEPT message to N2.

15.  N1 sends a COMMIT message to N3, N4, and N5 (ballot 1).

16.  N3, N4, and N5 send an ACK message to N1.

17.  N2 sends a COMMIT message to N3, N4, and N5 (ballot 2).

18.  N3, N4, and N5 send an ACK message to N2.


As a result, both V1 from client C1 and V1 from client C2 have been written
to replicas N3, N4, and N5, which I think does not achieve the goal of
*linearizable consistency and CAS.*



*Is that true, and could such a scenario occur?*



I look forward to hearing from you.


Regards,


Write request in Cassandra?

2015-08-21 Thread ibrahim El-sanosi
Dear folks,


I have a doubt about how Cassandra performs a write request; I have two
scenarios. Please read them and confirm which one is correct.


Assume we have a cluster consisting of 4 nodes N1, N2, N3, and N4. As
Cassandra arranges the nodes in a ring topology, the nodes link as follows:

N1-->N2-->N3-->N4-->N1

We also have a replication factor equal to 3 (RF=3) and a consistency level
equal to ALL (CL=ALL).

The client sends a write request, W, to a coordinator, say N4. The
partitioner has determined that the primary node of W is N1.

What will happen now?


*Scenario 1**:* The coordinator sends W to N1. Upon receiving W, N1 stores it
locally (in the commitLog and memtable; *please forget about the internal
process*) and acknowledges the coordinator N4. Then N1 sends a copy of W to
N2 (because N2 is the next node in the ring from N1's perspective). Upon
receiving W, N2 stores it locally and sends an acknowledgement to N4. Then
N2 sends a copy of W to N3 (because N3 is the next node in the ring from
N2's perspective). Upon receiving W, N3 stores it locally and acknowledges
the coordinator N4. Finally, as soon as the coordinator, N4, receives an
acknowledgement from all nodes (N1, N2, and N3), it replies to the client.

Note that if scenario 1 is correct, then the latency will be 4 rounds
(N4-->N1-->N2-->N3-->N4-->client).



*Scenario 2:*  The coordinator, N4, broadcasts W to N1, N2, and N3 (N4-->N1,
N4-->N2, N4-->N3). Then the replicas (N1, N2, and N3) store W locally and
acknowledge to N4.  When N4 receives all the ACKs, it replies to the client.



Can anyone confirm which scenario is correct in Cassandra?



Regards,



Ibrahim
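
One way to check which scenario matches reality is to enable query tracing
and inspect the coordinator's events: the trace shows the coordinator
dispatching the mutation to each replica itself rather than chaining it
around the ring (the kv table and contact point are assumptions):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo')

    stmt = SimpleStatement("INSERT INTO kv (k, v) VALUES ('K1', 'V1')",
                           consistency_level=ConsistencyLevel.ALL)
    result = session.execute(stmt, trace=True)

    # Events such as "Sending MUTATION to /<replica>" appear once per
    # replica, all issued from the coordinator.
    for event in result.get_query_trace().events:
        print(event.source, event.description)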