sstableloader making no progress

2017-02-10 Thread Simone Franzini
I am trying to ingest some data from a cluster to a different cluster via
sstableloader. I am running DSE 4.8.7 / Cassandra 2.1.14.
I have re-created the schemas and followed other instructions here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html

I am initially testing the ingest process with a single table, containing 3
really small sstables (just a few KB each):
sstableloader -v -d  /
From the console, it appears that the progress quickly reaches 100%, but
the command never returns:
progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100% 0
 MB/s(avg: 0 MB/s)

nodetool netstats shows that there is no progress:
Mode: NORMAL
Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7
/10.128.X.Y
Receiving 3 files, 3963 bytes total. Already received 0 files, 0
bytes total
Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7
/10.128.X.Y
Receiving 3 files, 3963 bytes total. Already received 0 files, 0
bytes total
Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7
/10.128.X.Y
Receiving 3 files, 3963 bytes total. Already received 0 files, 0
bytes total
...
Read Repair Statistics:
Attempted: 8
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name    Active   Pending  Completed
Commands        n/a         0    2148112
Responses       n/a         0     977176
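
One way to confirm the stall from a script rather than by eye is to parse the "Receiving ... Already received ..." lines that nodetool netstats prints. The sketch below is illustrative only: the function name `stalled_streams` and the regular expression are assumptions based on the output pasted above, not part of any Cassandra tooling, and the line format may differ between versions.

```python
import re

# Pattern for the per-peer summary line shown in the netstats output above.
# This is an assumption derived from that paste, not a documented format.
RECEIVING = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total"
)


def stalled_streams(netstats_text):
    """Return (expected_files, expected_bytes) for each session that received nothing."""
    # The console wraps long lines, so collapse all whitespace before matching.
    flat = " ".join(netstats_text.split())
    stalled = []
    for files, total, got_files, got_bytes in RECEIVING.findall(flat):
        if int(got_files) == 0 and int(got_bytes) == 0:
            stalled.append((int(files), int(total)))
    return stalled


sample = """Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7
/10.128.X.Y
Receiving 3 files, 3963 bytes total. Already received 0 files, 0
bytes total"""

print(stalled_streams(sample))
```

Running the check twice a few seconds apart and comparing the results would distinguish a slow stream from one that is not moving at all.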


The logs show the following, but no error or warning message:
2017-02-10 16:18:49,096 INFO  [STREAM-INIT-/10.128.X.Y:33302]
 StreamResultFuture.java:109 - [Stream
#e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Creating new streaming plan for
Bulk Load
2017-02-10 16:18:49,105 INFO  [STREAM-INIT-/10.128.X.Y:33302]
 StreamResultFuture.java:116 - [Stream
#e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for
Bulk Load
2017-02-10 16:18:49,110 INFO  [STREAM-INIT-/10.128.X.Y:33306]
 StreamResultFuture.java:116 - [Stream
#e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for
Bulk Load
2017-02-10 16:18:49,110 INFO  [STREAM-IN-/10.128.X.Y]
 StreamResultFuture.java:166 - [Stream
#e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3
files(3963 bytes), sending 0 files(0 bytes)


Any help would be greatly appreciated.

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini


Re: Nodetool cleanup error - cannot run before a node has joined the ring

2017-02-10 Thread Simone Franzini
Thank you Michael.

Well, this was apparently my bad.

1. nodetool connects to the local JMX port 7199, which is indeed running on
localhost in my case.

2. I did a few more attempts, the message "Aborted cleaning up atleast one
column family in keyspace " only appears in the DC where
 is not replicated, so it is indeed correct. I guess that
the "Cleanup cannot run before a node has joined the ring" message refers
to this.

3. I have seen a few log lines from CompactionManager.java showing nodetool
cleanup doing work, so that's good. I guess that the "No sstables for
." log lines refer to the fact that there is no sstable
*to cleanup* for that table, not that it couldn't find any sstable.

So at the end of the day, just log messages that were not very clear (at
least to me).

Thanks,

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Fri, Feb 10, 2017 at 2:42 PM, Michael Shuler 
wrote:

> By default, yes, nodetool connects to localhost, which your log entries
> show. Use `nodetool -h $PRIV_IP cleanup ...` to connect to that private
> IP it's listening on. `nodetool help cleanup` for all options.
>
> --
> Kind regards,
> Michael
>
> On 02/10/2017 02:22 PM, Simone Franzini wrote:
> > I am running DSE 4.8.7 / Cassandra 2.1.14.
> > When I attempt to run nodetool cleanup on any node / any environment we
> > are managing, I get the following output:
> >
> > Aborted cleaning up atleast one column family in keyspace ,
> > check server logs for more information.
> > error: nodetool failed, check server logs
> > -- StackTrace --
> > java.lang.RuntimeException: nodetool failed, check server logs
> > at
> > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
> > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
> >
> >
> > The server logs do not have any error or warn message, however I see the
> > following log line:
> > 2017-02-10 14:06:49,931 INFO  [RMI TCP Connection(34)-127.0.0.1]
> >  CompactionManager.java:415 - Cleanup cannot run before a node has
> > joined the ring
> >
> > That is then followed by a line like this, for every single one of our
> > keyspaces and tables:
> > 2017-02-10 14:06:49,969 INFO  [RMI TCP Connection(34)-127.0.0.1]
> >  CompactionManager.java:294 - No sstables for .
> >
> > I find the message above a bit suspicious. Of course I do not have any
> > node in the process of joining the ring. It looks like nodetool is
> > having trouble connecting to the Cassandra instance? Is it trying to
> > connect to localhost? We have Cassandra listening on a private IP, not
> > on localhost. All other nodetool commands are running fine though. Any
> > suggestion with what could be the issue here?
> >
> > Thanks,
> > Simone Franzini, PhD
> >
> > http://www.linkedin.com/in/simonefranzini
>
>


Re: Nodetool cleanup error - cannot run before a node has joined the ring

2017-02-10 Thread Michael Shuler
By default, yes, nodetool connects to localhost, which your log entries
show. Use `nodetool -h $PRIV_IP cleanup ...` to connect to that private
IP it's listening on. `nodetool help cleanup` for all options.
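
For anyone scripting this, a minimal sketch of building the same command line follows. It assumes JMX is reachable on the given address; the helper name `run_nodetool` and the address 10.0.0.5 are placeholders made up for illustration, and 7199 is the default JMX port mentioned elsewhere in this thread.

```python
import subprocess


def run_nodetool(host, *args, port=7199, dry_run=True):
    """Build (and optionally run) a nodetool command against a specific JMX host."""
    cmd = ["nodetool", "-h", host, "-p", str(port), *args]
    if not dry_run:
        # Requires nodetool on PATH; raises CalledProcessError on a non-zero exit.
        subprocess.run(cmd, check=True)
    return cmd


# 10.0.0.5 is a hypothetical private IP, standing in for $PRIV_IP above.
print(" ".join(run_nodetool("10.0.0.5", "cleanup")))
```

With `dry_run=True` (the default here) the helper only returns the argument list, which makes it easy to check what would be executed before pointing it at a real node.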

-- 
Kind regards,
Michael

On 02/10/2017 02:22 PM, Simone Franzini wrote:
> I am running DSE 4.8.7 / Cassandra 2.1.14.
> When I attempt to run nodetool cleanup on any node / any environment we
> are managing, I get the following output:
> 
> Aborted cleaning up atleast one column family in keyspace ,
> check server logs for more information.
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
> 
> 
> The server logs do not have any error or warn message, however I see the
> following log line:
> 2017-02-10 14:06:49,931 INFO  [RMI TCP Connection(34)-127.0.0.1]
>  CompactionManager.java:415 - Cleanup cannot run before a node has
> joined the ring
> 
> That is then followed by a line like this, for every single one of our
> keyspaces and tables:
> 2017-02-10 14:06:49,969 INFO  [RMI TCP Connection(34)-127.0.0.1]
>  CompactionManager.java:294 - No sstables for .
> 
> I find the message above a bit suspicious. Of course I do not have any
> node in the process of joining the ring. It looks like nodetool is
> having trouble connecting to the Cassandra instance? Is it trying to
> connect to localhost? We have Cassandra listening on a private IP, not
> on localhost. All other nodetool commands are running fine though. Any
> suggestion with what could be the issue here?
> 
> Thanks,
> Simone Franzini, PhD
> 
> http://www.linkedin.com/in/simonefranzini



Nodetool cleanup error - cannot run before a node has joined the ring

2017-02-10 Thread Simone Franzini
I am running DSE 4.8.7 / Cassandra 2.1.14.
When I attempt to run nodetool cleanup on any node / any environment we are
managing, I get the following output:

Aborted cleaning up atleast one column family in keyspace ,
check server logs for more information.
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)


The server logs do not have any error or warn message, however I see the
following log line:
2017-02-10 14:06:49,931 INFO  [RMI TCP Connection(34)-127.0.0.1]
 CompactionManager.java:415 - Cleanup cannot run before a node has joined
the ring

That is then followed by a line like this, for every single one of our
keyspaces and tables:
2017-02-10 14:06:49,969 INFO  [RMI TCP Connection(34)-127.0.0.1]
 CompactionManager.java:294 - No sstables for .

I find the message above a bit suspicious. Of course I do not have any node
in the process of joining the ring. It looks like nodetool is having
trouble connecting to the Cassandra instance? Is it trying to connect to
localhost? We have Cassandra listening on a private IP, not on localhost.
All other nodetool commands are running fine though. Any suggestion with
what could be the issue here?

Thanks,
Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini


Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Kant Kodali
Thanks Ariel! Yes, I knew there are many variations and optimizations of
Paxos. I just wanted to see if we had any plans for improving the existing
Paxos implementation, and it is great to see the work is in progress! I
am going to follow that ticket and read up on the references it points to.


On Fri, Feb 10, 2017 at 8:33 AM, Ariel Weisberg  wrote:

> Hi,
>
> Cassandra's implementation of Paxos doesn't implement many optimizations
> that would drastically improve throughput and latency. You need consensus,
> but it doesn't have to be exorbitantly expensive and fall over under any
> kind of contention.
>
> For instance you could implement EPaxos
> https://issues.apache.org/jira/browse/CASSANDRA-6246,
> batch multiple operations into the same Paxos round, have an affinity for a
> specific proposer for a specific partition, implement asynchronous commit,
> use a more efficient implementation of the Paxos log, and maybe other
> things.
>
> Ariel
>
>
> On Fri, Feb 10, 2017, at 05:31 AM, Benjamin Roth wrote:
>
> Hi Kant,
>
> If you read the published papers about Paxos, you will most probably
> recognize that there is no way to "do it better". This is a conceptional
> thing due to the nature of distributed systems + the CAP theorem.
> If you want A+P in the triangle, then C is very expensive. CS is made for
> A+P mostly with tunable C. In ACID databases this is a completely different
> thing as they are mostly either not partition tolerant, not highly
> available or not scalable (in a distributed manner, not speaking of
> "monolithic super servers").
>
> There is no free lunch ...
>
>
> 2017-02-10 11:09 GMT+01:00 Kant Kodali :
>
> "That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra."
>
> yes LWT's are expensive. Are there any plans to make this better?
>
> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali  wrote:
>
> Hi Jon,
>
> Thanks a lot for your response. I am well aware that the LWW != LWT but I
> was talking more in terms of LWW with respective to LWT's which I believe
> you answered. so thanks much!
>
>
> kant
>
>
> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad 
> wrote:
>
> LWT != Last Write Wins.  They are totally different.
>
> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
> meaning you are able to perform operations atomically and in isolation.
> That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra.  The lightweight part, btw, may be a little
> optimistic, especially if a key is under contention.  With regard to the
> “last write” part you’re asking about - w/ LWT Cassandra provides the
> timestamp and manages it as part of the ballot, and it always is
> increasing.  See 
> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
> From the code:
>
>  * Returns a timestamp suitable for paxos given the timestamp of the last
> known commit (or in progress update).
>  * Paxos ensures that the timestamp it uses for commits respects the
> serial order of those commits. It does so
>  * by having each replica reject any proposal whose timestamp is not
> strictly greater than the last proposal it
>  * accepted. So in practice, which timestamp we use for a given proposal
> doesn't affect correctness but it does
>  * affect the chance of making progress (if we pick a timestamp lower than
> what has been proposed before, our
>  * new proposal will just get rejected).
>
> Effectively paxos removes the ability to use custom timestamps and
> addresses clock variance by rejecting ballots with timestamps less than
> what was last seen.  You can learn more by reading through the other
> comments and code in that file.
>
> Last write wins is a free for all that guarantees you *nothing* except the
> timestamp is used as a tiebreaker.  Here we acknowledge things like the
> speed of light as being a real problem that isn’t going away anytime soon.
> This problem is sometimes addressed with event sourcing rather than
> mutating in place.
>
> Hope this helps.
>
>
> Jon
>
>
>
>
> On Feb 9, 2017, at 5:21 PM, Kant Kodali  wrote:
>
> @Justin I read this article
> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
> And it clearly says
> Linearizable consistency can be achieved with LWT's.  so should I assume
> the Linearizability in the context of the above article is possible with
> LWT's and synchronization of clocks through ntpd ? because LWT's also
> follow Last Write Wins. isn't it? Also another question does most of the
> production clusters do setup ntpd? If so what is the time it takes to sync?
> any idea
>
> @Micheal Schuler Are you referring to  something like true time as in
> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
> Actually I never heard of setting
> up GPS modules and how that can be helpful. Let me research on that but
> good point.
>

Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Ariel Weisberg
Hi,



Cassandra's implementation of Paxos doesn't implement many optimizations
that would drastically improve throughput and latency. You need
consensus, but it doesn't have to be exorbitantly expensive and fall
over under any kind of contention.


For instance you could implement EPaxos
https://issues.apache.org/jira/browse/CASSANDRA-6246, batch multiple
operations into the same Paxos round, have an affinity for a specific
proposer for a specific partition, implement asynchronous commit, use a
more efficient implementation of the Paxos log, and maybe other things.


Ariel
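
For readers following the thread, the per-replica timestamp rule described in the `getTimestampForPaxos` comment quoted below can be contrasted with plain last-write-wins in a few lines. This is a toy sketch, not Cassandra's implementation and not a full Paxos round (no prepare/promise phases, no quorum); the class `Replica` and the function `lww_merge` are hypothetical names used only for illustration.

```python
class Replica:
    """Toy replica applying only the 'strictly greater timestamp' rule."""

    def __init__(self):
        self.last_accepted_ts = 0
        self.value = None

    def propose(self, ts, value):
        # Reject any proposal whose timestamp is not strictly greater
        # than the last proposal this replica accepted.
        if ts <= self.last_accepted_ts:
            return False
        self.last_accepted_ts = ts
        self.value = value
        return True


def lww_merge(a, b):
    """Last write wins over (timestamp, value) pairs: the higher timestamp survives.

    Ties keep the first argument here; real tie-breaking is out of scope.
    """
    return a if a[0] >= b[0] else b


replica = Replica()
print(replica.propose(10, "a"))  # first proposal
print(replica.propose(5, "b"))   # older timestamp than the one already accepted
print(replica.value)
print(lww_merge((10, "a"), (5, "b")))
```

The difference the sketch tries to show: under the ballot rule the proposer with the stale timestamp is told its proposal was rejected and can retry, whereas under last-write-wins the write with the lower timestamp is simply discarded without the writer being informed.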





On Fri, Feb 10, 2017, at 05:31 AM, Benjamin Roth wrote:

> Hi Kant,
>
> If you read the published papers about Paxos, you will most probably
> recognize that there is no way to "do it better". This is a
> conceptional thing due to the nature of distributed systems + the CAP
> theorem.
> If you want A+P in the triangle, then C is very expensive. CS is made
> for A+P mostly with tunable C. In ACID databases this is a completely
> different thing as they are mostly either not partition tolerant, not
> highly available or not scalable (in a distributed manner, not
> speaking of "monolithic super servers").
>
> There is no free lunch ...
>
>
> 2017-02-10 11:09 GMT+01:00 Kant Kodali :
>> "That’s the safety blanket everyone wants but is extremely expensive,
>> especially in Cassandra."
>>
>> yes LWT's are expensive. Are there any plans to make this better?
>>
>> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali
>>  wrote:
>>> Hi Jon,
>>>
>>> Thanks a lot for your response. I am well aware that the LWW != LWT
>>> but I was talking more in terms of LWW with respective to LWT's
>>> which I believe you answered. so thanks much!
>>>
>>>
>>> kant
>>>
>>>
>>> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad
>>>  wrote:
>>>> LWT != Last Write Wins.  They are totally different.
>>>>
>>>> LWTs give you (assuming you also read at SERIAL) “atomic
>>>> consistency”, meaning you are able to perform operations atomically
>>>> and in isolation.  That’s the safety blanket everyone wants but is
>>>> extremely expensive, especially in Cassandra.  The lightweight
>>>> part, btw, may be a little optimistic, especially if a key is under
>>>> contention.  With regard to the “last write” part you’re asking
>>>> about - w/ LWT Cassandra provides the timestamp and manages it as
>>>> part of the ballot, and it always is increasing.  See
>>>> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
>>>> From the code:
>>>>
>>>>  * Returns a timestamp suitable for paxos given the timestamp of
>>>> the last known commit (or in progress update).
>>>>  * Paxos ensures that the timestamp it uses for commits respects
>>>> the serial order of those commits. It does so
>>>>  * by having each replica reject any proposal whose timestamp is
>>>> not strictly greater than the last proposal it
>>>>  * accepted. So in practice, which timestamp we use for a given
>>>> proposal doesn't affect correctness but it does
>>>>  * affect the chance of making progress (if we pick a timestamp
>>>> lower than what has been proposed before, our
>>>>  * new proposal will just get rejected).
>>>>
>>>> Effectively paxos removes the ability to use custom timestamps and
>>>> addresses clock variance by rejecting ballots with timestamps less
>>>> than what was last seen.  You can learn more by reading through the
>>>> other comments and code in that file.
>>>>
>>>> Last write wins is a free for all that guarantees you *nothing*
>>>> except the timestamp is used as a tiebreaker.  Here we acknowledge
>>>> things like the speed of light as being a real problem that isn’t
>>>> going away anytime soon.  This problem is sometimes addressed with
>>>> event sourcing rather than mutating in place.
>>>>
>>>> Hope this helps.
>>>>
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>>
>>>>> On Feb 9, 2017, at 5:21 PM, Kant Kodali  wrote:
>>>>>
>>>>> @Justin I read this article
>>>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
>>>>> And it clearly says Linearizable consistency can be achieved with
>>>>> LWT's.  so should I assume the Linearizability in the context of
>>>>> the above article is possible with LWT's and synchronization of
>>>>> clocks through ntpd ? because LWT's also follow Last Write Wins.
>>>>> isn't it? Also another question does most of the production
>>>>> clusters do setup ntpd? If so what is the time it takes to sync?
>>>>> any idea
>>>>>
>>>>> @Micheal Schuler Are you referring to  something like true time as
>>>>> in
>>>>> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
>>>>> Actually I never heard of setting up GPS modules and how that can
>>>>> be helpful. Let me research on that but good point.
>>>>>
>>>>> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler
>>>>>  wrote:
>>>>>> If you require the best precision you can get, setting up a
>>>>>> pair of
>>>>>>

Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Benjamin Roth
Hi Kant,

If you read the published papers about Paxos, you will most probably
recognize that there is no way to "do it better". This is a conceptual
thing due to the nature of distributed systems + the CAP theorem.
If you want A+P in the triangle, then C is very expensive. CS is made for
A+P mostly with tunable C. In ACID databases this is a completely different
thing as they are mostly either not partition tolerant, not highly
available or not scalable (in a distributed manner, not speaking of
"monolithic super servers").

There is no free lunch ...


2017-02-10 11:09 GMT+01:00 Kant Kodali :

> "That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra."
>
> yes LWT's are expensive. Are there any plans to make this better?
>
> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali  wrote:
>
>> Hi Jon,
>>
>> Thanks a lot for your response. I am well aware that the LWW != LWT but I
>> was talking more in terms of LWW with respective to LWT's which I believe
>> you answered. so thanks much!
>>
>> kant
>>
>> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad 
>> wrote:
>>
>>> LWT != Last Write Wins.  They are totally different.
>>>
>>> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
>>> meaning you are able to perform operations atomically and in isolation.
>>> That’s the safety blanket everyone wants but is extremely expensive,
>>> especially in Cassandra.  The lightweight part, btw, may be a little
>>> optimistic, especially if a key is under contention.  With regard to the
>>> “last write” part you’re asking about - w/ LWT Cassandra provides the
>>> timestamp and manages it as part of the ballot, and it always is
>>> increasing.  See
>>> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
>>> From the code:
>>>
>>>  * Returns a timestamp suitable for paxos given the timestamp of the
>>> last known commit (or in progress update).
>>>  * Paxos ensures that the timestamp it uses for commits respects the
>>> serial order of those commits. It does so
>>>  * by having each replica reject any proposal whose timestamp is not
>>> strictly greater than the last proposal it
>>>  * accepted. So in practice, which timestamp we use for a given proposal
>>> doesn't affect correctness but it does
>>>  * affect the chance of making progress (if we pick a timestamp lower
>>> than what has been proposed before, our
>>>  * new proposal will just get rejected).
>>>
>>> Effectively paxos removes the ability to use custom timestamps and
>>> addresses clock variance by rejecting ballots with timestamps less than
>>> what was last seen.  You can learn more by reading through the other
>>> comments and code in that file.
>>>
>>> Last write wins is a free for all that guarantees you *nothing* except
>>> the timestamp is used as a tiebreaker.  Here we acknowledge things like the
>>> speed of light as being a real problem that isn’t going away anytime soon.
>>> This problem is sometimes addressed with event sourcing rather than
>>> mutating in place.
>>>
>>> Hope this helps.
>>>
>>> Jon
>>>
>>>
>>> On Feb 9, 2017, at 5:21 PM, Kant Kodali  wrote:
>>>
>>> @Justin I read this article
>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
>>> And it clearly says
>>> Linearizable consistency can be achieved with LWT's.  so should I assume
>>> the Linearizability in the context of the above article is possible
>>> with LWT's and synchronization of clocks through ntpd ? because LWT's also
>>> follow Last Write Wins. isn't it? Also another question does most of the
>>> production clusters do setup ntpd? If so what is the time it takes to sync?
>>> any idea
>>>
>>> @Micheal Schuler Are you referring to  something like true time as in
>>> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
>>> Actually I never heard of setting
>>> up GPS modules and how that can be helpful. Let me research on that but
>>> good point.
>>>
>>> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler 
>>> wrote:
>>>
>>>> If you require the best precision you can get, setting up a pair of
>>>> stratum 1 ntpd masters in each data center location with a GPS modules
>>>> is not terribly complex. Low latency and jitter on servers you manage.
>>>> 140ms is a long way away network-wise, and I would suggest that was a
>>>> poor choice of upstream (probably stratum 2 or 3) source.
>>>>
>>>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>>>> need as close as you can get, you'll probably need to do it yourself.
>>>>
>>>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>>>> > Hi Justin,
>>>> >
>>>> > There are bunch of issues w.r.t to synchronization of clocks when we
>>>> > used ntpd. Also the time it took to sync the clocks was approx 140ms
>>>> > (don't quote me on it though because it is reported by our devops :)
>>>> >
>>>> > we h

Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Kant Kodali
"That’s the safety blanket everyone wants but is extremely expensive,
especially in Cassandra."

yes LWT's are expensive. Are there any plans to make this better?

On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali  wrote:

> Hi Jon,
>
> Thanks a lot for your response. I am well aware that the LWW != LWT but I
> was talking more in terms of LWW with respective to LWT's which I believe
> you answered. so thanks much!
>
> kant
>
> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad 
> wrote:
>
>> LWT != Last Write Wins.  They are totally different.
>>
>> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
>> meaning you are able to perform operations atomically and in isolation.
>> That’s the safety blanket everyone wants but is extremely expensive,
>> especially in Cassandra.  The lightweight part, btw, may be a little
>> optimistic, especially if a key is under contention.  With regard to the
>> “last write” part you’re asking about - w/ LWT Cassandra provides the
>> timestamp and manages it as part of the ballot, and it always is
>> increasing.  See 
>> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
>> From the code:
>>
>>  * Returns a timestamp suitable for paxos given the timestamp of the last
>> known commit (or in progress update).
>>  * Paxos ensures that the timestamp it uses for commits respects the
>> serial order of those commits. It does so
>>  * by having each replica reject any proposal whose timestamp is not
>> strictly greater than the last proposal it
>>  * accepted. So in practice, which timestamp we use for a given proposal
>> doesn't affect correctness but it does
>>  * affect the chance of making progress (if we pick a timestamp lower
>> than what has been proposed before, our
>>  * new proposal will just get rejected).
>>
>> Effectively paxos removes the ability to use custom timestamps and
>> addresses clock variance by rejecting ballots with timestamps less than
>> what was last seen.  You can learn more by reading through the other
>> comments and code in that file.
>>
>> Last write wins is a free for all that guarantees you *nothing* except
>> the timestamp is used as a tiebreaker.  Here we acknowledge things like the
>> speed of light as being a real problem that isn’t going away anytime soon.
>> This problem is sometimes addressed with event sourcing rather than
>> mutating in place.
>>
>> Hope this helps.
>>
>> Jon
>>
>>
>> On Feb 9, 2017, at 5:21 PM, Kant Kodali  wrote:
>>
>> @Justin I read this article
>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
>> And it clearly says
>> Linearizable consistency can be achieved with LWT's.  so should I assume
>> the Linearizability in the context of the above article is possible with
>> LWT's and synchronization of clocks through ntpd ? because LWT's also
>> follow Last Write Wins. isn't it? Also another question does most of the
>> production clusters do setup ntpd? If so what is the time it takes to sync?
>> any idea
>>
>> @Micheal Schuler Are you referring to  something like true time as in
>> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
>> Actually I never heard of setting
>> up GPS modules and how that can be helpful. Let me research on that but
>> good point.
>>
>> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler 
>> wrote:
>>
>>> If you require the best precision you can get, setting up a pair of
>>> stratum 1 ntpd masters in each data center location with a GPS modules
>>> is not terribly complex. Low latency and jitter on servers you manage.
>>> 140ms is a long way away network-wise, and I would suggest that was a
>>> poor choice of upstream (probably stratum 2 or 3) source.
>>>
>>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>>> need as close as you can get, you'll probably need to do it yourself.
>>>
>>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>>
>>> --
>>> Kind regards,
>>> Michael
>>>
>>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>>> > Hi Justin,
>>> >
>>> > There are bunch of issues w.r.t to synchronization of clocks when we
>>> > used ntpd. Also the time it took to sync the clocks was approx 140ms
>>> > (don't quote me on it though because it is reported by our devops :)
>>> >
>>> > we have multiple clients (for example bunch of micro services are
>>> > reading from Cassandra) I am not sure how one can achieve
>>> > Linearizability by setting timestamps on the clients ? since there is no
>>> > total ordering across multiple clients.
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > On Thu, Feb 9, 2017 at 4:16 PM, Justin Cameron wrote:
>>> >
>>> > Hi Kant,
>>> >
>>> > Clock synchronization is important - you should ensure that ntpd is
>>> > properly configured on all nodes. If your particular use case is
>>> > especially sensitive to out-of-order mutations it is possible to set
>>> > timestamps on the client side using the
>>> >

Re: Error when running nodetool cleanup after adding a new node to a cluster

2017-02-10 Thread Srinath Reddy
The nodetool cleanup ran successfully after setting the CLASSPATH variable to 
the kubernetes-cassandra.jar.
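
For reference, a minimal sketch of doing the same thing from a script: build an environment whose CLASSPATH includes the seed provider jar and pass it to nodetool. The helper name `env_with_classpath` and the jar location `/path/to/kubernetes-cassandra.jar` are placeholders (the real path was not given in this thread), and whether the nodetool wrapper script honours an inherited CLASSPATH may depend on the installation.

```python
import os
import subprocess


def env_with_classpath(jar_path, base_env=None):
    """Return a copy of the environment with jar_path appended to CLASSPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CLASSPATH")
    env["CLASSPATH"] = jar_path if not existing else existing + os.pathsep + jar_path
    return env


def cleanup_with_jar(jar_path, dry_run=True):
    """Run `nodetool cleanup` with the extra jar on CLASSPATH (skipped when dry_run)."""
    env = env_with_classpath(jar_path)
    if not dry_run:
        # Requires nodetool on PATH inside the Cassandra container.
        subprocess.run(["nodetool", "cleanup"], env=env, check=True)
    return env["CLASSPATH"]


# Hypothetical jar location; substitute the real path inside the container.
print(env_with_classpath("/path/to/kubernetes-cassandra.jar", base_env={})["CLASSPATH"])
```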

Thanks.

> On 09-Feb-2017, at 2:23 PM, Srinath Reddy  wrote:
> 
> Alex,
> 
> Thanks for reply.  I will try the workaround and post an update.
> 
> Regards,
> 
> Srinath Reddy
> 
>> On 09-Feb-2017, at 1:44 PM, Oleksandr Shulgin wrote:
>> 
>> On Thu, Feb 9, 2017 at 6:13 AM, Srinath Reddy wrote:
>> Hi,
>> 
>> Trying to re-balacne a Cassandra cluster after adding a new node and I'm 
>> getting this error when running nodetool cleanup. The Cassandra cluster is 
>> running in a Kubernetes cluster.
>> 
>> Cassandra version is 2.2.8
>> 
>> nodetool cleanup
>> error: io.k8s.cassandra.KubernetesSeedProvider
>> Fatal configuration error; unable to start server.  See log for stacktrace.
>> -- StackTrace --
>> org.apache.cassandra.exceptions.ConfigurationException: 
>> io.k8s.cassandra.KubernetesSeedProvider
>> Fatal configuration error; unable to start server.  See log for stacktrace.
>>  at 
>> org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
>>  at 
>> org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:119)
>>  at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
>>  at 
>> org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
>>  at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
>>  at 
>> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
>>  at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
>> 
>> Hi,
>> 
>> From the above stacktrace it looks like you're hitting the following TODO 
>> item:
>> 
>> https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/tools/NodeProbe.java#L282
>>  
>> 
>> 
>> That is, nodetool needs to know concurrent_compactors setting's value before 
>> starting cleanup, but doesn't use JMX and tries to parse the configuration 
>> file instead.  That fails because your custom SeedProvider class is not on 
>> classpath for nodetool.
>> 
>> A workaround: make sure io.k8s.cassandra.KubernetesSeedProvider can be found 
>> by java when running nodetool script, see 
>> https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/bin/nodetool#L108
>>  
>> 
>> 
>> Proper fix: get rid of the TODO and really query the value using JMX, 
>> especially since the latest tick-tock release of Cassandra (3.10) added a 
>> way to modify it with JMX.
>> 
>> --
>> Alex
> 





Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
If you want to audit write operations only, you could maybe use CDC. This
is a fairly new feature in 3.x (I think it was introduced in 3.8).

2017-02-10 10:10 GMT+01:00 vincent gromakowski <
vincent.gromakow...@gmail.com>:

> tx
>
> 2017-02-10 10:01 GMT+01:00 Benjamin Roth :
>
>> you could write a custom trigger that logs access to specific CFs. But be
>> aware that this may have a big performance impact.
>>
>> 2017-02-10 9:58 GMT+01:00 vincent gromakowski <
>> vincent.gromakow...@gmail.com>:
>>
>>> GDPR compliancy...we need to trace user activity on personal data. Maybe
>>> there is another way ?
>>>
>>> 2017-02-10 9:46 GMT+01:00 Benjamin Roth :
>>>
>>>> On a cluster with just a little bit load, that would cause zillions of
>>>> petabytes of logs (just roughly ;)). I don't think this is viable.
>>>> There are many many JMX metrics on an aggregated level. But none per
>>>> authed used.
>>>> What exactly do you want to find out? Is it for debugging purposes?
>>>>
>>>>
>>>> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
>>>> vincent.gromakow...@gmail.com>:
>>>>
>>>>> Hi all,
>>>>> Is there any way to trace user activity at the server level to see
>>>>> which user is accessing which data ? Do you thin it would be simple to
>>>>> implement ?
>>>>> Tx
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: cassandra user request log

2017-02-10 Thread vincent gromakowski
tx

2017-02-10 10:01 GMT+01:00 Benjamin Roth :

> you could write a custom trigger that logs access to specific CFs. But be
> aware that this may have a big performance impact.
>
> 2017-02-10 9:58 GMT+01:00 vincent gromakowski <
> vincent.gromakow...@gmail.com>:
>
>> GDPR compliance... we need to trace user activity on personal data. Maybe
>> there is another way?
>>
>> 2017-02-10 9:46 GMT+01:00 Benjamin Roth :
>>
>>> On a cluster with just a little bit of load, that would cause zillions of
>>> petabytes of logs (just roughly ;)). I don't think this is viable.
>>> There are many, many JMX metrics on an aggregated level, but none per
>>> authenticated user.
>>> What exactly do you want to find out? Is it for debugging purposes?
>>>
>>>
>>> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
>>> vincent.gromakow...@gmail.com>:
>>>
>>>> Hi all,
>>>> Is there any way to trace user activity at the server level to see
>>>> which user is accessing which data? Do you think it would be simple to
>>>> implement?
>>>> Tx
>>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
you could write a custom trigger that logs access to specific CFs. But be
aware that this may have a big performance impact.

2017-02-10 9:58 GMT+01:00 vincent gromakowski :

> GDPR compliance... we need to trace user activity on personal data. Maybe
> there is another way?
>
> 2017-02-10 9:46 GMT+01:00 Benjamin Roth :
>
>> On a cluster with just a little bit of load, that would cause zillions of
>> petabytes of logs (just roughly ;)). I don't think this is viable.
>> There are many, many JMX metrics on an aggregated level, but none per
>> authenticated user.
>> What exactly do you want to find out? Is it for debugging purposes?
>>
>>
>> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
>> vincent.gromakow...@gmail.com>:
>>
>>> Hi all,
>>> Is there any way to trace user activity at the server level to see which
>>> user is accessing which data? Do you think it would be simple to implement?
>>> Tx
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: cassandra user request log

2017-02-10 Thread vincent gromakowski
GDPR compliance... we need to trace user activity on personal data. Maybe
there is another way?

2017-02-10 9:46 GMT+01:00 Benjamin Roth :

> On a cluster with just a little bit of load, that would cause zillions of
> petabytes of logs (just roughly ;)). I don't think this is viable.
> There are many, many JMX metrics on an aggregated level, but none per
> authenticated user.
> What exactly do you want to find out? Is it for debugging purposes?
>
>
> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
> vincent.gromakow...@gmail.com>:
>
>> Hi all,
>> Is there any way to trace user activity at the server level to see which
>> user is accessing which data? Do you think it would be simple to implement?
>> Tx
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
On a cluster with just a little bit of load, that would cause zillions of
petabytes of logs (just roughly ;)). I don't think this is viable.
There are many, many JMX metrics on an aggregated level, but none per
authenticated user.
What exactly do you want to find out? Is it for debugging purposes?


2017-02-10 9:42 GMT+01:00 vincent gromakowski :

> Hi all,
> Is there any way to trace user activity at the server level to see which
> user is accessing which data? Do you think it would be simple to implement?
> Tx
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


cassandra user request log

2017-02-10 Thread vincent gromakowski
Hi all,
Is there any way to trace user activity at the server level to see which
user is accessing which data? Do you think it would be simple to implement?
Tx


Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Kant Kodali
Hi Jon,

Thanks a lot for your response. I am well aware that LWW != LWT; I was
asking about LWW with respect to LWTs, which I believe you answered. So
thanks very much!

kant

On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad 
wrote:

> LWT != Last Write Wins.  They are totally different.
>
> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
> meaning you are able to perform operations atomically and in isolation.
> That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra.  The lightweight part, btw, may be a little
> optimistic, especially if a key is under contention.  With regard to the
> “last write” part you’re asking about - w/ LWT Cassandra provides the
> timestamp and manages it as part of the ballot, and it always is
> increasing.  See 
> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
> From the code:
>
>  * Returns a timestamp suitable for paxos given the timestamp of the last
>  * known commit (or in progress update).
>  * Paxos ensures that the timestamp it uses for commits respects the
>  * serial order of those commits. It does so by having each replica reject
>  * any proposal whose timestamp is not strictly greater than the last
>  * proposal it accepted. So in practice, which timestamp we use for a given
>  * proposal doesn't affect correctness but it does affect the chance of
>  * making progress (if we pick a timestamp lower than what has been
>  * proposed before, our new proposal will just get rejected).
>
> Effectively paxos removes the ability to use custom timestamps and
> addresses clock variance by rejecting ballots with timestamps less than
> what was last seen.  You can learn more by reading through the other
> comments and code in that file.
>
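
The rule quoted from the code comment above can be sketched as a toy model. This is only an illustration, not Cassandra's implementation: `Replica` and `next_ballot_timestamp` are made-up names, and real Paxos ballots carry more state than a single integer.

```python
class Replica:
    """Toy acceptor that remembers only the highest ballot timestamp it accepted."""

    def __init__(self):
        self.last_accepted = -1  # nothing accepted yet

    def propose(self, ballot_ts):
        # Reject any proposal whose timestamp is not strictly greater
        # than the last one this replica accepted.
        if ballot_ts <= self.last_accepted:
            return False
        self.last_accepted = ballot_ts
        return True


def next_ballot_timestamp(clock_micros, last_seen_micros):
    # Hypothetical helper: use the local clock unless it lags behind what
    # has already been seen, in which case move just past the last value.
    return max(clock_micros, last_seen_micros + 1)


replica = Replica()
print(replica.propose(100))  # accepted
print(replica.propose(99))   # rejected: lower than the last accepted ballot
print(replica.propose(next_ballot_timestamp(90, replica.last_accepted)))  # accepted
```

With this rule a slow local clock cannot reorder commits; at worst it causes a rejected proposal and a retry, which is the "chance of making progress" mentioned in the comment.
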
> Last write wins is a free for all that guarantees you *nothing* except the
> timestamp is used as a tiebreaker.  Here we acknowledge things like the
> speed of light as being a real problem that isn’t going away anytime soon.
> This problem is sometimes addressed with event sourcing rather than
> mutating in place.
>
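
By contrast, last write wins can be sketched as a plain comparison of write timestamps. Again this is a simplified illustration under assumptions: `Cell` and `reconcile` are hypothetical names, and the tie-break on value is just one deterministic choice made for the sketch.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # microseconds, supplied by the client or coordinator clock


def reconcile(a, b):
    """Pick the surviving version of a cell under last write wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: fall back to a deterministic tie-break on the value
    # (an assumption for this sketch).
    return a if a.value >= b.value else b


# The write that happened second in real time came from a node with a slow
# clock, so it carries the lower timestamp and silently loses.
written_first = Cell("from fast clock", 2000)
written_second = Cell("from slow clock", 1500)
print(reconcile(written_first, written_second).value)
```

Nothing here rejects the stale write or tells the client about it, which is why clock skew matters for plain writes but not for the ballot ordering used by LWTs.
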
> Hope this helps.
>
> Jon
>
>
> On Feb 9, 2017, at 5:21 PM, Kant Kodali  wrote:
>
> @Justin I read this article:
> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
> It clearly says linearizable consistency can be achieved with LWTs. So
> should I assume that linearizability, in the context of that article, is
> possible with LWTs plus synchronization of clocks through ntpd, because
> LWTs also follow Last Write Wins? Also, another question: do most
> production clusters set up ntpd? If so, how long does it take to sync?
> Any idea?
>
> @Michael Shuler Are you referring to something like TrueTime, as in
> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
> Actually, I have never heard of setting up GPS modules or how that can be
> helpful. Let me research that, but good point.
>
> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler 
> wrote:
>
>> If you require the best precision you can get, setting up a pair of
>> stratum 1 ntpd masters in each data center location with GPS modules
>> is not terribly complex. Low latency and jitter on servers you manage.
>> 140ms is a long way away network-wise, and I would suggest that was a
>> poor choice of upstream (probably stratum 2 or 3) source.
>>
>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>> need as close as you can get, you'll probably need to do it yourself.
>>
>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>
>> --
>> Kind regards,
>> Michael
>>
>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>> > Hi Justin,
>> >
>> > There are a bunch of issues w.r.t. synchronization of clocks when we
>> > used ntpd. Also, the time it took to sync the clocks was approx. 140ms
>> > (don't quote me on it, though, because it was reported by our devops :)
>> >
>> > We have multiple clients (for example, a bunch of microservices are
>> > reading from Cassandra). I am not sure how one can achieve
>> > linearizability by setting timestamps on the clients, since there is no
>> > total ordering across multiple clients.
>> >
>> > Thanks!
>> >
>> >
>> > On Thu, Feb 9, 2017 at 4:16 PM, Justin Cameron wrote:
>> >
>> > Hi Kant,
>> >
>> > Clock synchronization is important - you should ensure that ntpd is
>> > properly configured on all nodes. If your particular use case is
>> > especially sensitive to out-of-order mutations it is possible to set
>> > timestamps on the client side using the drivers:
>> > https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/
>> >
>> > We use our own NTP cluster to reduce clock drift as much as
>> > possible, but public NTP servers are good enough for most
>> > uses.