ApacheCon Cassandra and NGCC 2020 Call for proposals
I am delighted to share with you that we, the Apache Cassandra community, in light of our success at last year's conference, have been given a three-day track at this year's ApacheCon in New Orleans, LA, USA [0]. The goal of this track is simple: we are going to get together to talk about Apache Cassandra. As such, this will be the ideal place to network with peers, ask questions, get answers, etc.

On day one, we will be having our Next Generation Cassandra Conference (NGCC). All are welcome to attend, but this day is targeted at Apache Cassandra committers, contributors, and large-scale cluster operators getting together to discuss topics of interest to them for future development efforts. The content will focus on internals and will be geared towards folks with knowledge of the codebase and/or operating Cassandra in very large environments. Talk submissions for NGCC should take this target audience into account.

Days two and three will be more general purpose and accessible to a wider audience. If you are interested in speaking here, put something together that tells a story others will want to hear. What we are looking for are general use case submissions that our users will find interesting. This can be how you solved a specific problem or just a general picture of how your organization uses Apache Cassandra. A good submission will embrace the open source ethos of sharing information to help others solve similar problems.

NGCC talks will be targeted at 30 minutes with 15 minutes for questions or small break-out discussions. General purpose talks will have 50 minutes with five minutes for questions.

For more information, including details of how to submit proposals, please see this page: https://acna2020.jamhosted.net Please indicate "Cassandra" as the category and add NGCC at the top of the "Proposal abstract" text box if you are submitting an NGCC talk.
If you are interested in helping organize, plan, and review submissions for the Cassandra track, we'll send additional details out closer to the CFP deadline about how you can be involved. [0] https://www.apachecon.com/acna2020/
2020 ASF Community Survey: Users
Hello everyone,

If you have an apache.org email, you should have received an email with an invitation to take the 2020 ASF Community Survey. Please take 15 minutes to complete it. If you do not have an apache.org email address or you didn’t receive a link, please follow this link to the survey: https://communitysurvey.limequery.org/454363

This survey is important because it will provide us with scientific information about our community, and shed some light on how we can collaborate better and become more diverse. Our last survey of this kind was implemented in 2016, which means that our existing data about Apache communities is outdated. The deadline to complete the survey is January 4th, 2020. You can find information about privacy on the survey’s Confluence page [1].

Your participation is paramount to the success of this project! Please consider filling out the survey, and share this news with your fellow Apache contributors. As individuals form the Apache community, your opinion matters: we want to hear your voice. If you have any questions about the survey or otherwise, please reach out to us!

Kindly,
ASF Diversity & Inclusion
https://diversity.apache.org/

[1] https://cwiki.apache.org/confluence/display/EDI/Launch+Plan+-+The+2020+ASF+Community+Survey
Cassandra track at ApacheCon 2019 finalized
Hi Folks,

The schedule is up for ApacheCon 2019; we could not be happier with the Cassandra track we were able to put together. https://www.apachecon.com/acna19/schedule.html

Huge thanks again to everyone who submitted talks. We had 3x the number of submissions of any other project-specific track and *almost* as many submissions as the premier big data track!!

Make sure you get this on your schedules. It will be a unique opportunity to interface with project developers, other Apache Cassandra users and operators, as well as the whole ASF community. Hope to see you all there!

Cheers,
-Nate
Two day Apache Cassandra track at ApacheConNA 2019
Hi Folks,

I am delighted to share with you that we, the Apache Cassandra community, have been given a two day track at this year's ApacheCon North America. The goal of this track is simple: we are going to get together to talk about Apache Cassandra. As such, this will be the ideal place to network with peers, ask questions, get answers, etc.

On day one, we will be having our Next Generation Cassandra Conference (NGCC). All are welcome to attend but this day is targeted for Apache Cassandra committers, contributors and large-scale cluster operators to get together and discuss topics of interest to them for future development efforts. The content will focus on internals and will be geared towards folks with knowledge of the codebase and/or operating Cassandra in very large environments. Talk submissions for NGCC should take this target audience into account.

Day two will be more general purpose and accessible for a wider audience. If you are interested in speaking here, put something together that tells a story others will want to hear. What we are looking for is general use case submissions that our users will find interesting. This can be how you solved a specific problem or just a general picture into how your organization uses Apache Cassandra. A good submission will embrace the open source ethos of sharing information to help others solve similar problems.

NGCC talks will be targeted to 30 minutes with 15 minutes for questions or small break out discussions. General purpose talks will have 40 minutes with five minutes for questions.

For more information, including details of how to submit proposals, please see this page: http://cassandra.apache.org/events/2019-apache-cassandra-summit/

Cheers,
-Nate

- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Re: too many logDroppedMessages and StatusLogger
Are you using queries with a large number of arguments to an IN clause on a partition key? If so, the coordinator has to:

- hold open the client request
- unwind the IN clause into individual statements
- scatter/gather those statements around the cluster (each at the requested consistency level!)
- pull it all back together and send it out

In extreme cases, this can flood internode messaging and make things look slow even when the system is near idle.

On Fri, Mar 8, 2019 at 9:27 PM Marco Gasparini wrote: > > Hi all, > > I cannot understand why I get the following logs; they appear every day at > no fixed period of time. I saw them every 2 minutes or every 10 seconds; I > cannot find any pattern. > I took this very example here during a heavy workload of writes and reads > but I get them also during a very light workload and without any active > compaction/repair/streaming process and no high cpu/memory/iowait usage. > >> 2019-03-08 01:49:47,868 INFO [ScheduledTasks:1] MessagingService.java:1246 >> logDroppedMessages READ messages were dropped in last 5000 ms: 0 internal >> and 1 cross node. 
>> Mean internal dropped latency: 6357 ms and Mean cross-node dropped latency: 6556 ms
>> 2019-03-08 01:49:47,868 INFO [ScheduledTasks:1] StatusLogger.java:47 log
>> Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
>> MutationStage                      0        0   17641121        0                 0
>> ViewMutationStage                  0        0          0        0                 0
>> ReadStage                          0        0    6851090        0                 0
>> RequestResponseStage               0        0   13646587        0                 0
>> ReadRepairStage                    0        0     352884        0                 0
>> CounterMutationStage               0        0          0        0                 0
>> MiscStage                          0        0          0        0                 0
>> CompactionExecutor                 0        0     882478        0                 0
>> MemtableReclaimMemory              0        0       4101        0                 0
>> PendingRangeCalculator             0        0          7        0                 0
>> GossipStage                        0        0    4399705        0                 0
>> SecondaryIndexManagement           0        0          0        0                 0
>> HintsDispatcher                    0        0       2165        0                 0
>> MigrationStage                     0        0         50        0                 0
>> MemtablePostFlush                  0        0       4393        0                 0
>> PerDiskMemtableFlushWriter_0       0        0       4097        0                 0
>> ValidationExecutor                 0        0       1565        0                 0
>> Sampler                            0        0          0        0                 0
>> MemtableFlushWriter                0        0       4101        0                 0
>> InternalResponseStage              0        0     121813        0                 0
>> AntiEntropyStage                   0        0
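The unwinding the coordinator performs above can instead be done on the client side, so each statement routes directly to a replica. A minimal sketch (the keyspace, table, and column names are hypothetical, and the driver fan-out is shown only as a comment):

```python
# Sketch: unwind "SELECT ... WHERE pk IN (k1, k2, ...)" into one statement
# per partition key, so the client (not a single coordinator) fans the work
# out across the cluster. Table/column names here are hypothetical.

def unwind_in_clause(keys):
    """Turn a large IN list into (query, params) pairs, one per key."""
    query = "SELECT * FROM ks.events WHERE pk = %s"
    return [(query, (k,)) for k in keys]

statements = unwind_in_clause(["a", "b", "c"])
for q, params in statements:
    print(q, params)

# With the DataStax Python driver you would then issue these concurrently:
#   futures = [session.execute_async(q, p) for q, p in statements]
#   rows = [row for f in futures for row in f.result()]
```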
Re: Cassandra trace
At this point, query tracing is easier to do from the driver side. Docs for python and java: http://datastax.github.io/python-driver/api/cassandra/query.html# https://github.com/datastax/java-driver/tree/3.x/manual/logging#logging-query-latencies This has been completely redone in 4.0. For details (which also include some good discussion on the current limitations) see: https://issues.apache.org/jira/browse/CASSANDRA-13983 https://issues.apache.org/jira/browse/CASSANDRA-12151 On Tue, Oct 23, 2018 at 5:10 PM Mun Dega wrote: > > Hello, > > Does anyone know how I can see queries coming when they're as prepared > statements when trace is turned on Cassandra 3.x? > > If trace doesn't show, any ideas how I can see these type of queries?
Re: Cassandra 4.0
When it's ready :) In all seriousness, the past two blog posts include some discussion on our motivations and current goals with regard to 4.0: http://cassandra.apache.org/blog/ On Wed, Oct 24, 2018 at 4:49 AM Abdul Patel wrote: > > Hi all, > > Any idea when 4.0 is planned to release?
Re: SNAPSHOT builds?
We'll start publishing snapshot builds in the near future to ease testing (support for this was just added in CASSANDRA-12704). On Sun, Sep 30, 2018 at 5:11 AM James Carman wrote: > > Okay, cool. So, 4.0.0-SNAPSHOT doesn’t have Java 11 support quite yet? No > big deal. Just trying to get ahead of the game and be ready once we have it. > Thanks, Jonathan! > > On Sat, Sep 29, 2018 at 11:16 AM Jonathan Haddad wrote: >> >> Hey James, you’ll have to build it. Java 11 is out but the build >> instructions still apply: >> >> http://thelastpickle.com/blog/2018/08/16/java11.html >> >> >> On Sat, Sep 29, 2018 at 7:01 AM James Carman >> wrote: >>> >>> I am trying to find 4.x SNAPSHOT builds. Are they available anywhere >>> handy? I'm trying to work on Java 11 compatibility for a library. >>> >>> Thanks, >>> >>> James >> >> -- >> Jon Haddad >> http://www.rustyrazorblade.com >> twitter: rustyrazorblade
Re: Separated commit log directory configuration
> We only increased commitlog_total_space_in_mb so that Cassandra fully uses > the dedicated disk, but that may be an error? > The default value for this setting is (per the documentation): > > The default value is the smaller of 8192, and 1/4 of the total space of > the commitlog volume. > > But that doesn't say much (or should it really be 25% of the disk space?)

I wouldn't intentionally fill any filesystem that close to its capacity. Most (all?) will start to degrade performance-wise. Unless you are really strapped for disk space, give it some breathing room. This is best chosen by monitoring commitlog rotation frequency in conjunction with disk utilization for your cluster.

> > So, my questions would be: > > * What size should I dedicate to this commit log disk? What are the rules of > thumb to discover the "best" size? > * How should I configure the "commitlog_total_space_in_mb" setting > respectively to the size of the disk?

Most clusters shouldn't need to adjust this or any of the other default commitlog settings unless you have excessively large mutations or require the commitlog being written to disk more frequently.
Re: Rolling back Cassandra upgrades (tarball)
> I have a cluster on v3.0.11 I am planning to upgrade this to 3.10. > Is rolling back the binaries a viable solution? What's the goal with moving from 3.0 to 3.x? Also, our latest release in 3.x is 3.11.3 and has a couple of important bug fixes over 3.10 (which is a bit dated at this point).
Re: Apache Cassandra Blog is now live
You can tell how psyched we are about it because we cross posted! Seriously though - this is by the community for the community, so any ideas - please send them along. On Wed, Aug 8, 2018 at 1:53 PM, sankalp kohli wrote: > Hi, > Apache Cassandra Blog is now live. Check out the first blog post. > > http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html > > Thanks, > Sankalp
New community blog with inaugural post on faster streaming in 4.0
Hi folks,

We just added a blog section to our site, with a post detailing performance improvements of streaming coming in 4.0: http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

I think it's a good indicator of what we are going for that our first author is not a committer or PMC member. Any subject ideas, please bring them up on the dev list (d...@cassandra.apache.org) or open a JIRA. As long as it's informative and about Apache Cassandra, we are interested.

Thanks,
-Nate
Re: which driver to use with cassandra 3
Due to how Spring Data binding works, you have to write queries explicitly to use the "...FROM keyspace.table ..." form in either the template-method classes (CqlTemplate, etc.) or via @Query annotations to avoid the 'use keyspace' overhead. For example, a Repository implementation for a User class (do this all by hand, do not use CassandraRepository per Patrick's point of it being a traveling carnival of cassandra anti-patterns) would look something like:

@Query("select id, email, name from myusers.user where id = ?0")
User findById(UUID id);

Another important note - only the template method classes that work *directly* with prepared statements use them. In other words: *nothing else in the API uses prepared statements.* And this is a massive performance hit in statement parsing alone. There are open issues for this in the SD jira:
https://jira.spring.io/browse/DATACASS-578
https://jira.spring.io/browse/DATACASS-510

If you stick to the CqlTemplate methods for working with PreparedStatements and ResultSet extractors, etc., because you want Spring to manage all the configuration, that's totally legit and it will work well. In general, this will be a good API one day, as some of the fluent stuff for working with paged result sets is particularly excellent and well crafted around modern Java paradigms (outside of not using PreparedStatement, unfortunately).

On Sun, Jul 22, 2018 at 1:15 PM, Goutham reddy wrote: > Hi, > Consider overriding the default java driver provided by spring boot if you are > using Datastax clusters with any of the 3.X Datastax drivers. I agree with > Patrick: always have one key space specified to one application; that way > you achieve domain driven applications and cause less overhead avoiding > switching between key spaces. > > Cheers, > Goutham > > On Fri, Jul 20, 2018 at 10:10 AM Patrick McFadin wrote: >> >> Vitaliy, >> >> The DataStax Java driver is very actively maintained by a good size team >> and a lot of great community contributors. 
It's version 3.x compatible and >> even has some 4.x features starting to creep in. Support for virtual tables >> (https://issues.apache.org/jira/browse/CASSANDRA-7622) was just merged as >> an example. Even the largest DataStax customers have a mix of enterprise + >> OSS and we want to support them either way. Giving developers the most >> consistent experience is part of that goal. >> >> As for spring-data-cassandra, it does pull the latest driver as a part of >> its own build, so you will already have it in your classpath. Spring adds >> some auto-magic that you should be aware of. The part you mentioned about the >> schema management is one to be careful with using. If you use it in dev, >> it's not a huge problem. If it gets out to prod, you could potentially have >> A LOT of concurrent schema changes happening, which can lead to bad things. >> Also, some of the spring API features such as findAll() can expose typical >> c* anti-patterns such as "allow filtering". Just be aware of what feature >> does what. And finally, another potential production problem is that if you >> use a lot of keyspaces, Spring will instantiate a new Driver Session object >> per keyspace, which can lead to a lot of redundant connections to the >> database. From the driver, a better way is to specify a keyspace per query. >> >> As you are using spring-data-cassandra, please share your experiences if >> you can. There are a lot of developers that would benefit from some >> real-world stories. >> >> Patrick >> >> >> On Fri, Jul 20, 2018 at 4:54 AM Vitaliy Semochkin >> wrote: >>> >>> Thank you very much Duy Hai Doan! >>> I have relatively simple demands and since spring uses the datastax >>> driver I can always get back to it, >>> though I would prefer to use spring in order to do bootstrapping and >>> resource management for me. >>> On Fri, Jul 20, 2018 at 4:51 PM DuyHai Doan wrote: >>> > >>> > Spring data cassandra is so so ... 
It has fewer features (at least at the >>> > time I looked at it) than the default Java driver >>> > >>> > For drivers, right now most people are using Datastax's ones >>> > >>> > On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> Which driver to use with cassandra 3 >>> >> >>> >> the one that is provided by datastax, netflix or something else. >>> >> >>> >> Spring uses driver from datastax, though is it a reliable solution for >>> >> a long term project, having in mind that datastax and cassandra >>> >> parted? >>> >> >>> >> Regards, >>> >> Vitaliy >>> >> >>> > >>> > -- > R
CVE-2018-8016 on Apache Cassandra
CVE-2018-8016 describes an issue with the default configuration of Apache Cassandra releases 3.8 through 3.11.1 which binds an unauthenticated JMX/RMI interface to all network interfaces allowing attackers to execute arbitrary Java code via an RMI request. This issue is a regression of the previously disclosed CVE-2015-0225. The regression was introduced in https://issues.apache.org/jira/browse/CASSANDRA-12109. The fix for the regression is implemented in https://issues.apache.org/jira/browse/CASSANDRA-14173. This fix is contained in the 3.11.2 release of Apache Cassandra. - The Apache Cassandra PMC
Re: 答复: Time serial column family design
I disagree. Create date as a raw integer is an excellent surrogate for controlling time series "buckets" as it gives you complete control over the granularity. You can even have multiple granularities in the same table - remember that partition key "misses" in Cassandra are pretty lightweight as they won't make it past the bloom filter on the read path.

On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja wrote: > Hi David, > > Could you describe why you chose to include the create date in the > partition key? If the vin is enough "partitioning", meaning that the size > (number of rows x size of row) of each partition is less than 100MB, then > remove the date and just use the create_time, because the date is already > included in that column anyways. > > For example if columns "a" and "b" (from your table) are of max 256 UTF8 > characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows > per partition. You can actually have many more but you don't want to go > much higher for performance reasons. > > If this is not enough you could use create_month instead of create_date, > for example, to reduce the partition size while not being too granular. > > > On Tue, 17 Apr 2018, 22:17 Nate McCall, wrote: > >> Your table design will work fine as you have appropriately bucketed by an >> integer-based 'create_date' field. >> >> Your goal for this refactor should be to remove the "IN" clause from your >> code. This will move the rollup of multiple partition keys being retrieved >> into the client instead of relying on the coordinator assembling the >> results. You have to do more work and add some complexity, but the trade >> off will be much higher performance as you are removing the single >> coordinator as the bottleneck. >> >> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni >>> wrote: >> >>> Hi Nate, >>> >>> Thanks for your reply! >>> >>> Is there another way to design this table to meet this requirement? 
>>> >>> >>> >>> Best Regards, >>> >>> >>> >>> 倪项菲*/ **David Ni* >>> >>> 中移德电网络科技有限公司 >>> >>> Virtue Intelligent Network Ltd, co. >>> >>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei >>> >>> Mob: +86 13797007811|Tel: + 86 27 5024 2516 >>> >>> >>> >>> *发件人:* Nate McCall >>> *发送时间:* 2018年4月17日 7:12 >>> *收件人:* Cassandra Users >>> *主题:* Re: Time serial column family design >>> >>> >>> >>> >>> >>> Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in >>> (20180416, 20180415, 20180414, 20180413, 20180412….); >>> >>> But this cause the cql query is very long,and I don’t know whether there >>> is limitation for the length of the cql. >>> >>> Please give me some advice,thanks in advance. >>> >>> >>> >>> Using the SELECT ... IN syntax means that: >>> >>> - the driver will not be able to route the queries to the nodes which >>> have the partition >>> >>> - a single coordinator must scatter-gather the query and results >>> >>> >>> >>> Break this up into a series of single statements using the executeAsync >>> method and gather the results via something like Futures in Guava or >>> similar. >>> >> >> >> >> -- >> - >> Nate McCall >> Wellington, NZ >> @zznate >> >> CTO >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
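The back-of-envelope sizing quoted in Javier's reply can be checked directly. The ~100 MB target and the 2-bytes-per-character figure are his assumptions, and this ignores Cassandra's actual per-cell storage overhead:

```python
# Rough partition-size math from the thread above: two text columns of up to
# 256 characters each, at an assumed ~2 bytes per character, against a
# ~100 MB partition-size guideline. Illustrative only -- real on-disk cost
# includes clustering keys, timestamps, and other per-cell overhead.

TARGET_PARTITION_BYTES = 100 * 1024 * 1024   # ~100 MB guideline
row_bytes = 2 * 256 * 2                      # 2 columns x 256 chars x 2 bytes

max_rows = TARGET_PARTITION_BYTES // row_bytes
print(max_rows)  # roughly the "approx 100,000 rows" figure from the thread
```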
Re: 答复: Time serial column family design
Your table design will work fine as you have appropriately bucketed by an integer-based 'create_date' field. Your goal for this refactor should be to remove the "IN" clause from your code. This will move the rollup of multiple partition keys being retrieved into the client instead of relying on the coordinator assembling the results. You have to do more work and add some complexity, but the trade off will be much higher performance as you are removing the single coordinator as the bottleneck. On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni wrote: > Hi Nate, > > Thanks for your reply! > > Is there other way to design this table to meet this requirement? > > > > Best Regards, > > > > 倪项菲*/ **David Ni* > > 中移德电网络科技有限公司 > > Virtue Intelligent Network Ltd, co. > > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei > > Mob: +86 13797007811|Tel: + 86 27 5024 2516 > > > > *发件人:* Nate McCall > *发送时间:* 2018年4月17日 7:12 > *收件人:* Cassandra Users > *主题:* Re: Time serial column family design > > > > > > Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in > (20180416, 20180415, 20180414, 20180413, 20180412….); > > But this cause the cql query is very long,and I don’t know whether there > is limitation for the length of the cql. > > Please give me some advice,thanks in advance. > > > > Using the SELECT ... IN syntax means that: > > - the driver will not be able to route the queries to the nodes which have > the partition > > - a single coordinator must scatter-gather the query and results > > > > Break this up into a series of single statements using the executeAsync > method and gather the results via something like Futures in Guava or > similar. > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Time serial column family design
> > > Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in > (20180416, 20180415, 20180414, 20180413, 20180412….); > > But this cause the cql query is very long,and I don’t know whether there > is limitation for the length of the cql. > > Please give me some advice,thanks in advance. > Using the SELECT ... IN syntax means that: - the driver will not be able to route the queries to the nodes which have the partition - a single coordinator must scatter-gather the query and results Break this up into a series of single statements using the executeAsync method and gather the results via something like Futures in Guava or similar.
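The executeAsync approach above can be sketched with the Python driver: generate the integer day buckets client-side, then issue one single-partition statement per bucket. The driver fan-out is shown as a comment since it assumes an open session; table and column names follow the thread's example:

```python
from datetime import date, timedelta

# Sketch: instead of "create_date IN (20180416, 20180415, ...)", generate
# the yyyymmdd integer buckets and issue one single-partition query each.

def day_buckets(end, days):
    """Last `days` yyyymmdd integers ending at `end`, most recent first."""
    return [int((end - timedelta(n)).strftime("%Y%m%d")) for n in range(days)]

buckets = day_buckets(date(2018, 4, 16), 5)
print(buckets)  # [20180416, 20180415, 20180414, 20180413, 20180412]

# With the DataStax Python driver (session assumed already connected):
#   query = "SELECT * FROM test WHERE vin = %s AND create_date = %s"
#   futures = [session.execute_async(query, (vin, d)) for d in buckets]
#   rows = [row for f in futures for row in f.result()]
```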
Re: Mailing list server IPs
Hi Jacques, Thanks for bringing this up. I took a quick look through the INFRA project and saw a couple of resolved issues that might help: https://issues.apache.org/jira/browse/INFRA-6584?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22mail%20server%20whitelist%22 If those don't do it for you, please open a new issue with INFRA. On Sat, Apr 14, 2018 at 1:19 AM, Jacques-Henri Berthemet < jacques-henri.berthe...@genesys.com> wrote: > I checked with IT and I missed an email on the period where I got the last > bounce. It’s not a very big deal but I’d like to have it fixed if > possible. > > > > Gmail servers are very picky on SMTP traffic and reject a lot of things. > > > > *--* > > *Jacques-Henri Berthemet* > > > > *From:* Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com] > *Sent:* Friday, April 13, 2018 3:15 PM > *To:* user@cassandra.apache.org > *Subject:* Re: Mailing list server IPs > > > > Hi, > > > > I receive similar messages from time to time, and I'm using Gmail ;) I > believe I never missed a mail on the ML and that you can safely ignore this > message > > > > On 13 April 2018 at 15:06, Jacques-Henri Berthemet < > jacques-henri.berthe...@genesys.com> wrote: > > Hi, > > > > I’m getting bounce messages from the ML from time to time, see attached > example. Our IT told me that they need to whitelist all IPs used by > Cassandra ML server. Is there a way to get those IPs? > > > > Sorry if it’s not really related to Cassandra itself but I didn’t find > anything in http://untroubled.org/ezmlm/ezman/ezman5.html commands. > > > > Regards, > > -- > > Jacques-Henri Berthemet > > > > -- Forwarded message -- > From: "user-h...@cassandra.apache.org" > To: Jacques-Henri Berthemet > Cc: > Bcc: > Date: Fri, 6 Apr 2018 20:47:22 + > Subject: Warning from user@cassandra.apache.org > Hi! This is the ezmlm program. I'm managing the > user@cassandra.apache.org mailing list. > > > Messages to you from the user mailing list seem to > have been bouncing. 
I've attached a copy of the first bounce > message I received. > > If this message bounces too, I will send you a probe. If the probe bounces, > I will remove your address from the user mailing list, > without further notice. > > > I've kept a list of which messages from the user mailing list have > bounced from your address. > > Copies of these messages may be in the archive. > To retrieve a set of messages 123-145 (a maximum of 100 per request), > send a short message to: > > > To receive a subject and author list for the last 100 or so messages, > send a short message to: > > > Here are the message numbers: > >60535 >60536 >60548 > > --- Enclosed is a copy of the bounce message I received. > > Return-Path: <> > Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 - > Date: 27 Mar 2018 14:22:11 -0000 > From: mailer-dae...@apache.org > To: user-return-605...@cassandra.apache.org > Subject: failure notice > > > > > - > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: OOM after a while during compacting
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got an OOM last night
> - Concurrent compactors is set to 1 but it still happens; also tried setting throughput between 16 and 128, no changes.

That heap size is way too small for G1GC. Switch back to the defaults with CMS. IME, G1 needs > 20g for *just* the JVM to see improvements (but this also depends on workload and a few other factors). Stick with the CMS defaults unless you have some evidence-based experiment to try.

Also worth noting that with a 1TB gp2 EBS volume, you only have 3k IOPS to play with before you are subject to rate limiting. If you allocate a volume greater than 3.33TB, you get 10K IOPS and the rate limiting goes away (you can see this playing around with the EBS sizing in the AWS calculator: http://calculator.s3.amazonaws.com/index.html). Another common mistake here is accidentally putting the commitlog on the boot volume, which has a super low amount of IOPS given it's 64g (?iirc) by default.
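The gp2 numbers above follow from gp2's baseline of 3 IOPS per provisioned GB, floored at 100 and (per the post) capped at 10K, which the math below makes explicit:

```python
# gp2 EBS baseline IOPS: 3 IOPS per provisioned GB, minimum 100, and
# (per the post above) capped at 10,000 -- the cap is reached at ~3,334 GB,
# i.e. the "greater than 3.33TB" figure.

def gp2_baseline_iops(size_gb):
    return min(max(3 * size_gb, 100), 10_000)

print(gp2_baseline_iops(1_000))  # 1 TB volume -> 3,000 IOPS
print(gp2_baseline_iops(3_334))  # past ~3.33 TB the 10K cap applies
```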
Re: Cassandra 2.1.x seed node update via JMX
This capability was *just* added in CASSANDRA-14190 and only in trunk. Previously (as described in the ticket above), the seed node list is only updated when doing a shadow round, removing an endpoint or restarting (look for callers of o.a.c.gms.Gossiper#buildSeedsList() if you're curious). A rolling restart is the usual SOP for that. On Fri, Mar 23, 2018 at 9:54 AM, Carl Mueller wrote: > We have a cluster that is subject to the one-year gossip bug. > > We'd like to update the seed node list via JMX without restart, since our > foolishly single-seed-node in this forsaken cluster is being autoculled in > AWS. > > Is this possible? It is not marked volatile in the Config of the source > code, so I doubt it. > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Migration of keyspace to another new cluster
> Hi, > We got a requirement to migrate only one keyspace data from one cluster to > other cluster. And we no longer need the old cluster anymore. Can you > suggest what are the best possible ways we can achieve it. > > Regards > Goutham Reddy > Temporarily treat the new cluster as a new datacenter for the current cluster and follow the process for adding a datacenter for that keyspace. When complete remove the old datacenter/cluster similarly.
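A sketch of that "new datacenter" approach in CQL, with the operational steps as comments. The keyspace name, DC names, and replication factors are placeholders; it assumes NetworkTopologyStrategy and that the new cluster's nodes have joined as datacenter 'dc2':

```sql
-- 1. With the new nodes joined as datacenter 'dc2', extend replication
--    for just the keyspace being migrated:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'dc1': 3, 'dc2': 3};

-- 2. On each new node, stream the existing data from the old DC:
--      nodetool rebuild -- dc1
-- 3. Repoint clients at dc2, then drop the old DC from replication:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3};
-- 4. Decommission the dc1 nodes (nodetool decommission, one at a time).
```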
Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?
> We are archiving data in order to make assumptions on it in the future. So, yes, we expect to grow continuously. In the meantime I learned to go for predictable growth per partition rather than unpredictably large partitions. So today we are growing 250,000,000 records per day going into a single table, and heading towards about 100 times that number this year. A partition will grow by one record a day, which should give us good horizontal scalability, but means 250,000,000 to 25,000,000,000 partitions. Hope these numbers should not make me feel uncomfortable :)

There will be some additional tuning to do at around ~200 million partitions per table per node. Specifically bloom filters and index summaries. Depending on partition size and read access patterns, tuning compression settings will have a big effect as well given the volume.
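To get a feel for why bloom filters become a tuning concern at these partition counts, here is a rough, deliberately simplified estimate of per-node bloom filter memory, assuming roughly 10 bits per partition at the default bloom_filter_fp_chance of 0.01 (both numbers are assumptions; real sizing also depends on per-sstable overhead):

```python
def bloom_filter_mib(partitions: int, bits_per_partition: float = 10.0) -> float:
    """Very rough bloom filter memory estimate in MiB.
    Assumes ~10 bits/partition, which roughly corresponds to a
    1% false-positive chance; actual usage varies by fp_chance."""
    return partitions * bits_per_partition / 8 / (1024 * 1024)

# ~200 million partitions per node -> roughly a quarter GiB just for bloom filters
print(round(bloom_filter_mib(200_000_000)))  # -> 238
```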
Re: Hints folder missing in Cassandra
> The environment is built using established images for Cassandra 3.10. Unfortunately the debug log does not indicate any errors before I start seeing the WARN for the missing hints folder. I understand that hints files will be deleted after replay is complete, but I am not sure of the root cause of why the hints folder is getting deleted. When I look at nodetool status or nodetool ring, it indicates that all nodes are up and running in normal state; no node went down. Also, I do not see anything in the debug logs indicating that a node went down. In such a scenario, I am not sure why HintsWriterExecutor would get triggered.

That error code (O_RDONLY) in the log message indicates that the hints folder has had its permission bits set to read only. We've had several issues with some of the tools doing this type of thing when they are run as the root user. Is this specific node one on which you use any of the tools like sstableloader or similar? If so, are you running them as root? Another thought - if it is on a different partition than the data directory, is there free space left on the underlying device holding /var/lib/cassandra/hints?

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Setting min_index_interval to 1?
> > > Another was the crazy idea I started with of setting min_index_interval to > 1. My guess was that this would cause it to read all index entries, and > effectively have them all cached permanently. And it would read them > straight out of the SSTables on every restart. Would this work? Other than > probably causing a really long startup time, are there issues with this? > > I've never tried that. It sounds like you understand the potential impact on memory and startup time. If you have the data in such a way that you can easily experiment, I would like to see a breakdown of the impact on response time vs. memory usage as well as where the point of diminishing returns is on turning this down towards 1 (I think there will be a sweet spot somewhere).
Re: What happens if multiple processes send create table if not exist statement to cassandra?
> Thanks a lot for that explanation Jeff!! I am trying to see if there is > any JIRA ticket that talks about incorporating LWT in scenarios you > mentioned? > https://issues.apache.org/jira/browse/CASSANDRA-10699
Re: Upgrade to 3.11.1 give SSLv2Hello is disabled error
> > We use Oracle jdk1.8.0_152 on all nodes and as I understand oracle use a > dot in the protocol name (TLSv1.2) and I use the same protocol name and > cipher names in the 3.0.14 nodes and the one I try to upgrade to 3.11.1. > I agree with Stefan's assessment and share his confusion. Would you be willing to add the following to the startup options with the explicitly configured "TLSv1.2" and post the results? -Djavax.net.debug=ssl That should provide additional detail on the SSL handshake.
Re: 3.0.15 or 3.11.1
> > Can you please provide some JIRAs for superior fixes and performance improvements which are present in 3.11.1 but are missing in 3.0.15?

For the security conscious, CASSANDRA-11695 allows you to use Cassandra's authentication and authorization to lock down JMX/nodetool access instead of relying on per-node configuration.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: NVMe SSD benchmarking with Cassandra
> In regards to setting read ahead, how is this set for NVMe drives? Also, below are our compression settings for the table… They are the same as in our tests against SAS SSDs, so I don't think the compression settings would be the issue…

Check 'blockdev --report' between the old and the new servers to see if there is a difference. Are there other deltas in the disk layouts between the old and new servers (i.e. LVM, mdadm, etc.)? You can control read ahead via 'blockdev --setra' or via poking the kernel: /sys/block/[YOUR DRIVE]/queue/read_ahead_kb. In both cases, changes are instantaneous so you can do it on a canary and monitor for effect. Also, I'd be curious to know (since you have this benchmark setup) whether you still get the degradation you are currently seeing if you set concurrent_reads and concurrent_writes back to their defaults.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
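One wrinkle worth noting when comparing the two knobs above: 'blockdev --setra' takes a count of 512-byte sectors, while /sys/block/.../queue/read_ahead_kb is in kilobytes, so the same setting shows up as two different numbers. A tiny converter makes the relationship explicit:

```python
def setra_sectors(read_ahead_kb: int) -> int:
    """Convert a read_ahead_kb value to the 512-byte sector count
    that 'blockdev --setra' and '--getra' work in."""
    return read_ahead_kb * 1024 // 512

print(setra_sectors(128))  # read_ahead_kb of 128 corresponds to --setra 256
```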
Re: Cassandra proxy to control read/write throughput
The following presentation describes in detail a technique for using coordinator-only nodes which will give you similar behavior (particularly slides 12 to 14): https://www.slideshare.net/DataStax/optimizing-your-cluster-with-coordinator-nodes-eric-lubow-simplereach-cassandra-summit-2016

On Thu, Oct 26, 2017 at 12:07 PM, AI Rumman wrote:
> Hi,
> I am using different versions of Cassandra in my environment where I have 60 nodes running for different applications. Each application is connecting to its own cluster. I am thinking about abstracting the Cassandra IP from the app drivers. The app will communicate with one proxy IP which will redirect traffic to the appropriate Cassandra cluster. The reason behind this thinking is to merge multiple clusters and control the read/write throughput from the proxy based on the application. If anyone knows about pg_bouncer for PostgreSQL, I am thinking of something similar to that. Has anyone worked on such a project? Can you please share some ideas?
> Thanks.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Understanding Messages in the Debug.log
> The message in the debug log is:
> DEBUG [GossipStage:1] 2017-09-21 09:19:52,627 FailureDetector.java:456 - Ignoring interval time of 2000275419

Did you truncate the log message? There should be a "for [endpoint]" on the end which should help you narrow things down to a set of problem nodes. I agree with Jeff in that this is most likely an NTP sync issue or network flap, though.
Re: system_auth replication factor in Cassandra 2.1
Regardless, if you are not modifying users frequently (with five you most likely are not), make sure to turn the permission cache waaay up. In 2.1 that is just: permissions_validity_in_ms (default is 2000, or 2 seconds). Feel free to set it to 1 day or some such. The corresponding async update parameter (permissions_update_interval_in_ms) can be set to a slightly smaller value. If you really need to, you can drop the cache via the "invalidate" operation on the "org.apache.cassandra.auth:type=PermissionsCache" mbean (on each node) to revoke a user, for example. In later versions, you would have to do the same with:
- roles_validity_in_ms
- credentials_validity_in_ms
and their corresponding 'interval' parameters.
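Turning the 2.1 permission cache up to one day in cassandra.yaml might look like this (illustrative values only, not a recommendation; pick an interval that matches how quickly you need revocations to propagate):

```yaml
# Cache permission lookups for 1 day; refresh asynchronously a bit sooner.
permissions_validity_in_ms: 86400000          # 24 hours
permissions_update_interval_in_ms: 82800000   # 23 hours
```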
Re: Cassandra All host(s) tried for query failed (no host was tried)
If these app instances sit idle for a while, they might just be timing out their sockets. You can tweak socket settings on the driver as described here: https://github.com/datastax/java-driver/tree/3.x/manual/socket_options Perhaps start with explicitly setting keepAlive to true as that may or may not be set depending on whether it's using the native epoll extension or NIO directly (more details about such on the page above). On Thu, Aug 31, 2017 at 3:10 AM, Ivan Iliev wrote: > Hello everyone, > > We are using Cassandra 3.9 for storing quite a lot of data produced from > our tester machines. > > Occasionally, we are seeing issues with apps not being able to communicate > with Cassandra nodes, returning the following errors (captured in > servicemix logs): > >> by: com.datastax.driver.core.exceptions.NoHostAvailableException: All >> host(s) tried for query failed (no host was tried) >> at com.datastax.driver.core.RequestHandler.reportNoMoreHosts( >> RequestHandler.java:218) >> at com.datastax.driver.core.RequestHandler.access$1000( >> RequestHandler.java:43) >> at com.datastax.driver.core.RequestHandler$SpeculativeExecution. >> sendRequest(RequestHandler.java:284) >> at com.datastax.driver.core.RequestHandler.startNewExecution( >> RequestHandler.java:115) >> at com.datastax.driver.core.RequestHandler.sendRequest( >> RequestHandler.java:91) >> at com.datastax.driver.core.SessionManager.executeAsync( >> SessionManager.java:132) >> ... 107 more > > > As a result, apps that try to send data to cassandra get crashed due to > running out of memory and we have to restart the containers in which they > run. > > So far I have not been able to identify what might be the cause for this > as nothing (at least I could not find anything relevant on the timestamps) > in the cassandra debug and system logs. > > Could you share some insight on this ? What to check and where to start > from , in order to troubleshoot this. > > Thanks ! 
> Ivan > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cassandra seems slow when having many read operations
On Sat, Jul 15, 2017 at 6:37 AM, Felipe Esteves <felipe.este...@b2wdigital.com> wrote:
> One point I've noticed is that OpsCenter shows "OS: Disk Latency" max with high values when the problem occurs, but it doesn't show up in direct server monitoring; in those tools the IO and latency of the disks seem ok.

YMMV, but I've seen something like this due to an issue balancing IRQs on older 3-series kernels. Check the output of 'cat /proc/interrupts' and make sure the interrupts for the disks and network driver(s) in particular are not contending. This article explains the issue in detail (as well as how to fix it): http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
Re: Unbalanced cluster
You wouldn't have a build file laying around for that, would you?

On Tue, Jul 11, 2017 at 3:23 PM, Nate McCall wrote:
> On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity wrote:
>> [1] https://github.com/avikivity/shardsim
>
> Avi, that's super handy - thanks for posting.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unbalanced cluster
On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity wrote: > > > > [1] https://github.com/avikivity/shardsim > Avi, that's super handy - thanks for posting.
Re: commitlog_total_space_in_mb tuning
> We're running with 128G memory and a 30G heap size. Maybe it's a good idea to increase commitlog_total_space. On the other hand, even with 8G of commitlog_total_space, replaying the CL after a restart takes more than 5 minutes. In our case, the actual problem is that it's causing lots of read repair timeouts as the repair mutations are dropped, which causes the Cassandra JVM to hang or sometimes crash.

Do you have a mix of a small number of really heavily written-to tables and a larger number of tables with fewer writes? One thing I've had success with when waitingOnSegmentAllocation spiked is setting memtable_flush_period_in_ms on the less busy tables (obviously not all to the same value, so you don't cause a flush storm). This seems to keep the block-and-tackle CL rotation cleaner with fewer tables to flush.
Re: Definition of QUORUM consistency level
> We have CL.TWO. > > > This was actually the original motivation for CL.TWO and CL.THREE if memory serves: https://issues.apache.org/jira/browse/CASSANDRA-2013
Re: Definition of QUORUM consistency level
> > > So, for the quorum, what we really want is that there is one overlap among >> the nodes in write path and read path. It actually was my assumption for a >> long time that we need (N/2 + 1) for write and just need (N/2) for read, >> because it's enough to provide the strong consistency. >> > > You are write about ... > *right (lol!).
Re: Definition of QUORUM consistency level
> So, for the quorum, what we really want is that there is one overlap among > the nodes in write path and read path. It actually was my assumption for a > long time that we need (N/2 + 1) for write and just need (N/2) for read, > because it's enough to provide the strong consistency. > You are write about strong consistency with that calculation, but if I want to issue a QUORUM read just by itself, I would expect a majority of nodes to reply. How it was written might be immaterial to my use case of reading 'from a majority.' -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
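The overlap argument in this thread can be sketched in a few lines: a read of R replicas and a write of W replicas are guaranteed to intersect on at least one replica whenever R + W > N. QUORUM on both sides always satisfies this, and the (N/2 read, N/2 + 1 write) calculation discussed above does as well when N is even:

```python
def quorum(n: int) -> int:
    """Replicas required for QUORUM against replication factor n."""
    return n // 2 + 1

def guaranteed_overlap(r: int, w: int, n: int) -> bool:
    """Reads and writes must share at least one replica when r + w > n."""
    return r + w > n

for n in (3, 4, 5):
    # QUORUM reads with QUORUM writes always overlap...
    assert guaranteed_overlap(quorum(n), quorum(n), n)

# ...and for even n (here 4), reading only n//2 replicas with
# quorum writes still overlaps, which is the calculation above.
assert guaranteed_overlap(4 // 2, quorum(4), 4)
```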
Re: Order by for aggregated values
> > > My application is a real-time application. It monitors devices in the > network and displays the top N devices for various parameters averaged over > a time period. A query may involve anywhere from 10 to 50k devices, and > anywhere from 5 to 2000 intervals. We expect a query to take less than 2 > seconds. > > > > My impression was that Spark is aimed at larger scale analytics. > > > > I am ok with the limitation on “group by”. I am intending to use async > queries and token-aware load balancing to partition the query and execute > it in parallel on each node. > > > This sounds a lot more like a use case for a streaming system (run in parallel with Cassandra). Apache Flink might be one avenue to explore - their Cassandra integration works fine, btw. A lot of folks are doing similar things with Apache Beam as well as it has quite an elegant paradigm for the use case you describe, particularly if you need to combine batching with streaming. (FYI, their "CassandraIO" is about to be merged in master: https://github.com/apache/beam/pull/592#issuecomment-306618338). -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: hanging validation compaction
>> terator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:178)
>> org.apache.cassandra.repair.Validator.rowHash(Validator.java:221)
>> org.apache.cassandra.repair.Validator.add(Validator.java:160)
>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1364)
>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:85)
>> org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:933)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/1371495133.run(Unknown Source)
>> java.lang.Thread.run(Thread.java:745)
>>
>> On Thu, 2017-04-13 at 08:47 +0200, benjamin roth wrote:
>> You should connect to the node with JConsole and see where the compaction thread is stuck.
>>
>> 2017-04-13 8:34 GMT+02:00 Roland Otta:
>> hi, we have the following issue on our 3.10 development cluster. we are doing regular repairs with thelastpickle's fork of Reaper. sometimes the repair (it is a full repair in that case) hangs because of a stuck validation compaction.
>> nodetool compactionstats gives me:
>> a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event 805955242 841258085 bytes 95.80%
>> we have no more progress here for hours.
>> nodetool tpstats shows:
>> ValidationExecutor 1 1 16186 0 0
>> i checked the logs on the affected node and could not find any suspicious errors. anyone that already had this issue and knows how to cope with that? a restart of the node helps to finish the repair ... but i am not sure whether that somehow breaks the full repair.
>> bg, roland

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: UNSUBSCRIBE
To unsubscribe from this list, please send an email to user-unsubscr...@cassandra.apache.org Thanks! On Wed, Apr 12, 2017 at 6:37 AM, Lawrence Turcotte < lawrence.turco...@gmail.com> wrote: > UNSUBSCRIBE >
Re: Unsubscribe
Hi John, Please send an email to user-unsubscr...@cassandra.apache.org to unsubscribe from this list.

On Fri, Apr 7, 2017 at 8:58 AM, John Buczkowski wrote:
> *From:* eugene miretsky [mailto:eugene.miret...@gmail.com]
> *Sent:* Thursday, April 06, 2017 4:36 PM
> *To:* user@cassandra.apache.org
> *Subject:* Why are automatic anti-entropy repairs required when hinted hand-off is enabled?
>
> Hi,
>
> As I see it, if hinted handoff is enabled, the only time data can be inconsistent is when:
> 1. A node is down for longer than the max_hint_window
> 2. The coordinator node crashes before all the hints have been replayed
>
> Why is it still recommended to perform frequent automatic repairs, as well as enable read repair? Can't I just run a repair after one of the nodes is down? The only problem I see with this approach is a long repair job (instead of small incremental repairs). But other than that, are there any other issues/corner-cases?
>
> Cheers,
> Eugene
Re: [Cassandra 3.0.9] Cannot allocate memory
On Thu, Mar 23, 2017 at 11:18 AM, Abhishek Kumar Maheshwari < abhishek.maheshw...@timesinternet.in> wrote: > JVM config is as below: > > > > -Xms16G > > -Xmx16G > > -Xmn3000M > > > I don't think it is the cause, but you need to remove Xmn when using G1GC.
Re: Scrubbing corrupted SStable.
The snapshots are hard links on the file system, so everything is included. You can use the "--no-snapshot" option to disable snapshots.

On Tue, Mar 21, 2017 at 5:01 PM, Pranay akula wrote:
> I am trying to scrub a column family using nodetool scrub. Is it going to create snapshots for the sstables which are corrupted, or for all the sstables it is going to scrub? And to remove the snapshots created, is running nodetool clearsnapshot enough, or do I need to manually delete pre-scrub data from snapshots of that column family?
> I can see a significant increase in data after starting the scrub.
> Thanks, Pranay.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: ONE has much higher latency than LOCAL_ONE
On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey wrote:
> The cluster is in two DCs, and yes the client is deployed locally to each DC.

First off, what is the goal of using ONE instead of LOCAL_ONE? If it's failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE and falling back to ONE. Are you using the ".withLocalDc" option in the DCAwareRoundRobinPolicy builder? (It's been a while since I've gone through this in detail, though.) If you could provide a snippet that included the complete options passed to the builder, that might be helpful. Also, check for the complete forms of these two logging messages on the app side during startup (the second one is at INFO so adjust if needed):

"Some contact points don't match local data center. Local DC = {}. Non-conforming contact points: {}"
"Using data-center name '{}' for DCAwareRoundRobinPolicy..."

Make sure those line up with the cluster topology and your expectations. Actually, in typing that up, it may be more appropriate to move the conversation over here since this is probably driver specific: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: ONE has much higher latency than LOCAL_ONE
On Wed, Mar 22, 2017 at 1:11 PM, Nate McCall wrote:
> On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey wrote:
> > The cluster is in two DCs, and yes the client is deployed locally to each DC.
>
> First off, what is the goal of using ONE instead of LOCAL_ONE? If it's failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE and falling back to ONE.

Just read your previous thread about this. That's pretty unintuitive and counter to the way I remember that working (though admittedly, it's been a while). Do please open a thread on the driver mailing list; I'm curious about the response.
Re: spikes in blocked native transport requests
See the details on: https://issues.apache.org/jira/browse/CASSANDRA-11363 You may need to add -Dcassandra.max_queued_native_transport_requests=4096 as a startup parameter. YMMV though; I suggest reading through the above to get a complete picture.

On Mon, Mar 20, 2017 at 11:10 PM, Roland Otta wrote:
> well, i checked it now. we have some STW collections of 100 to 200ms every 5 to 60 seconds. i am not sure whether the blocked threads are related to that, but anyway these pauses are too long for low latency applications. so i will check gc tuning first and will check afterwards whether the blocked threads still exist.
>
> On Mon, 2017-03-20 at 08:55 +0100, benjamin roth wrote:
> Did you check STW GCs? You can do that with 'nodetool gcstats', by looking at the gc.log or observing GC related JMX metrics.
>
> 2017-03-20 8:52 GMT+01:00 Roland Otta:
> we have a datacenter which is currently used exclusively for spark batch jobs. in case batch jobs are running against that environment we can see very high peaks in blocked native transport requests (up to 10k / minute). i am concerned because i guess that will slow other queries (in case other applications are going to use that dc as well). i already tried increasing native_transport_max_threads + concurrent_reads without success. during the jobs i can't find any resource limitations on my hardware (iops, disk usage, cpu, ... is fine). am i missing something? any suggestions how to cope with that?
> br// roland

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Grouping time series data into blocks of times
I think you would be better served by using a streaming system like Apache Flink (http://flink.apache.org) and checkpointing occasionally to Cassandra. This is a significant increase in complexity, but you are describing a real-time streaming use case with the need for watermarking time windows and Flink has that all built in.
Re: High disk io read load
> - Node A has 512 tokens and Node B 256. So it has double the load (data). > - Node A also has 2 SSDs, Node B only 1 SSD (according to load) > I very rarely see heterogeneous vnode counts in the same cluster. I would almost guarantee you are the only one doing this with MVs as well. That said, since you have different IO hardware, are you sure the system configurations (eg. block size, read ahead, etc) are the same on both machines? Is dstat showing a similar order of magnitude of network traffic in vs. IO for what you would expect? -- ----- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cipher Suite Cassandra 2.1.14 Encryption
Is AES-GCM supported in python by default? I have a vague recollection that it is not (certainly possible my knowledge is outdated as well). On Wed, Dec 21, 2016 at 10:21 AM, Jacob Shadix wrote: > I was testing client encryption w/cqlsh and get the following error when > using TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 as the cipher. Any ideas why? > > Last error: _ssl.c:492: EOF occurred in violation of protocol")}) > -- Jacob Shadix > -- ----- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: High CPU on nodes
https://issues.apache.org/jira/browse/CASSANDRA-6908 Disable DynamicSnitch by adding the following to cassandra.yaml (it is not in the file by default): dynamic_snitch: false

On Wed, Dec 21, 2016 at 8:40 AM, Anubhav Kale wrote:
> CIL
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Saturday, December 17, 2016 5:18 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: High CPU on nodes
>
> Hi,
>
> What does 'nodetool netstats' look like on those nodes?
>
> *It's not doing any streaming.*
>
> we have 30GB heap
>
> How is the JVM / GC doing? Are you using G1GC or CMS? This setting would be bad for CMS.
>
> *G1. GC is doing fine. I don't see any long pauses beyond 200 ms.*
>
> You can use this tool to understand where the CPU is being used: https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command
>
> I hope that helps,
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-12-17 0:10 GMT+01:00 Anubhav Kale :
> Hello,
> I am trying to fight a high CPU problem on some of our nodes.
> Thread dumps show that it's not GC threads (we have a 30GB heap), and iostat %iowait confirms it's not disk (ranges between 0.3 - 0.9%). One of the ways in which the problem manifests is that the nodes can't compact SSTables, and it happens randomly. We run Cassandra 2.1.13 on Azure Premium Storage (network attached SSDs).
>
> One of the sample threads that was taking high CPU shows:
>
> "pool-13-thread-1" #3352 prio=5 os_prio=0 tid=0x7f2275340bb0 nid=0x1b0b runnable [0x7f33ffaae000]
> java.lang.Thread.State: RUNNABLE
> at java.util.TimSort.gallopRight(TimSort.java:632)
> at java.util.TimSort.mergeLo(TimSort.java:739)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:163)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:200)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:152)
> at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1581)
> at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1739)
>
> Looking at the code, I can't figure out why things like this would require high CPU, and I don't find any JIRAs relating to this either. So, what can I do next to troubleshoot this?
>
> Thanks!
> > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cassandra Encryption
You should be using a root certificate for signing all the node certificates to create a trust chain. That way nodes won't have to explicitly know about each other, only the root certificate. This post has some details: http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html On Tue, Nov 22, 2016 at 9:07 PM, Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > yes, I am generating separate certificate for each node. > even if I use the same certificate how does it helps? > > On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin > wrote: > >> Hi Jai, >> >> so do you generate separate certificate for each node? Why not use one >> certificate for all nodes? >> >> Best regards, Vladimir Yudovin, >> >> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud >> CassandraLaunch your cluster in minutes.* >> >> >> On Mon, 21 Nov 2016 17:25:11 -0500*Jai Bheemsen Rao Dhanwada >> >* wrote >> >> Hello, >> >> I am setting up encryption on one of my cassandra cluster using the below >> procedure. >> >> server_encryption_options: >> internode_encryption: all >> keystore: /etc/keystore >> keystore_password: x >> truststore: /etc/truststore >> truststore_password: x >> >> http://docs.oracle.com/javase/6/docs/technotes/guides/securi >> ty/jsse/JSSERefGuide.html#CreateKeystore >> >> However, one difficulty with this approach is whenever I am adding a new >> node I had to rolling restart all the C* nodes in the cluster, so that the >> truststore is updated with the new server information. >> >> Is there a way to automatically trigger a reload so that the truststore >> is updated on the existing machines without restart. >> >> Can someone please help ? >> >> >> > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Client-side timeouts after dropping table
If you can get to them in the test env, you want to look in o.a.c.metrics.CommitLog for:
- TotalCommitlogSize: if this hovers near commitlog_total_space_in_mb and never goes down, you are thrashing on segment allocation
- WaitingOnCommit: the time spent waiting on calls to sync; it will start to climb real fast if you can't sync within sync_interval
- WaitingOnSegmentAllocation: how long it took to allocate a new commitlog segment; if it is all over the place, it is IO bound

Try turning all the commit log settings way down for low-IO test infrastructure like this. Maybe a total commit log size of 32mb with 4mb segments (or even lower depending on test data volume) so they basically flush constantly and don't try to hold any tables open. Also lower concurrent_writes substantially while you are at it to add some write throttling.

On Wed, Sep 21, 2016 at 2:14 PM, John Sanda wrote:
> I have seen in various threads on the list that 3.0.x is probably best for prod. Just wondering though if there is anything in particular in 3.7 to be wary of.
>
> I need to check with one of our QA engineers to get specifics on the storage. Here is what I do know. We have a blade center running lots of virtual machines for various testing. Some of those vm's are running Cassandra and the Java web apps I previously mentioned via docker containers. The storage is shared. Beyond that I don't have any more specific details at the moment. I can also tell you that the storage can be quite slow.
>
> I have come across different threads that talk to one degree or another about the flush queue getting full. I have been looking at the code in ColumnFamilyStore.java. Is perDiskFlushExecutors the thread pool I should be interested in? It uses an unbounded queue, so I am not really sure what it means for it to get full. Is there anything I can check or look for to see if writes are getting blocked?
> > On Tue, Sep 20, 2016 at 8:41 PM, Jonathan Haddad > wrote: > >> If you haven't yet deployed to prod I strongly recommend *not* using 3.7. >> >> >> What network storage are you using? Outside of a handful of highly >> experienced experts using EBS in very specific ways, it usually ends in >> failure. >> >> On Tue, Sep 20, 2016 at 3:30 PM John Sanda wrote: >> >>> I am deploying multiple Java web apps that connect to a Cassandra 3.7 >>> instance. Each app creates its own schema at start up. One of the schema >>> changes involves dropping a table. I am seeing frequent client-side >>> timeouts reported by the DataStax driver after the DROP TABLE statement is >>> executed. I don't see this behavior in all environments. I do see it >>> consistently in a QA environment in which Cassandra is running in docker >>> with network storage, so writes are pretty slow from the get go. In my logs >>> I see a lot of tables getting flushed, which I guess are all of the dirty >>> column families in the respective commit log segment. Then I seen a whole >>> bunch of flushes getting queued up. Can I reach a point in which too many >>> table flushes get queued such that writes would be blocked? >>> >>> >>> -- >>> >>> - John >>> >> > > > -- > > - John > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
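As a reference point, the low-IO test settings suggested above would look roughly like this in cassandra.yaml (these numbers are the illustrative ones from this thread, not production values):

```yaml
# cassandra.yaml -- shrink the commit log so segments recycle constantly
# on slow shared storage (test environments only)
commitlog_total_space_in_mb: 32
commitlog_segment_size_in_mb: 4
# throttle writes by shrinking the write stage (stock default is 32)
concurrent_writes: 8
```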
Re: What cipher suites are support in Cassandra 3.7 ?
Your best bet is to use 256bit AES via "TLS_RSA_WITH_AES_256_CBC_SHA" since that is (usually) hardware accelerated on recent CPUs. The security page on the docs site has a lot of good information: http://cassandra.apache.org/doc/latest/operating/security.html The above contains a link to the following that is worth calling out directly based on your question: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/FIPS.html If you want to know more about the implementation, the config eventually is passed through Netty's io.netty.handler.ssl.SslHandler ( https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/transport/Server.java#L367) which is itself well documented regarding connection lifecycle: https://netty.io/4.0/api/io/netty/handler/ssl/SslHandler.html On Sat, Sep 3, 2016 at 10:44 AM, Eric Ho wrote: > > I'm trying to enable SSL (internode + client). > But I need to specify the suites but I don't know which ones are supported by C*.. > Any pointers much appreciated. > thx > > -- > > -eric ho > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
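If you want to pin the suite explicitly, cassandra.yaml accepts a cipher_suites list in the encryption options; a sketch (keystore paths and passwords are placeholders):

```yaml
server_encryption_options:
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: changeit
    truststore: conf/.truststore
    truststore_password: changeit
    cipher_suites: [TLS_RSA_WITH_AES_256_CBC_SHA]
```

client_encryption_options takes the same cipher_suites form. Note that on older Oracle JDKs, 256-bit AES also requires the JCE unlimited-strength policy files mentioned in the FIPS guide linked above.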
Re: Issue in internode encryption in cassandra
> > > I am using internode encryption in cassandra, with self signed CA it works fine. but with other product CA m getting this error "Filtering out TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA as it isnt supported by the socket” > You've specified ECDHE_RSA as the cipher. This is a new-ish cipher based on elliptic curve cryptography and it may not be available in some distributions. Run "openssl ciphers ECDH" on the node and the client to ensure they both support that algorithm (my guess is one or the other won't). This article provides an excellent description of ECDH: https://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html#diffie-hellman-with-elliptic-curves Unless you have a specific requirement, use "TLS_RSA_WITH_AES_256_CBC_SHA." -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
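The suggested check, run on both ends so the mismatched side stands out (requires a reasonably modern OpenSSL build):

```shell
# List the ECDH(E) suites this machine's OpenSSL offers, one per line.
# Run on both the node and the client and compare; an empty list or an
# "Error in cipher list" means the family is unavailable on that side.
openssl ciphers ECDH | tr ':' '\n'
```

Keep in mind the JVM's available cipher suites can still differ from OpenSSL's, so this is a first-pass check rather than a definitive answer.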
Re: Re : Recommended procedure for enabling SSL on a live production cluster
If you migrate to the latest 2.1 first, you can make this a non-issue as 2.1.12 and above support simultaneous SSL and plain on the same port for exactly this use case: https://issues.apache.org/jira/browse/CASSANDRA-10559 On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > hi ; > if possible could someone shed some light on this. I followed a > post from the lastpickle which was very informative, but we had some > concerns when it came to enabling SSL on a live production cluster. > > > http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html > > 1 : We generally remove application traffic from a DC which has ongoing > changes, just not to affect end customers if things go south during the > update. > > 2 : So once DC-A has been restarted after enabling SSL, this would be > missing writes during that period, as the DC-A would be shown as down by > the other DC's. We will not be able to put back application traffic on DC-A > until we run inter-dc repairs, which will happen only when SSL has been > enabled on all DC's. > > 3 : Repeating the procedure for every DC will lead to some missed writes > across all DC's. > > 4 : We could do the rolling restart of a DC-A with application traffic on, > but we are concerned if for any infrastructure related reason we have an > issue, we will have to serve traffic from another DC-B, which might be > missing on writes to the DC-A during that period. > > We have 4 DC's which 50 nodes each. > > > thanks > Sai > > -- Forwarded message -- > From: sai krishnam raju potturi > Date: Mon, Jul 18, 2016 at 11:06 AM > Subject: Re : Recommended procedure for enabling SSL on a live production > cluster > To: user@cassandra.apache.org > > > Hi; > We have a Cassandra cluster ( version 2.0.14 ) spanning across 4 > datacenters with 50 nodes each. We are planning to enable SSL between the > datacenters. 
We are following the standard procedure for enabling SSL ( > http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html) > . We were planning to enable SSL for each datacenter at a time. > > During the rolling restart, it's expected that the nodes in the > datacenter that had the service restarted, will show as down by the nodes > in other datacenters that have not restarted the service. This would lead > to missed writes among various nodes during this procedure. > > What would be the recommended procedure for enabling SSL on a live > production cluster without the chaos. > > thanks > Sai > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Question about hector api documentation
> I used to be surprised that people still ask about Hector here; and that > questions here on Hector always seem to mirror new Hector questions on > Stack Overflow. The problem (I think), is that places like Edureka! are > still charging people $300 for a Cassandra training class, where they still > actively teach people to use Hector: > > http://www.edureka.co/cassandra-course-curriculum > > I was wondering where these kept coming from... I shut that project down a year ago and had not taken a commit of any substance for three years. +1 on the Java-Driver with either: - the object mapper module: https://github.com/datastax/java-driver/tree/3.0/manual/object_mapper - or Achilles: https://github.com/doanduyhai/Achilles -- ----- Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Lots of hints, but only on a few nodes
The most immediate work-around would be to nodetool disablehints around the cluster before you load data. This would stop it snowballing from hints at least. On Tue, May 10, 2016 at 7:49 AM, Erik Forsberg wrote: > I have this situation where a few (like, 3-4 out of 84) nodes misbehave. > Very long GC pauses, dropping out of cluster etc. > > This happens while loading data (via CQL), and analyzing metrics it looks > like on these few nodes, a lot of hints are being generated close to the > time when they start to misbehave. > > Since this is Cassandra 2.0.13 which have a less than optimal hints > implementation, largs numbers of hints is a GC troublemaker. > > Again looking at metrics, it looks like hints are being generated for a > large number of nodes, so it doesn't look like the destination nodes are at > fault. So, I'm confused. > > Any Hints (pun intended) on what could cause a few nodes to generate more > hints than the rest of the cluster? > > Regards, > \EF > -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
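A sketch of that workaround (host names and the loop are illustrative; adapt to however you address your 84 nodes):

```shell
# Turn off hint creation cluster-wide before the bulk load, restore after.
HOSTS="node01 node02 node03"   # ...all hosts in the ring
for h in $HOSTS; do nodetool -h "$h" disablehints; done
# ... run the CQL data load ...
for h in $HOSTS; do nodetool -h "$h" enablehints; done
```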
Re: tuning repairs and compaction options
> > Hi, we are running a 9 node cluster under load. The nodes are running in > EC2 on i2.2xlarge instances. Cassandra version is 2.2.4. One node was down > yesterday for more than 3 hours. So we manually started an incremental > repair this morning via nodetool (anti-entropy repair?) > > What we can see is that user CPU on that node goes up to over 95% and also > goes up on all other nodes. Also the number of SSTables is exploding, I > guess due to anticompaction. > You might be seeing https://issues.apache.org/jira/browse/CASSANDRA-10342 (fixed in 2.2.6). > > What are my tuning options to have a more gentle repair behaviour? Which > settings should I look at if I want CPU to stay below 50% for instance. My > worry is always to impact the read/write performance during times when we > do anti-entropy repairs. > +1 on cassandra_range_repair script. -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
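One further knob worth noting: the compaction/anticompaction throughput cap can be lowered live for the duration of the repair, without a restart (values below are examples in MB/s):

```shell
# Throttle compaction while the repair runs, then restore the default.
nodetool setcompactionthroughput 8
# ... repair ...
nodetool setcompactionthroughput 16   # stock default
```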
Re: In memory code and query executions
On Mon, May 2, 2016 at 11:04 AM, Corry Opdenakker wrote: > Hi all, > > Is it possible to execute queries towards an embedded cassandra db whyle > bypassing completely the TCP (or IPC) protocol stack? > tl,dr: it is not for the faint of heart and you must understand *exactly* what you are doing. First I have to ask is there something specific that is not working the way you anticipate? Short answer is yes, though: https://github.com/apache/cassandra/tree/cassandra-2.1/examples/client_only This was removed in > 2.1 because very few people were using it and it was confusing to have there as an "example." I would not call this embedded so much as running a "client-mode proxy" but same idea. Apparantly the embedded cassandra is by default accessed using localhost as > hostname which will result in an IPC optimized connection I assume. > Not quite sure what you mean here? > Is there a way to fully omit the Tcp/ipc stack and execute queries > directly in-memory at the cassandra database? preferrably in a (query > resultset -> to -> appcode) zero-copy approach. > > Again, yes per the link above, but you would need to modify a few things for recent versions. The general approach is there however. You could even go a level below QueryProcessor and invoke methods on StorageProxy directly, bypassing the parse/PS lookup. That all said, you need to understand: - These are all internal APIs and as such can and will change substantially without warning even between point releases - Understanding the internals to use them correctly at this level requires a deep understanding of the code base - You will be bypassing a substantial amount of validation and could easily insert data that will corrupt your table - You can potentially put a lot more pressure on portions of the system that anticipate upstream throttling In sum: it's possible, but put something in production first using standard APIs before you go this deep. 
This is not the level at which you want to write your first app against Cassandra. -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: nodetool -h fails Connection refused
You need to set LOCAL_JMX=false. It will then read the rest of this stanza: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L284-L288 Using the defaults above as-is, you will need to add JMX authentication. Details are here: https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/secureNodetoolSSL.html A lot of this can be controlled with system properties as well: http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html The default config files for JMX authentication and access included in the JVM also have extensive details in the comments: $JAVA_HOME/jre/lib/management/jmxremote.access $JAVA_HOME/jre/lib/management/jmxremote.password.template On Tue, Apr 19, 2016 at 8:40 PM, Alaa Zubaidi (PDF) wrote: > Hi, > > I am trying to run nodetool remotely. but its not working: > I am running Cassandra 2.2.5 on CentOS 6. > listen_address: is set to > rpc_address: is set to 0.0.0.0 > broadcast_rpc_address: is set to > > I changed the following in cassandra-env.sh > JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=" > -Dcom.sun.management.jmxremote.port=7199 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > > "nodetool -h -p status" results in: > failed to connect to 'hostname' - Connection Exception: 'Connection > refused" > > netstat -nl | grep 7199 > tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN > > ONLY "nodetool -h localhost" works > > Any idea how to fix it? > > Thanks, > Alaa
-- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
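Putting the above together, the relevant cassandra-env.sh fragment might look like this (the IP, port, and file paths are illustrative; the stock script only applies the remote stanza when LOCAL_JMX is anything other than "yes"):

```shell
# cassandra-env.sh
LOCAL_JMX=false

JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.0.12"  # node's reachable address
```

With authenticate=true, populate the password/access files from the JVM templates listed above. The original symptom (only 127.0.0.1:7199 listening) is exactly what the LOCAL_JMX=yes default produces.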
Re: Experience with Kubernetes
> Does anybody here have any experience, positive or negative, with > deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any > immediate need (or experience), but I am curious about the pros and cons. > > The last time I played around with kubernetes+cassandra, you could not specify node allocations across failure boundaries (AZs, Regions, etc). To me, that makes it not interesting outside of development or trivial setups. It does look like they are getting farther along on "ubernetes" which should fix this: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unexplainably large reported partition sizes
> > > Rob, can you remember which bug/jira this was? I have not been able to > find it. > I'm using 2.1.9. > > https://issues.apache.org/jira/browse/CASSANDRA-7953 Rob may have a different one, but I've seen something similar stemming from this issue. Fixed in 2.1.12. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unexpected high internode network activity
> > > Unfortunately, these numbers still don't match at all. > > And yes, the cluster is in a single DC and since I am using the EC2 > snitch, replicas are AZ aware. > > Are repairs running on the cluster? Other thoughts: - is internode_compression set to 'all' in cassandra.yaml (should be 'all' by default, but worth checking since you are using lz4 on the client)? - are you using server-to-server encryption ? You can compare the output of nodetool netstats on the test cluster with the AWS cluster as well to see if anything sticks out. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Debugging write timeouts on Cassandra 2.2.5
> Testing the same write path using CQL writes instead demonstrates similar behavior. Was this via Java-Driver or the thrift execute_cql_query? If the latter, what happens when you change the rpc_server_type to sync? This line in tpstats is real weird: https://gist.github.com/mheffner/a979ae1a0304480b052a#file-tpstats-out-L22 On Wed, Feb 24, 2016 at 6:04 PM, Mike Heffner wrote: > > Nate, > > So we have run several install tests, bisecting the 2.1.x release line, and we believe that the regression was introduced in version 2.1.5. This is the first release that clearly hits the timeout for us. > > It looks like quite a large release, so our next step will likely be bisecting the major commits to see if we can narrow it down: https://github.com/apache/cassandra/blob/3c0a337ebc90b0d99349d0aa152c92b5b3494d8c/CHANGES.txt. Obviously, any suggestions on potential suspects appreciated. > > These are the memtable settings we've configured diff from the defaults during our testing: > > memtable_allocation_type: offheap_objects > memtable_flush_writers: 8 > > > Cheers, > > Mike > > On Fri, Feb 19, 2016 at 1:46 PM, Nate McCall wrote: >> >> The biggest change which *might* explain your behavior has to do with the changes in memtable flushing between 2.0 and 2.1: >> https://issues.apache.org/jira/browse/CASSANDRA-5549 >> >> However, the tpstats you posted shows no dropped mutations which would make me more certain of this as the cause. >> >> What values do you have right now for each of these (my recommendations for each on a c4.2xl with stock cassandra-env.sh are in parenthesis): >> >> - memtable_flush_writers (2) >> - memtable_heap_space_in_mb (2048) >> - memtable_offheap_space_in_mb (2048) >> - memtable_cleanup_threshold (0.11) >> - memtable_allocation_type (offheap_objects) >> >> The biggest win IMO will be moving to offheap_objects. By default, everything is on heap. Regardless, spending some time tuning these for your workload will pay off. 
>> >> You may also want to be explicit about >> >> - native_transport_max_concurrent_connections >> - native_transport_max_concurrent_connections_per_ip >> >> Depending on the driver, these may now be allowing 32k streams per connection(!) as detailed in v3 of the native protocol: >> https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec#L130-L152 >> >> >> >> On Fri, Feb 19, 2016 at 8:48 AM, Mike Heffner wrote: >>> >>> Anuj, >>> >>> So we originally started testing with Java8 + G1, however we were able to reproduce the same results with the default CMS settings that ship in the cassandra-env.sh from the Deb pkg. We didn't detect any large GC pauses during the runs. >>> >>> Query pattern during our testing was 100% writes, batching (via Thrift mostly) to 5 tables, between 6-1500 rows per batch. >>> >>> Mike >>> >>> On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra wrote: >>>> >>>> Whats the GC overhead? Can you your share your GC collector and settings ? >>>> >>>> >>>> Whats your query pattern? Do you use secondary indexes, batches, in clause etc? >>>> >>>> >>>> Anuj >>>> >>>> >>>> Sent from Yahoo Mail on Android >>>> >>>> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner >>>> wrote: >>>> Alain, >>>> >>>> Thanks for the suggestions. >>>> >>>> Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the metrics across the ring, there were no blocked tasks nor dropped messages. >>>> >>>> Iowait metrics look fine, so it doesn't appear to be blocking on disk. Similarly, there are no long GC pauses. >>>> >>>> We haven't noticed latency on any particular table higher than others or correlated around the occurrence of a timeout. We have noticed with further testing that running cassandra-stress against the ring, while our workload is writing to the same ring, will incur similar 10 second timeouts. If our workload is not writing to the ring, cassandra stress will run without hitting timeouts. 
This seems to imply that our workload pattern is causing something to block cluster-wide, since the stress tool writes to a different keyspace then our workload. >>>> >>>> I mentioned in another reply that we've tracked it to something between 2.0.x and 2.1.x, so we are focusing on narrowing which point release it was introduced in. >>>> >>>> Cheers, >>>> >>>> Mike
Re: Debugging write timeouts on Cassandra 2.2.5
est_timeout_in_ms (10 seconds). CPU across >>>>>> the cluster >>>>>> >>>> is < 10% and EBS write load is < 100 IOPS. Cassandra is running >>>>>> with the >>>>>> >>>> Oracle JDK 8u60 and we're using G1GC and any GC pauses are less >>>>>> than 500ms. >>>>>> >>>> >>>>>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for >>>>>> data and >>>>>> >>>> commitlog directories. The nodes are using EC2 enhanced >>>>>> networking and have >>>>>> >>>> the latest Intel network driver module. We are running on HVM >>>>>> instances >>>>>> >>>> using Ubuntu 14.04.2. >>>>>> >>>> >>>>>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is >>>>>> similar >>>>>> >>>> to the definition here: >>>>>> >>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a >>>>>> >>>> >>>>>> >>>> This is our cassandra.yaml: >>>>>> >>>> >>>>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml >>>>>> >>>> >>>>>> >>>> Like I mentioned we use 8u60 with G1GC and have used many of the >>>>>> GC >>>>>> >>>> settings in Al Tobey's tuning guide. This is our upstart config >>>>>> with JVM and >>>>>> >>>> other CPU settings: >>>>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d >>>>>> >>>> >>>>>> >>>> We've used several of the sysctl settings from Al's guide as >>>>>> well: >>>>>> >>>> https://gist.github.com/mheffner/ea40d58f58a517028152 >>>>>> >>>> >>>>>> >>>> Our client application is able to write using either Thrift >>>>>> batches >>>>>> >>>> using Asytanax driver or CQL async INSERT's using the Datastax >>>>>> Java driver. >>>>>> >>>> >>>>>> >>>> For testing against Thrift (our legacy infra uses this) we write >>>>>> batches >>>>>> >>>> of anywhere from 6 to 1500 rows at a time. Our p99 for batch >>>>>> execution is >>>>>> >>>> around 45ms but our maximum (p100) sits less than 150ms except >>>>>> when it >>>>>> >>>> periodically spikes to the full 10seconds. 
>>>>>> >>>> >>>>>> >>>> Testing the same write path using CQL writes instead demonstrates >>>>>> >>>> similar behavior. Low p99s except for periodic full timeouts. We >>>>>> enabled >>>>>> >>>> tracing for several operations but were unable to get a trace >>>>>> that completed >>>>>> >>>> successfully -- Cassandra started logging many messages as: >>>>>> >>>> >>>>>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE >>>>>> messages >>>>>> >>>> were dropped in last 5000 ms: 52499 for internal timeout and 0 >>>>>> for cross >>>>>> >>>> node timeout >>>>>> >>>> >>>>>> >>>> And all the traces contained rows with a "null" source_elapsed >>>>>> row: >>>>>> >>>> >>>>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> We've exhausted as many configuration option permutations that >>>>>> we can >>>>>> >>>> think of. This cluster does not appear to be under any >>>>>> significant load and >>>>>> >>>> latencies seem to largely fall in two bands: low normal or max >>>>>> timeout. This >>>>>> >>>> seems to imply that something is getting stuck and timing out at >>>>>> the max >>>>>> >>>> write timeout. >>>>>> >>>> >>>>>> >>>> Any suggestions on what to look for? We had debug enabled for >>>>>> awhile but >>>>>> >>>> we didn't see any msg that pointed to something obvious. Happy >>>>>> to provide >>>>>> >>>> any more information that may help. >>>>>> >>>> >>>>>> >>>> We are pretty much at the point of sprinkling debug around the >>>>>> code to >>>>>> >>>> track down what could be blocking. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> Thanks, >>>>>> >>>> >>>>>> >>>> Mike >>>>>> >>>> >>>>>> >>>> -- >>>>>> >>>> >>>>>> >>>> Mike Heffner >>>>>> >>>> Librato, Inc. >>>>>> >>>> >>>>>> >>> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> -- >>>>>> >> >>>>>> >> Mike Heffner >>>>>> >> Librato, Inc. 
>>>>>> >> >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > >>>>>> > Mike Heffner >>>>>> > Librato, Inc. >>>>>> > >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Close the World, Open the Net >>>>>> http://www.linux-wizard.net >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> Mike Heffner >>>> Librato, Inc. >>>> >>>> >>> >> >> >> -- >> >> Mike Heffner >> Librato, Inc. >> >> > > > -- > > Mike Heffner > Librato, Inc. > > -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Back to the futex()? :(
I noticed you have authentication enabled. Make sure you set the following: - the replication factor for the system_auth keyspace should equal the number of nodes - permissions_validity_in_ms is a permission cache timeout. If you are not doing dynamic permissions or creating/revoking frequently, turn this WAY up May not be the immediate reason, but the above are definitely not helping if set at defaults. On Sat, Feb 6, 2016 at 6:49 PM, Will Hayworth wrote: > Additionally: this isn't the futex_wait bug (or at least it shouldn't > be?). Amazon says > <https://forums.aws.amazon.com/thread.jspa?messageID=623731> that was > fixed several kernel versions before mine, which > is 4.1.10-17.31.amzn1.x86_64. And the reason my heap is so large is > because, per CASSANDRA-9472, we can't use offheap until 3.4 is released. > > Will > > ___ > Will Hayworth > Developer, Engagement Engine > Atlassian > > My pronoun is "they". <http://pronoun.is/they> > > > > On Sat, Feb 6, 2016 at 3:28 PM, Will Hayworth > wrote: > >> *tl;dr: other than CAS operations, what are the potential sources of lock >> contention in C*?* >> >> Hi all! :) I'm a novice Cassandra and Linux admin who's been preparing a >> small cluster for production, and I've been seeing something weird. For >> background: I'm running 3.2.1 on a cluster of 12 EC2 m4.2xlarges (32 GB >> RAM, 8 HT cores) backed by 3.5 TB GP2 EBS volumes. Until late yesterday, >> that was a cluster of 12 m4.xlarges with 3 TB volumes. I bumped it because >> while backloading historical data I had been seeing awful throughput (20K >> op/s at CL.ONE). I'd read through Al Tobey's *amazing* C* tuning guide >> <https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html> once >> or twice before but this time I was careful and fixed a bunch of defaults >> that just weren't right, in cassandra.yaml/JVM options/block device >> parameters. 
Folks on IRC were super helpful as always (hat tip to Jeff >> Jirsa in particular) and pointed out, for example, that I shouldn't be >> using DTCS for loading historical data--heh. After changing to LTCS, >> unbatching my writes* and reserving a CPU core for interrupts and fixing >> the clocksource to TSC, I finally hit 80K early this morning. Hooray! :) >> >> Now, my question: I'm still seeing a *ton* of blocked processes in the >> vmstats, anything from 2 to 9 per 10 second sample period--and this is >> before EBS is even being hit! I've been trying in vain to figure out what >> this could be--GC seems very quiet, after all. On Al's page's advice, I've >> been running strace and, indeed, I've been seeing *tens of thousands of >> futex() calls* in periods of 10 or 20 seconds. What eludes me is *where* this >> lock contention is coming from. I'm not using LWTs or performing CAS >> operations of which I'm aware. Assuming this isn't a red herring, what >> gives? >> >> Sorry for the essay--I just wanted to err on the side of more >> context--and *thank you* for any advice you'd like to offer, >> Will >> >> P.S. More background if you'd like--I'm running on Amazon Linux 2015.09, >> using jemalloc 3.6, JDK 1.8.0_65-b17. Here <http://pastebin.com/kuhBmHXG> is >> my cassandra.yaml and here <http://pastebin.com/fyXeTfRa> are my JVM >> args. I realized I neglected to adjust memtable_flush_writers as I was >> writing this--so I'll get on that. Aside from that, I'm not sure what to >> do. (Thanks, again, for reading.) >> >> * They were batched for consistency--I'm hoping to return to using them >> when I'm back at normal load, which is tiny compared to backloading, but >> the impact on performance was eye-opening. >> ___ >> Will Hayworth >> Developer, Engagement Engine >> Atlassian >> >> My pronoun is "they". <http://pronoun.is/they> >> >> >> > -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
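A sketch of the two auth-related fixes suggested above (the DC name, node count, and credentials are placeholders; match them to your topology):

```shell
# 1. Make system_auth fully replicated, then repair it onto every node.
cqlsh -u cassandra -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 12};"
nodetool repair system_auth

# 2. In cassandra.yaml, turn the permission cache way up if permissions
#    rarely change, e.g.:
#      permissions_validity_in_ms: 3600000   # one hour
```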
Re: Slow performance after upgrading from 2.0.9 to 2.1.11
On Fri, Jan 29, 2016 at 12:30 PM, Peddi, Praveen wrote: > > Hello, > We have another update on performance on 2.1.11. compression_chunk_size didn’t really help much but We changed concurrent_compactors from default to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11 read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX metric that could affect read latencies is that 2.1.11 is running ReadRepairedBackground and ReadRepairedBlocking too frequently compared to 2.0.9 even though our read_repair_chance is same on both. Could anyone shed some light on why 2.1.11 could be running read repair 10 to 50 times more in spite of same configuration on both clusters? > > dclocal_read_repair_chance=0.10 AND > read_repair_chance=0.00 AND > > Here is the table for read repair metrics for both clusters. > 2.0.9 2.1.11 > ReadRepairedBackground 5MinAvg 0.006 0.1 > 15MinAvg 0.009 0.153 > ReadRepairedBlocking 5MinAvg 0.002 0.55 > 15MinAvg 0.007 0.91 The concurrent_compactors setting is not a surprise. The default in 2.0 was the number of cores and in 2.1 is now: "the smaller of (number of disks, number of cores), with a minimum of 2 and a maximum of 8" https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra.yaml#L567-L568 So in your case this was "8" in 2.0 vs. "2" in 2.1 (assuming these are still the stock-ish c3.2xl mentioned previously?). Regardless, 64 is way too high. Set it back to 8. Note: this got dropped off the "Upgrading" guide for 2.1 in https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt though, so lots of folks miss it. Per said upgrading guide - are you sure the data directory is in the same place between the two versions and you are not pegging the wrong disk/partition? 
The default locations changed for data, cache and commitlog: https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L171-L180 I ask because being really busy on a single disk would cause latency and potentially dropped messages which could eventually cause a DigestMismatchException requiring a blocking read repair. Anything unusual in the node-level IO activity between the two clusters? That said, the difference in nodetool tpstats output during and after on both could be insightful. When we do perf tests internally we usually use a combination of Grafana and Riemann to monitor Cassandra internals, the JVM and the OS. Otherwise, it's guess work. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
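In cassandra.yaml terms, the fix suggested above is simply the following (this setting needs a restart, unlike the throughput cap, which can be changed live via nodetool setcompactionthroughput):

```yaml
# cassandra.yaml -- restore the 2.0-era effective value for these
# 8-core nodes instead of the 64 currently configured
concurrent_compactors: 8
```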
Re: Cassandra Connection Pooling
On Thu, Jan 28, 2016 at 4:31 PM, KAMM, BILL wrote: > Hi, I’m looking for some good info on connection pooling, using JBoss. Is > this something that needs to be configured within JBoss, or is it handled > directly by the Cassandra classes themselves? Thanks. > > > > > This thread was on the Java-Driver list recently - it may answer some of your questions: https://groups.google.com/a/lists.datastax.com/forum/m/#!topic/java-driver-user/-im4eN_yZbA -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Data Modeling: Partition Size and Query Efficiency
> > > In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: compaction_throughput_mb_per_sec
> >> Also, as I increase my node count, I technically also have to increase my >> compaction_throughput which would require a rolling restart across the >> cluster. >> >> > You can set compaction throughput on each node dynamically via nodetool > setcompactionthroughput. > > > Also, the IOPS generated by your workload and how efficiently the JVM handles them are what should drive compaction throughput settings. Raw node count is orthogonal.
Re: compaction_throughput_mb_per_sec
> > > Also, as I increase my node count, I technically also have to increase my > compaction_throughput which would require a rolling restart across the > cluster. > > You can set compaction throughput on each node dynamically via nodetool setcompactionthroughput.
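A sketch of that command in use (the value here is a placeholder; pick a throughput suited to your hardware, and note the setting reverts to the cassandra.yaml value on restart):

```shell
# Adjust compaction throughput on one node at runtime -- no restart needed.
nodetool setcompactionthroughput 32   # MB/s; 0 disables throttling entirely
nodetool getcompactionthroughput      # confirm the live value
```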
Re: Cassandra stalls and dropped messages not due to GC
> > > Forgive me, but what is CMS? > Sorry - ConcurrentMarkSweep garbage collector. > > No. I’ve tried some mitigations since tuning thread pool sizes and GC, but > the problem begins with only an upgrade of Cassandra. No other system > packages, kernels, etc. > > > From what 2.0 version did you upgrade? If it was < 2.0.7, you would need to run 'nodetool upgradesstables' but I'm not sure the issue would manifest that way. Otherwise, double check the DSE release notes and upgrade guide. I've not had any issues like this going from 2.0.x to 2.1.x on vanilla C*.
Re: Cassandra stalls and dropped messages not due to GC
Does tpstats show unusually high counts for blocked flush writers? As Sebastian suggests, running ttop will paint a clearer picture about what is happening within C*. I would however recommend going back to CMS in this case as that is the devil we all know and more folks will be able to offer advice on seeing its output (and it removes a delta). > It’s starting to look to me like it’s possibly related to brief IO spikes > that are smaller than my usual graphing granularity. It feels surprising to > me that these would affect the Gossip threads, but it’s the best current > lead I have with my debugging right now. More to come when I learn it. > Probably not the case since this was a result of an upgrade, but I've seen similar behavior on systems where some kernels had issues with irqbalance doing the right thing and would end up parking most interrupts on CPU0 (like say for the disk and ethernet modules) regardless of the number of cores. Check /proc via 'cat /proc/interrupts' and make sure the interrupts are spread across CPU cores. You can steer them manually at runtime if they are not. Also, did you upgrade anything besides Cassandra?
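A quick check along those lines (the IRQ number in the steering example is hypothetical, and the smp_affinity write needs root):

```shell
# Show per-CPU interrupt counts; if the disk/NIC rows pile up in the CPU0
# column, irqbalance is not doing its job.
cat /proc/interrupts
# Manually steer IRQ 24 to CPU1 (bitmask 2) -- IRQ number is hypothetical:
# echo 2 > /proc/irq/24/smp_affinity
```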
Re: SSTables are not getting removed
> > > memtable_offheap_space_in_mb: 4096 > > memtable_cleanup_threshold: 0.99 > ^ What led to this setting? You are basically telling Cassandra to not flush the highest-traffic memtable until the memtable space is 99% full. With that many tables and keyspaces, you are basically locking up everything on the flush queue, causing substantial back pressure. If you run 'nodetool tpstats' you will probably see a massive number of 'All Time Blocked' for FlushWriter and 'Dropped' for Mutations. Actually, this is probably why you are seeing a lot of small tables: commit log segments are being filled and blocked from flushing due to the above, so they have to attempt to flush repeatedly with whatever is there whenever they get the chance. thrift_framed_transport_size_in_mb: 150 > ^ This is also a super bad idea. Thrift buffers grow as needed to accommodate larger results, but they don't ever shrink. This will lead to a bunch of open connections holding onto large, empty byte arrays. This will show up immediately in a heap dump inspection. > concurrent_compactors: 4 > > compaction_throughput_mb_per_sec: 0 > > endpoint_snitch: GossipingPropertyFileSnitch > > > > This grinds our system to a halt and causes a major GC nearly every second. > > > > So far the only way to get around this is to run a cron job every hour > that does a “nodetool compact”. > What's the output of 'nodetool compactionstats'? CASSANDRA-9882 and CASSANDRA-9592 could be to blame (both fixed in recent versions) or this could just be a side effect of the memory pressure from the above settings. Start back at the default settings (except snitch - GPFS is always a good place to start) and change settings serially and in small increments based on feedback gleaned from monitoring runtimes.
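The diagnostic steps above, as a sketch:

```shell
# Look for flush-queue back pressure: non-zero "All Time Blocked" on the
# flush writers and dropped MUTATION messages are the tell-tale signs.
nodetool tpstats | grep -iE 'FlushWriter|MUTATION|Blocked|Dropped'
# And check what compaction is (or is not) doing:
nodetool compactionstats
```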
Re: memtable flush size with LCS
> do you mean that this property is ignored at memtable flush time, and so > memtables are already allowed to be much larger than sstable_size_in_mb? > Yes, 'sstable_size_in_mb' plays no part in the flush process. Flushing is based solely on runtime activity, and the file size is determined by whatever was in the memtable at that time.
Re: cassandra bootstrapping
> > > Considering that this concerns mostly auto-bootstrap and seed nodes by > default do not do that, > would it be correct to assume that they could be started in parallel and > then the non-seed should be > added with the interval apart. > > As in seeds-start -> wait -> add non-seed -> wait -> add non-seed. > > Would that timeout be only between seeds started and non-seed, or every > non-seed will have to be > started serially with wait in between. > > It would be more like: start each seed one at a time, then start everyone else. Then it is just 2 minutes after the last node joins. In other words, starting a six node cluster would not be that much faster than starting a 100 node cluster if each had 3 seeds (I'm pretty sure it would mostly be network overhead of gossip communication/peer discovery).
Re: memtable flush size with LCS
The sstable_size_in_mb can be thought of as a target for the compaction process moving the file beyond L0. Note: If there are more than 32 SSTables in L0, it will switch over to doing STCS for L0 (you can disable this behavior by passing -Dcassandra.disable_stcs_in_l0=true as a system property). With a lot of overwrites, the settings you want to tune will be gc_grace_seconds in combination with tombstone_threshold, tombstone_compaction_interval and maybe unchecked_tombstone_compaction (there are different opinions about this last one, YMMV). Making these more aggressive and increasing your sstable_size_in_mb will allow for potentially capturing more overwrites in a level which will lead to less fragmentation. However, making the size too large will keep compaction from triggering on further out levels which can then exacerbate problems particularly if you have long-lived TTLs. In general, it is very workload specific, but monitoring the histogram for the number of SSTables used in a read (via org.apache.cassandra.metrics.ColumnFamily.$KEYSPACE.$TABLE.SSTablesPerReadHistogram.95percentile or shown manually in nodetool cfhistograms output) after any change will help you narrow in on a good setting. See http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesLCS for more details. On Tue, Oct 27, 2015 at 3:42 PM, Dan Kinder wrote: > > Hi all, > > The docs indicate that memtables are triggered to flush when data in the commitlog is expiring or based on memtable_flush_period_in_ms. > > But LCS has a specified sstable size; when using LCS are memtables flushed when they hit the desired sstable size (default 160MB) or could L0 sstables be much larger than that? > > Wondering because I have an overwrite workload where larger memtables would be helpful, and if I need to increase my LCS sstable size in order to allow for that. > > -dan
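The monitoring step above, sketched (keyspace/table names are placeholders):

```shell
# Watch the SSTables-per-read distribution after each tuning change; a
# rising 95th percentile means reads are touching more files.
nodetool cfhistograms my_keyspace my_table
```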
Re: cassandra bootstrapping
> I keep on seeing that there should be a 2 minute delay when bootstrapping a cluster, and > I have few questions round that. > > For starters, is there any reasoning why this is 2min and not less or more? > Is this valid mostly for bootstraping an empty cluster ring or for > restarting an existing established cluster? There is a good comment at the top of StorageService#joinTokenRing which explains the process at a high level: https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/StorageService.java#L791-L803 The method itself is long, but readable, and has a series of comments that explain some of the decisions taken and even reference some issues which have been encountered over the years. You can change this value if you really want by passing "cassandra.ring_delay_ms" as a system property at startup.
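For example (a sketch; the default exists to let gossip settle, so shorten it with care):

```shell
# Override the ring delay at startup -- value in milliseconds.
# Typically added to cassandra-env.sh rather than typed by hand:
JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=10000"
```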
Re: Verifying internode SSL
> I've configured internode SSL and set it to be used between datacenters only. Is there a way in the logs to verify SSL is operating between nodes in different DCs or do I need to break out tcpdump? > Even on DC only encryption, you should see the following message in the log: "Starting Encrypted Messaging Service on SSL port 7001" With any Java-based thing using SSL, you can always use the following startup parameter to find out exactly what is going on: -Djavax.net.debug=ssl This page will tell you how to interpret the debug output: http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
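Both checks as a sketch (the log path is an assumption; adjust for your install):

```shell
# Confirm encrypted internode messaging started:
grep 'Starting Encrypted Messaging Service' /var/log/cassandra/system.log
# For deeper inspection, add JSSE debugging (very verbose) to cassandra-env.sh:
# JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl"
```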
Re: CLUSTERING ORDER BY importance with ssd's
> > > If I am selecting a range from the bottom of the partition, does it make > much of a difference (considering I only use ssd's) if the clustering order > is ASC or DESC. > The only impact is that there is an extra seek to the bottom of the partition.
Re: Re : List users getting stuck and not returning results
Set the replication factor for the system_auth keyspace equal to the number of nodes, then issue a repair. On Fri, Oct 2, 2015 at 6:51 AM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > We have 2 clusters running DSE. On one of the clusters we recently added > additional nodes to a datacenter. > > On the cluster where we added nodes, we are getting authentication issues > from client. We are also unable to "list users" on system_auth keyspace. > It's getting stuck. > > InvalidRequestException(why: User has no SELECT permission on <> or any of its parents) -> client side error > > The other clusters perform fine. > > Thanks in advance. >
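A sketch of that fix (the DC names and replication factors are placeholders; match them to your topology):

```shell
# Raise system_auth replication so the auth data is widely available,
# then repair so every replica actually has it.
cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
nodetool repair system_auth
```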
Re: Unable to remove dead node from cluster.
:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80670 at >>>>> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80671 at >>>>> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80671 at >>>>> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80672 at >>>>> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.85812 WARN 23:21:40 Not marking nodes down due to >>>>> local pause of 10852378435 > 50 >>>>> >>>>> Any suggestions about how to remove it? 
>>>>> Thanks. >>>>> >>>>> -- >>>>> Dikang
Re: LTCS Strategy Resulting in multiple SSTables
You could try altering the table to use STCS, then force a major compaction via 'nodetool compact', then alter the table back to LCS when it completes. You may very well hit the same issues in process of doing this, however, until you upgrade. On Wed, Sep 16, 2015 at 1:25 PM, Saladi Naidu wrote: > Nate, > Yes we are in process of upgrading to 2.1.9. Meanwhile I am looking for > correcting the problem, do you know any recovery options to reduce the > number of SS Tables. As SStbales are keep on increasing, the read > performance is deteriorating > > Naidu Saladi > > ------ > *From:* Nate McCall > *To:* Cassandra Users ; Saladi Naidu < > naidusp2...@yahoo.com> > *Sent:* Tuesday, September 15, 2015 4:53 PM > > *Subject:* Re: LTCS Strategy Resulting in multiple SSTables > > That's an early 2.1/known buggy version. There have been several issues > fixed since which could cause that behavior. Most likely > https://issues.apache.org/jira/browse/CASSANDRA-9592 ? > > Upgrade to 2.1.9 and see if the problem persists. > > > > On Tue, Sep 15, 2015 at 8:31 AM, Saladi Naidu > wrote: > > We are on 2.1.2 and planning to upgrade to 2.1.9 > > Naidu Saladi > > -- > *From:* Marcus Eriksson > *To:* user@cassandra.apache.org; Saladi Naidu > *Sent:* Tuesday, September 15, 2015 1:53 AM > *Subject:* Re: LTCS Strategy Resulting in multiple SSTables > > if you are on Cassandra 2.2, it is probably this: > https://issues.apache.org/jira/browse/CASSANDRA-10270 > > > > On Tue, Sep 15, 2015 at 4:37 AM, Saladi Naidu > wrote: > > We are using Level Tiered Compaction Strategy on a Column Family. Below > are CFSTATS from two nodes in same cluster, one node has 880 SStables in L0 > whereas one node just has 1 SSTable in L0. In the node where there are > multiple SStables, all of them are small size and created same time stamp. > We ran Compaction, it did not result in much change, node remained with > huge number of SStables. 
Due to this large number of SSTables, Read > performance is being impacted > > In same cluster, under same keyspace, we are observing this discrepancy in > other column families as well. What is going wrong? What is the solution to > fix this > > *---*NODE1*---* > *Table: category_ranking_dedup* > *SSTable count: 1* > *SSTables in each level: [1, 0, 0, 0, 0, > 0, 0, 0, 0]* > *Space used (live): 2012037* > *Space used (total): 2012037* > *Space used by snapshots (total): 0* > *SSTable Compression Ratio: > 0.07677216119569073* > *Memtable cell count: 990* > *Memtable data size: 32082* > *Memtable switch count: 11* > *Local read count: 2842* > *Local read latency: 3.215 ms* > *Local write count: 18309* > *Local write latency: 5.008 ms* > *Pending flushes: 0* > *Bloom filter false positives: 0* > *Bloom filter false ratio: 0.0* > *Bloom filter space used: 816* > *Compacted partition minimum bytes: 87* > *Compacted partition maximum bytes: > 25109160* > *Compacted partition mean bytes: 22844* > *Average live cells per slice (last five > minutes): 338.84588318085855* > *Maximum live cells per slice (last five > minutes): 10002.0* > *Average tombstones per slice (last five > minutes): 36.53307529908515* > *Maximum tombstones per slice (last five > minutes): 36895.0* > > *NODE2--- * > *Table: category_ranking_dedup* > *SSTable count: 808* > *SSTables in each level: [808/4, 0, 0, 0, > 0, 0, 0, 0, 0]* > *Space used (live): 291641980* > *Space used (total): 291641980* > *Space used by snapshots (total): 0* > *SSTable Compression Ratio: > 0.1431106696818256* > *Memtable cell count: 4365293* > *Memtable dat
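The STCS round-trip suggested in the reply above can be sketched as follows (the keyspace name is a placeholder; the table name is taken from the thread, and expect heavy compaction I/O while this runs):

```shell
# Temporarily switch to STCS, force a major compaction, then switch back.
cqlsh -e "ALTER TABLE ks.category_ranking_dedup WITH compaction =
  {'class': 'SizeTieredCompactionStrategy'};"
nodetool compact ks category_ranking_dedup      # major compaction
cqlsh -e "ALTER TABLE ks.category_ranking_dedup WITH compaction =
  {'class': 'LeveledCompactionStrategy'};"
```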
Re: LTCS Strategy Resulting in multiple SSTables
Average live cells per slice (last five > minutes): 416.1780688985929* > *Maximum live cells per slice (last five > minutes): 10002.0* > *Average tombstones per slice (last five > minutes): 45.11547792333818* > *Maximum tombstones per slice (last five > minutes): 36895.0* > > > > > Naidu Saladi > > > >
Re: Currupt sstables when upgrading from 2.1.8 to 2.1.9
You have a/some corrupt SSTables. 2.1.9 is doing strict checking at startup and reacting based on "disk_failure_policy" per the stack trace. For details, see: https://issues.apache.org/jira/browse/CASSANDRA-9686 Either way, you are going to have to run nodetool scrub. I'm not sure if it's better to do this from 2.1.8 or from 2.1.9 with "disk_failure_policy: ignore". It feels like that option got overloaded a bit strangely with the changes in CASSANDRA-9686 and I have not yet tried it with its new meaning. On Tue, Sep 15, 2015 at 5:26 AM, George Sigletos wrote: > Hello, > > I tried to upgrade two of our clusters from 2.1.8 to 2.1.9. In some, but > not all nodes, I got errors about corrupt sstables when restarting. I > downgraded back to 2.1.8 for now. > > Has anybody else faced the same problem? Should sstablescrub fix the > problem? I didn't try that yet. > > Kind regards, > George > > ERROR [SSTableBatchOpen:3] 2015-09-14 10:16:03,296 FileUtils.java:447 - > Exiting forcefully due to file system exception on startup, disk failure > policy "stop" > org.apache.cassandra.io.sstable.CorruptSSTableException: > java.io.EOFException > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) [na:1.7.0_75] > at java.util.concurrent.FutureTask.run(Unknown Source) > [na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) [na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) [na:1.7.0_75] > at java.lang.Thread.run(Unknown Source) [na:1.7.0_75] > Caused by: java.io.EOFException: null > at java.io.DataInputStream.readUnsignedShort(Unknown Source) > ~[na:1.7.0_75] > at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75] > at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75] > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) > ~[apache-cassandra-2.1.9.jar:2.1.9] >
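The recovery path described above, as a sketch (keyspace/table names are placeholders):

```shell
# Online scrub of a suspect table (node running):
nodetool scrub my_keyspace my_table
# Offline variant if the node cannot start at all (run with Cassandra stopped):
# sstablescrub my_keyspace my_table
```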
Re: cassandra-stress on 3.0 with column widths benchmark.
By default, stress runs stop after throughput has not improved after three runs. This functionality is a little difficult to figure out from the documentation, so take a look at (maybe even with a debugger attached): https://github.com/apache/cassandra/blob/cassandra-2.1/tools/stress/src/org/apache/cassandra/stress/StressAction.java#L111 to see what's going on. In both scenarios, this may have taken roughly the same time to hit saturation, but for different reasons and with different results as you saw. To really find out why, it would be a good idea to enable one of the reporters ( http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2) to send cluster metrics to a monitoring system such as Graphite. At the very least get OpsCenter running and capturing metrics from the cluster. Either way, getting familiar with a visual picture of the runtime now is invaluable for really understanding any future production deployment. On Sun, Sep 13, 2015 at 9:25 PM, Kevin Burton wrote: > I’m trying to benchmark two scenarios… > > 10 columns with 150 bytes each > > vs > > 150 columns with 10 bytes each. > > The total row “size” would be 1500 bytes (ignoring overhead). > > Our app uses 150 columns so I’m trying to see if packing it into a JSON > structure using one column would improve performance. > > I seem to have confirmed my hypothesis. 
> > I’m running two tests: > > ./tools/bin/cassandra-stress write -insert -col n=FIXED\(10\) >> size=FIXED\(150\) | tee cassandra-stress-10-150.log >> > > >> time ./tools/bin/cassandra-stress write -insert -col n=FIXED\(150\) >> size=FIXED\(10\) | tee cassandra-stress-150-10.log > > > this shows that the "op rate” is much much lower when running with 150 > columns: > > root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate" >> cassandra-stress-10-150.log >> op rate : 7632 [WRITE:7632] >> op rate : 11851 [WRITE:11851] >> op rate : 31967 [WRITE:31967] >> op rate : 41798 [WRITE:41798] >> op rate : 51251 [WRITE:51251] >> op rate : 58057 [WRITE:58057] >> op rate : 62977 [WRITE:62977] >> op rate : 65398 [WRITE:65398] >> op rate : 67673 [WRITE:67673] >> op rate : 69198 [WRITE:69198] >> op rate : 70402 [WRITE:70402] >> op rate : 71019 [WRITE:71019] >> op rate : 71574 [WRITE:71574] >> root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate" >> cassandra-stress-150-10.log >> op rate : 2570 [WRITE:2570] >> op rate : 5144 [WRITE:5144] >> op rate : 10906 [WRITE:10906] >> op rate : 11832 [WRITE:11832] >> op rate : 12471 [WRITE:12471] >> op rate : 12915 [WRITE:12915] >> op rate : 13620 [WRITE:13620] >> op rate : 13456 [WRITE:13456] >> op rate : 13916 [WRITE:13916] >> op rate : 14029 [WRITE:14029] >> op rate : 13915 [WRITE:13915] > > > … what’s WEIRD here is that > > Both tests take about 10 minutes. Yet it’s saying that the op rate for > the second is slower. Why would that be? That doesn’t make much sense… > > -- > > Founder/CEO Spinn3r.com > Location: *San Francisco, CA* > blog: http://burtonator.wordpress.com > … or check out my Google+ profile > <https://plus.google.com/102718274791889610666/posts> > >
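One way to make the two runs directly comparable is to fix the operation count instead of letting stress run to saturation, so wall-clock time becomes a meaningful signal (a sketch; n= is a placeholder workload size):

```shell
# Same total ops for both column shapes:
tools/bin/cassandra-stress write n=1000000 -col n=FIXED\(10\) size=FIXED\(150\)
tools/bin/cassandra-stress write n=1000000 -col n=FIXED\(150\) size=FIXED\(10\)
```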
Re: Question: Gossip Protocol
It is hard coded in Gossiper: https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/gms/Gossiper.java#L83 What requirement are you trying to address by increasing this value? On Mon, Sep 14, 2015 at 8:26 AM, Thouraya TH wrote: > I find this information : > > The gossip process runs every second and exchanges state messages with up > to three other nodes in the cluster. > > here > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureGossipAbout_c.html > > > Please, i ask if it is possible to change this periode, to three seconds ? > > Kind regards. > > > > > 2015-09-14 14:15 GMT+01:00 Thouraya TH : > >> Hi all, >> >> Please, the gossip procotol in cassandra is running every ... seconds ? >> >> >> Thank you so much for answers. >> Best Regards. >> > >
Re: Should replica placement change after a topology change?
> > > So if you have a topology that would change if you switched from >> SimpleStrategy to NetworkTopologyStrategy plus multiple racks, it sounds >> like a different migration strategy would be needed? >> >> I am imagining: >> >>1. Switch to a different snitch, and the keyspace from SimpleStrategy >>to NTS but keep it all in one rack. So effectively the same topology, but >>with a different snitch. >>2. Set up a new data centre with the desired topology. >>3. Change the keyspace to have replicas in the new DC. >>4. Rebuild all the nodes in the new DC. >>5. Flip all your clients over to the new DC. >>6. Decommission your original DC. >> >> That would work, yes. I would add : > > - 4.5. Repair all nodes. > I can confirm that the above process works (definitely include Rob's repair suggestion, though). It is really the only way we've found to safely go from SimpleSnitch to rack-aware NTS. The same process works/is required for SimpleSnitch to Ec2Snitch fwiw.
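Step 1 of that migration can be sketched as follows (keyspace, DC name and replication factor are placeholders; the RF must match the old SimpleStrategy setting so the effective topology does not change):

```shell
# Re-declare the keyspace with NTS while keeping replica placement unchanged.
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
```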
Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
Thanks for reporting back, Tom. Can you drop a comment on the ticket with a sentence or two describing your specific case and that speculative_retry = NONE was a valid work-around? That will make it easier for the next folks that come along to have a concrete problem/solution in a single comment on that ticket. Glad to hear it worked, though. On Tue, Sep 8, 2015 at 3:38 PM, Tom van den Berge wrote: > Nate, > > I've disabled it, and it's been running for about an hour now without > problems, while before, the problem occurred roughly every few minutes. I > guess it's safe to say that this proves that CASSANDRA-9753 > <https://issues.apache.org/jira/browse/CASSANDRA-9753> is the cause of > the problem. > > I'm very happy to finally know the cause of this problem! Thanks for > pointing me in the right direction. > Tom > > On Tue, Sep 8, 2015 at 9:13 PM, Nate McCall > wrote: > >> Just to be sure: can this bug result in a 0-row result while it should be >>> > 0 ? >>> >> Per Tyler's reference to CASSANDRA-9753 >> <https://issues.apache.org/jira/browse/CASSANDRA-9753>, you would see >> this if the read was routed by speculative retry to the nodes that were not >> yet finished being built. >> >> Does this work as anticipated when you set speculative_retry to NONE? >> >> >> >> >> -- >> ----- >> Nate McCall >> Austin, TX >> @zznate >> >> Co-Founder & Sr. Technical Consultant >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > >
Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
> > Just to be sure: can this bug result in a 0-row result while it should be > > 0 ? > Per Tyler's reference to CASSANDRA-9753 <https://issues.apache.org/jira/browse/CASSANDRA-9753>, you would see this if the read was routed by speculative retry to the nodes that were not yet finished being built. Does this work as anticipated when you set speculative_retry to NONE?
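The suggested check, sketched (keyspace/table names are placeholders):

```shell
# Disable speculative retry on the affected table to rule out CASSANDRA-9753:
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH speculative_retry = 'NONE';"
```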
Re: Re : Restoring nodes in a new datacenter, from snapshots in an existing datacenter
You cannot use the identical token ranges. You have to capture membership information somewhere for each datacenter, and use that token information when bringing up the replacement DC. You can find details on this process here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html That process is straightforward, but it can go south pretty quickly if you miss a step. It's a really good idea to set aside some time to try this out in a staging/test system and build a runbook for the process targeting your specific environment. On Fri, Aug 28, 2015 at 1:12 PM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > > hi; > We have cassandra cluster with Vnodes spanning across 3 data centers. > We take backup of the snapshots from one datacenter. >In a doomsday scenario, we want to restore a downed datacenter, with > snapshots from another datacenter. We have same number of nodes in each > datacenter. > > 1 : We know it requires copying the snapshots and their corresponding > token ranges to the nodes in new datacenter, and running nodetool refresh. > > 2 : The question is, we will now have 2 datacenters, with the same exact > token ranges. Will that cause any problem. > > DC1 : Node-1 : token1..token10 > Node-2 : token11 .token20 > Node-3 : token21 . token30 > Node-4 : token31 . token40 > > DC2 : Node-1 : token1.token10 >Node-2 : token11token20 >Node-3 : token21token30 > Node-4 : token31token40 > > > thanks > Sai > > > >
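A sketch of the token-capture side of that process (keyspace/table and the node address are placeholders):

```shell
# At backup time, record each node's tokens alongside its snapshots:
nodetool ring | grep '<node-ip>'
# On the matching replacement node: set initial_token in cassandra.yaml to
# that comma-separated token list before first start, copy the snapshot
# files into the data directories, then load them:
nodetool refresh my_keyspace my_table
```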
Re: Re : Decommissioned node appears in logs, and is sometimes marked as "UNREACHEABLE" in `nodetool describecluster`
Do they show up in nodetool gossipinfo? Either way, you probably need to invoke Gossiper.unsafeAssassinateEndpoints via JMX as described in step 1 here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html On Fri, Aug 28, 2015 at 1:32 PM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > hi; > we decommissioned nodes in a datacenter a while back. Those nodes keep > showing up in the logs, and also sometimes marked as UNREACHABLE when > `nodetool describecluster` is run. > > However these nodes do not show up in `nodetool status` and > `nodetool ring`. > > Below are a couple lines from the logs. > > 2015-08-27 04:38:16,180 [GossipStage:473] INFO Gossiper InetAddress / > 10.0.0.1 is now DOWN > 2015-08-27 04:38:16,183 [GossipStage:473] INFO StorageService Removing > tokens [85070591730234615865843651857942052865] for /10.0.0.1 > > thanks > Sai > >
Re: How to get the peer's IP address when writing failed
Unfortunately, the addresses/DC of the replicas are not available on the exception hierarchy within Cassandra. Fwiw, the DS Java Driver (most native protocol drivers actually) manages membership dynamically by acting on cluster health events sent back over the channel by the native transport. Keeping this intelligence down in the driver makes for significantly less complex cluster management in an application. On Wed, Aug 26, 2015 at 3:51 AM, Lu, Boying wrote: > Hi, All, > > > > We have an Cassandra environment with two connected DCs and our > consistency level of writing operation is EACH_QUORUM. > > So if one DC is down, the write will be failed and we get > TokenRangeOfflineException on the client side (we use netfilix java client > libraries). > > > > We want to give more detailed information about this failure. e.g. The IP > addresses of the broken nodes (on the broken DC in our case). > > We checked the TokenRangeOfflineException and its parent class > ConnectionException. The only related method is getHost(). > > But it returns the IP address of the local node (the node that issues the > writing operation) instead of the remote node on the broken DC. > > > > Does anyone know how to get such information when writing failed? > > > > Thanks > > > Boying > >
Re: 'no such object in table'
> LOCAL_JMX=no
>
> if [ "$LOCAL_JMX" = "yes" ]; then
>     JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
> else
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
> fi

Retry with the following option added to your JVM_OPTS: -Djava.rmi.server.logCalls=true
This should produce some more information about what is going on.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
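Concretely, that suggestion is one extra line in cassandra-env.sh (a sketch; the file location and the example hostname are assumptions for your install):

```shell
# cassandra-env.sh: enable server-side RMI call logging to diagnose
# "no such object in table" errors on JMX connections.
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.logCalls=true"

# If the node resolves its own hostname to an unreachable address, pinning
# the RMI hostname can also help (substitute the node's reachable IP):
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.0.5"
```

Restart the node after changing these options and watch the system log while reconnecting with nodetool or JConsole.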
Re: AWS multi-region DCs fail to rebuild
> This happens repeatedly when attempting to run the rebuild on just a single node
> in the US DC (pointing at the EU DC). I have not yet tried any other node from the
> US DC.
>
> Is this a bug or a configuration error, perhaps? I know people out there are using
> AWS for Cassandra - how are you replicating across regions?

There have been some edge cases here in the past:
https://issues.apache.org/jira/browse/CASSANDRA-4026

Check the AWS instance metadata (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) to see that what the region and AZ endpoints return is consistent with your keyspace declaration. If nothing really sticks out, you may want to try just using GossipingPropertyFileSnitch (GPFS) and fall back to setting DC and rack by hand.

Per Rob's point about bleeding edge, I'd be super curious whether the existing setup worked as-is on 2.1 or 2.0. I'd be willing to bet you are the first person trying to make EC2Snitch span regions on 2.2.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
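The two checks above can be sketched as follows (the DC/rack names are placeholders, not values from the thread; they must match the names used in your keyspace's replication settings):

```shell
# 1. Verify what the EC2 instance metadata service reports for this node:
curl http://169.254.169.254/latest/meta-data/placement/availability-zone

# 2. Fallback to GPFS with DC/rack set by hand:
#    in cassandra.yaml:
#        endpoint_snitch: GossipingPropertyFileSnitch
#    in conf/cassandra-rackdc.properties (placeholder values):
#        dc=us-east
#        rack=rack1
```

A rolling restart is needed after a snitch change, and the DC names must line up exactly with the keyspace's NetworkTopologyStrategy options.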
Re: AssertionError on PasswordAuthenticator
> Any ideas what might be wrong or which prerequisites need to be met? This
> is the first request for a connection.

Sam makes a good point. Make sure you have the username and password properties set in the configuration file:
https://github.com/apache/incubator-usergrid/blob/master/stack/config/src/main/resources/usergrid-default.properties#L52-L53

See this page for details on configuration:
http://usergrid.readthedocs.org/en/latest/deploy-local.html#install-and-configure-cassandra

For Usergrid-specific questions, feel free to stop by our mailing list or IRC channel, both of which are listed here:
http://usergrid.incubator.apache.org/community/

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
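The properties in question look roughly like this (a sketch; the exact keys and defaults are in the linked usergrid-default.properties, so verify against that file rather than this guess):

```properties
# Credentials Usergrid uses when PasswordAuthenticator is enabled in
# cassandra.yaml (values shown are Cassandra's out-of-the-box superuser).
cassandra.username=cassandra
cassandra.password=cassandra
```

If these are unset while Cassandra requires authentication, the very first connection attempt fails, which matches the symptom described.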
Re: Unbalanced disk load
> I am currently benchmarking Cassandra with three machines, and on each machine I am
> seeing an unbalanced distribution of data among the data directories (1 per disk).
> I am concerned that this affects my write performance. Is there anything I can do to
> make the distribution more even? Would RAID0 be my best option?

Using LeveledCompactionStrategy should provide a much better balance. However, depending on your use case, this may not be the right choice for your workload, in which case RAID0 with a single data directory will be the best option.

> Total size of data is about 2TB, 14B records, all unique. Replication factor of 1.

RF=1 means *no* redundancy, which is a bad idea to run in production (and sort of defeats the purpose of a system like Cassandra). It is also not going to give an accurate picture for a load test, as it eliminates a lot of cross-node traffic that you would see with a higher replication factor.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
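Switching an existing table to LCS is a single schema change (a sketch; the keyspace/table names are placeholders, and 160 MB is the commonly cited default sstable target size):

```sql
ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};
```

Expect a burst of compaction activity after the change while existing sstables are reorganized into levels, so apply it before, not during, the benchmark runs.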
Re: Significant drop in storage load after 2.1.6->2.1.8 upgrade
Perhaps https://issues.apache.org/jira/browse/CASSANDRA-9592 got compactions moving forward for you? This would explain the drop. However, the discussion on https://issues.apache.org/jira/browse/CASSANDRA-9683 seems to be similar to what you saw, and that is currently being investigated.

On Fri, Jul 17, 2015 at 10:24 AM, Mike Heffner wrote:
> Hi all,
>
> I've been upgrading several of our rings from 2.1.6 to 2.1.8 and I've
> noticed that after the upgrade our storage load drops significantly (I've
> seen up to an 80% drop).
>
> I believe most of the data that is dropped is tombstoned (via TTL
> expiration) and I haven't detected any data loss yet. However, can someone
> point me to what changed between 2.1.6 and 2.1.8 that would lead to such a
> significant drop in tombstoned data? Looking at the changelog, there's
> nothing that jumps out at me. This is a CF definition from one of the CFs
> that had a significant drop:
>
> > describe measures_mid_1;
>
> CREATE TABLE "Metrics".measures_mid_1 (
>     key blob,
>     c1 int,
>     c2 blob,
>     c3 blob,
>     PRIMARY KEY (key, c1, c2)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (c1 ASC, c2 ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 0
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> Thanks,
>
> Mike
>
> --
> Mike Heffner
> Librato, Inc.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
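One way to see whether stalled compactions were holding expired data is to check each sstable's droppable-tombstone estimate before and after the upgrade (a sketch; the data path is an assumption based on the schema above, and sstablemetadata's output format varies by version):

```shell
# Print the estimated droppable tombstone ratio for each sstable of the table.
# With gc_grace_seconds = 0, tombstones become purgeable as soon as they expire,
# so a high ratio here means compaction simply hadn't caught up yet.
for f in /var/lib/cassandra/data/Metrics/measures_mid_1*/*-Data.db; do
  echo "$f"
  sstablemetadata "$f" | grep -i "droppable tombstones"
done
```

If pre-upgrade sstables showed large droppable-tombstone ratios, the post-upgrade drop in load is consistent with compactions finally reclaiming that space rather than data loss.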