ApacheCon Cassandra and NGCC 2020 Call for proposals
I am delighted to share with you that we, the Apache Cassandra community, in light of our success at last year's conference, have been given a three-day track at this year's ApacheCon in New Orleans, LA, USA [0]. The goal of this track is simple: we are going to get together to talk about Apache Cassandra. As such, this will be the ideal place to network with peers, ask questions, get answers, etc.

On day one, we will be having our Next Generation Cassandra Conference (NGCC). All are welcome to attend, but this day is targeted at Apache Cassandra committers, contributors, and large-scale cluster operators getting together to discuss topics of interest to them for future development efforts. The content will focus on internals and will be geared towards folks with knowledge of the codebase and/or operating Cassandra in very large environments. Talk submissions for NGCC should take this target audience into account.

Days two and three will be more general purpose and accessible to a wider audience. If you are interested in speaking here, put something together that tells a story others will want to hear. What we are looking for are general use case submissions that our users will find interesting. This can be how you solved a specific problem or just a general picture of how your organization uses Apache Cassandra. A good submission will embrace the open source ethos of sharing information to help others solve similar problems.

NGCC talks will be targeted at 30 minutes with 15 minutes for questions or small break-out discussions. General purpose talks will have 50 minutes with five minutes for questions.

For more information, including details of how to submit proposals, please see this page: https://acna2020.jamhosted.net Please indicate "Cassandra" as the category and add NGCC at the top of the "Proposal abstract" text box if you are submitting an NGCC talk.
If you are interested in helping organize, plan, and review submissions for the Cassandra track, we'll send additional details out closer to the CFP deadline about how you can be involved. [0] https://www.apachecon.com/acna2020/
2020 ASF Community Survey: Users
Hello everyone,

If you have an apache.org email, you should have received an email with an invitation to take the 2020 ASF Community Survey. Please take 15 minutes to complete it. If you do not have an apache.org email address or you didn’t receive a link, please follow this link to the survey: https://communitysurvey.limequery.org/454363

This survey is important because it will provide us with scientific information about our community, and shed some light on how we can collaborate better and become more diverse. Our last survey of this kind was implemented in 2016, which means that our existing data about Apache communities is outdated. The deadline to complete the survey is January 4th, 2020. You can find information about privacy on the survey’s Confluence page [1].

Your participation is paramount to the success of this project! Please consider filling out the survey, and share this news with your fellow Apache contributors. As individuals form the Apache community, your opinion matters: we want to hear your voice. If you have any questions about the survey or otherwise, please reach out to us!

Kindly,
ASF Diversity & Inclusion
https://diversity.apache.org/

[1] https://cwiki.apache.org/confluence/display/EDI/Launch+Plan+-+The+2020+ASF+Community+Survey
Cassandra track at ApacheCon 2019 finalized
Hi Folks,

The schedule is up for ApacheCon 2019; we could not be happier with the Cassandra track we were able to put together. https://www.apachecon.com/acna19/schedule.html

Huge thanks again to everyone who submitted talks. We had 3x the number of submissions of any other project-specific track and *almost* as many submissions as the premier big data track!!

Make sure you get this on your schedules. It will be a unique opportunity to interface with project developers, other Apache Cassandra users and operators, as well as the whole ASF community. Hope to see you all there!

Cheers,
-Nate
Two day Apache Cassandra track at ApacheConNA 2019
Hi Folks,

I am delighted to share with you that we, the Apache Cassandra community, have been given a two day track at this year's ApacheCon North America. The goal of this track is simple: we are going to get together to talk about Apache Cassandra. As such, this will be the ideal place to network with peers, ask questions, get answers, etc.

On day one, we will be having our Next Generation Cassandra Conference (NGCC). All are welcome to attend but this day is targeted for Apache Cassandra committers, contributors and large-scale cluster operators to get together and discuss topics of interest to them for future development efforts. The content will focus on internals and will be geared towards folks with knowledge of the codebase and/or operating Cassandra in very large environments. Talk submissions for NGCC should take this target audience into account.

Day two will be more general purpose and accessible for a wider audience. If you are interested in speaking here, put something together that tells a story others will want to hear. What we are looking for is general use case submissions that our users will find interesting. This can be how you solved a specific problem or just a general picture into how your organization uses Apache Cassandra. A good submission will embrace the open source ethos of sharing information to help others solve similar problems.

NGCC talks will be targeted to 30 minutes with 15 minutes for questions or small break out discussions. General purpose talks will have 40 minutes with five minutes for questions.

For more information, including details of how to submit proposals, please see this page: http://cassandra.apache.org/events/2019-apache-cassandra-summit/

Cheers,
-Nate

- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Re: too many logDroppedMessages and StatusLogger
Are you using queries with a large number of arguments to an IN clause on a partition key? If so, the coordinator has to:

- hold open the client request
- unwind the IN clause into individual statements
- scatter/gather those statements around the cluster (each at the requested consistency level!)
- pull it all back together and send it out

In extreme cases, this can flood internode messaging and make things look slow even when the system is near idle.

On Fri, Mar 8, 2019 at 9:27 PM Marco Gasparini wrote: > > Hi all, > > I cannot understand why I get the following logs; they appear every day at > no fixed period of time. I saw them every 2 minutes or every 10 seconds; I > cannot find any pattern. > I took this very example here during a heavy workload of writes and reads > but I get them also during a very light workload and without any active > compaction/repair/streaming process and no high cpu/memory/iowait usage. > >> 2019-03-08 01:49:47,868 INFO [ScheduledTasks:1] MessagingService.java:1246 >> logDroppedMessages READ messages were dropped in last 5000 ms: 0 internal >> and 1 cross node. 
>> Mean internal dropped latency: 6357 ms and Mean cross-node dropped latency: 6556 ms
>> 2019-03-08 01:49:47,868 INFO [ScheduledTasks:1] StatusLogger.java:47 log
>> Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
>> MutationStage                      0        0   17641121        0                 0
>> ViewMutationStage                  0        0          0        0                 0
>> ReadStage                          0        0    6851090        0                 0
>> RequestResponseStage               0        0   13646587        0                 0
>> ReadRepairStage                    0        0     352884        0                 0
>> CounterMutationStage               0        0          0        0                 0
>> MiscStage                          0        0          0        0                 0
>> CompactionExecutor                 0        0     882478        0                 0
>> MemtableReclaimMemory              0        0       4101        0                 0
>> PendingRangeCalculator             0        0          7        0                 0
>> GossipStage                        0        0    4399705        0                 0
>> SecondaryIndexManagement           0        0          0        0                 0
>> HintsDispatcher                    0        0       2165        0                 0
>> MigrationStage                     0        0         50        0                 0
>> MemtablePostFlush                  0        0       4393        0                 0
>> PerDiskMemtableFlushWriter_0       0        0       4097        0                 0
>> ValidationExecutor                 0        0       1565        0                 0
>> Sampler                            0        0          0        0                 0
>> MemtableFlushWriter                0        0       4101        0                 0
>> InternalResponseStage              0        0     121813        0                 0
>> AntiEntropyStage                   0        0
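The unwinding the coordinator performs above can instead be done on the client side, so each statement routes directly to a replica. A minimal sketch (the keyspace, table, and column names are hypothetical, and the driver fan-out is shown only as a comment):

```python
# Sketch: unwind "SELECT ... WHERE pk IN (k1, k2, ...)" into one statement
# per partition key, so the client (not a single coordinator) fans the work
# out across the cluster. Table/column names here are hypothetical.

def unwind_in_clause(keys):
    """Turn a large IN list into (query, params) pairs, one per key."""
    query = "SELECT * FROM ks.events WHERE pk = %s"
    return [(query, (k,)) for k in keys]

statements = unwind_in_clause(["a", "b", "c"])
for q, params in statements:
    print(q, params)

# With the DataStax Python driver you would then issue these concurrently:
#   futures = [session.execute_async(q, p) for q, p in statements]
#   rows = [row for f in futures for row in f.result()]
```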
Re: Cassandra trace
At this point, query tracing is easier to do from the driver side. Docs for python and java: http://datastax.github.io/python-driver/api/cassandra/query.html# https://github.com/datastax/java-driver/tree/3.x/manual/logging#logging-query-latencies This has been completely redone in 4.0. For details (which also include some good discussion on the current limitations) see: https://issues.apache.org/jira/browse/CASSANDRA-13983 https://issues.apache.org/jira/browse/CASSANDRA-12151 On Tue, Oct 23, 2018 at 5:10 PM Mun Dega wrote: > > Hello, > > Does anyone know how I can see queries coming when they're as prepared > statements when trace is turned on Cassandra 3.x? > > If trace doesn't show, any ideas how I can see these type of queries?
Re: Cassandra 4.0
When it's ready :) In all seriousness, the past two blog posts include some discussion on our motivations and current goals with regard to 4.0: http://cassandra.apache.org/blog/ On Wed, Oct 24, 2018 at 4:49 AM Abdul Patel wrote: > > Hi all, > > Any idea when 4.0 is planned to release?
Re: SNAPSHOT builds?
We'll start publishing snapshot builds in the near future to ease testing (support for this was just added in CASSANDRA-12704). On Sun, Sep 30, 2018 at 5:11 AM James Carman wrote: > > Okay, cool. So, 4.0.0-SNAPSHOT doesn’t have Java 11 support quite yet? No > big deal. Just trying to get ahead of the game and be ready once we have it. > Thanks, Jonathan! > > On Sat, Sep 29, 2018 at 11:16 AM Jonathan Haddad wrote: >> >> Hey James, you’ll have to build it. Java 11 is out but the build >> instructions still apply: >> >> http://thelastpickle.com/blog/2018/08/16/java11.html >> >> >> On Sat, Sep 29, 2018 at 7:01 AM James Carman >> wrote: >>> >>> I am trying to find 4.x SNAPSHOT builds. Are they available anywhere >>> handy? I'm trying to work on Java 11 compatibility for a library. >>> >>> Thanks, >>> >>> James >> >> -- >> Jon Haddad >> http://www.rustyrazorblade.com >> twitter: rustyrazorblade
Re: Separated commit log directory configuration
> We only increased commitlog_total_space_in_mb so that Cassandra fully uses > the dedicated disk, but that may be an error? > The default value for this setting is (per the documentation): > > The default value is the smaller of 8192, and 1/4 of the total space of > the commitlog volume. > > But that doesn't say much (or should it really be 25% of the disk space?)

I wouldn't intentionally fill any filesystem that close to its capacity. Most (all?) will start to degrade performance-wise. Unless you are really strapped for disk space, give it some breathing room. This is best chosen by monitoring commitlog rotation frequency in conjunction with disk utilization for your cluster.

> > So, my questions would be: > > * What size should I dedicate to this commit log disk? What are the rules of > thumb to discover the "best" size? > * How should I configure the "commitlog_total_space_in_mb" setting > respectively to the size of the disk?

Most clusters shouldn't need to adjust this or any of the other default commitlog settings unless you have excessively large mutations or require the commitlog being written to disk more frequently.
Re: Rolling back Cassandra upgrades (tarball)
> I have a cluster on v3.0.11 I am planning to upgrade this to 3.10. > Is rolling back the binaries a viable solution? What's the goal with moving from 3.0 to 3.x? Also, our latest release in 3.x is 3.11.3 and has a couple of important bug fixes over 3.10 (which is a bit dated at this point).
Re: Apache Cassandra Blog is now live
You can tell how psyched we are about it because we cross posted! Seriously though - this is by the community for the community, so any ideas - please send them along. On Wed, Aug 8, 2018 at 1:53 PM, sankalp kohli wrote: > Hi, > Apache Cassandra Blog is now live. Check out the first blog post. > > http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html > > Thanks, > Sankalp
New community blog with inaugural post on faster streaming in 4.0
Hi folks,

We just added a blog section to our site, with a post detailing performance improvements of streaming coming in 4.0: http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

I think it's a good indicator of what we are going for that our first author is not a committer or PMC member. Any subject ideas, please bring them up on the dev list (d...@cassandra.apache.org) or open a JIRA. As long as it's informative and about Apache Cassandra, we are interested.

Thanks,
-Nate
Re: which driver to use with cassandra 3
Due to how Spring Data binding works, you have to write queries explicitly to use the "...FROM keyspace.table ..." form in either the template-method classes (CqlTemplate, etc.) or via @Query annotations to avoid the 'use keyspace' overhead. For example, a Repository implementation for a User class (do this all by hand, do not use CassandraRepository per Patrick's point of it being a traveling carnival of cassandra anti-patterns) would look something like:

@Query("select id, email, name from myusers.user where id = ?0")
User findById(UUID id);

Another important note - only the template method classes that work *directly* with prepared statements use them. In other words: *nothing else in the API uses prepared statements.* And this is a massive performance hit in statement parsing alone. There are open issues for this in the SD jira:
https://jira.spring.io/browse/DATACASS-578
https://jira.spring.io/browse/DATACASS-510

If you stick to the CqlTemplate methods for working with PreparedStatements and ResultSet extractors, etc., because you want Spring to manage all the configuration, that's totally legit and it will work well. In general, this will be a good API one day, as some of the fluent stuff for working with paged result sets is particularly excellent and well crafted around modern Java paradigms (outside of not using PreparedStatement, unfortunately).

On Sun, Jul 22, 2018 at 1:15 PM, Goutham reddy wrote: > Hi, > Consider overriding the default java driver provided by spring boot if you are > using Datastax clusters with any of the 3.X Datastax drivers. I agree with > Patrick: always have one key space specified to one application; that way > you achieve domain driven applications and cause less overhead avoiding > switching between key spaces. > > Cheers, > Goutham > > On Fri, Jul 20, 2018 at 10:10 AM Patrick McFadin wrote: >> >> Vitaliy, >> >> The DataStax Java driver is very actively maintained by a good size team >> and a lot of great community contributors. 
It's version 3.x compatible and >> even has some 4.x features starting to creep in. Support for virtual tables >> (https://issues.apache.org/jira/browse/CASSANDRA-7622) was just merged as >> an example. Even the largest DataStax customers have a mix of enterprise + >> OSS and we want to support them either way. Giving developers the most >> consistent experience is part of that goal. >> >> As for spring-data-cassandra, it does pull the latest driver as a part of >> its own build, so you will already have it in your classpath. Spring adds >> some auto-magic that you should be aware of. The part you mentioned about the >> schema management is one to be careful with using. If you use it in dev, >> it's not a huge problem. If it gets out to prod, you could potentially have >> A LOT of concurrent schema changes happening, which can lead to bad things. >> Also, some of the spring API features such as findAll() can expose typical >> c* anti-patterns such as "allow filtering". Just be aware of what feature >> does what. And finally, another potential production problem is that if you >> use a lot of keyspaces, Spring will instantiate a new Driver Session object >> per keyspace, which can lead to a lot of redundant connections to the >> database. From the driver, a better way is to specify a keyspace per query. >> >> As you are using spring-data-cassandra, please share your experiences if >> you can. There are a lot of developers that would benefit from some >> real-world stories. >> >> Patrick >> >> >> On Fri, Jul 20, 2018 at 4:54 AM Vitaliy Semochkin >> wrote: >>> >>> Thank you very much Duy Hai Doan! >>> I have relatively simple demands and since spring uses the datastax >>> driver I can always get back to it, >>> though I would prefer to use spring in order to do bootstrapping and >>> resource management for me. >>> On Fri, Jul 20, 2018 at 4:51 PM DuyHai Doan wrote: >>> > >>> > Spring data cassandra is so so ... 
It has fewer features (at least at the >>> > time I looked at it) than the default Java driver >>> > >>> > For drivers, right now most people are using Datastax's ones >>> > >>> > On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> Which driver to use with cassandra 3 >>> >> >>> >> the one that is provided by datastax, netflix or something else. >>> >> >>> >> Spring uses driver from datastax, though is it a reliable solution for >>> >> a long term project, having in mind that datastax and cassandra >>> >> parted? >>> >> >>> >> Regards, >>> >> Vitaliy >>> >> >>> > >>> > -- > R
CVE-2018-8016 on Apache Cassandra
CVE-2018-8016 describes an issue with the default configuration of Apache Cassandra releases 3.8 through 3.11.1 which binds an unauthenticated JMX/RMI interface to all network interfaces allowing attackers to execute arbitrary Java code via an RMI request. This issue is a regression of the previously disclosed CVE-2015-0225. The regression was introduced in https://issues.apache.org/jira/browse/CASSANDRA-12109. The fix for the regression is implemented in https://issues.apache.org/jira/browse/CASSANDRA-14173. This fix is contained in the 3.11.2 release of Apache Cassandra. - The Apache Cassandra PMC
Re: 答复: Time serial column family design
I disagree. Create date as a raw integer is an excellent surrogate for controlling time series "buckets" as it gives you complete control over the granularity. You can even have multiple granularities in the same table - remember that partition key "misses" in Cassandra are pretty lightweight as they won't make it past the bloom filter on the read path.

On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja wrote: > Hi David, > > Could you describe why you chose to include the create date in the > partition key? If the vin is enough "partitioning", meaning that the size > (number of rows x size of row) of each partition is less than 100MB, then > remove the date and just use the create_time, because the date is already > included in that column anyways. > > For example if columns "a" and "b" (from your table) are of max 256 UTF8 > characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows > per partition. You can actually have many more but you don't want to go > much higher for performance reasons. > > If this is not enough you could use create_month instead of create_date, > for example, to reduce the partition size while not being too granular. > > > On Tue, 17 Apr 2018, 22:17 Nate McCall, wrote: > >> Your table design will work fine as you have appropriately bucketed by an >> integer-based 'create_date' field. >> >> Your goal for this refactor should be to remove the "IN" clause from your >> code. This will move the rollup of multiple partition keys being retrieved >> into the client instead of relying on the coordinator assembling the >> results. You have to do more work and add some complexity, but the trade >> off will be much higher performance as you are removing the single >> coordinator as the bottleneck. >> >> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni >>> wrote: >> >>> Hi Nate, >>> >>> Thanks for your reply! >>> >>> Is there another way to design this table to meet this requirement? 
>>> >>> >>> >>> Best Regards, >>> >>> >>> >>> 倪项菲*/ **David Ni* >>> >>> 中移德电网络科技有限公司 >>> >>> Virtue Intelligent Network Ltd, co. >>> >>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei >>> >>> Mob: +86 13797007811|Tel: + 86 27 5024 2516 >>> >>> >>> >>> *发件人:* Nate McCall >>> *发送时间:* 2018年4月17日 7:12 >>> *收件人:* Cassandra Users >>> *主题:* Re: Time serial column family design >>> >>> >>> >>> >>> >>> Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in >>> (20180416, 20180415, 20180414, 20180413, 20180412….); >>> >>> But this cause the cql query is very long,and I don’t know whether there >>> is limitation for the length of the cql. >>> >>> Please give me some advice,thanks in advance. >>> >>> >>> >>> Using the SELECT ... IN syntax means that: >>> >>> - the driver will not be able to route the queries to the nodes which >>> have the partition >>> >>> - a single coordinator must scatter-gather the query and results >>> >>> >>> >>> Break this up into a series of single statements using the executeAsync >>> method and gather the results via something like Futures in Guava or >>> similar. >>> >> >> >> >> -- >> - >> Nate McCall >> Wellington, NZ >> @zznate >> >> CTO >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
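The back-of-envelope sizing quoted in Javier's reply can be checked directly. The ~100 MB target and the 2-bytes-per-character figure are his assumptions, and this ignores Cassandra's actual per-cell storage overhead:

```python
# Rough partition-size math from the thread above: two text columns of up to
# 256 characters each, at an assumed ~2 bytes per character, against a
# ~100 MB partition-size guideline. Illustrative only -- real on-disk cost
# includes clustering keys, timestamps, and other per-cell overhead.

TARGET_PARTITION_BYTES = 100 * 1024 * 1024   # ~100 MB guideline
row_bytes = 2 * 256 * 2                      # 2 columns x 256 chars x 2 bytes

max_rows = TARGET_PARTITION_BYTES // row_bytes
print(max_rows)  # roughly the "approx 100,000 rows" figure from the thread
```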
Re: 答复: Time serial column family design
Your table design will work fine as you have appropriately bucketed by an integer-based 'create_date' field. Your goal for this refactor should be to remove the "IN" clause from your code. This will move the rollup of multiple partition keys being retrieved into the client instead of relying on the coordinator assembling the results. You have to do more work and add some complexity, but the trade off will be much higher performance as you are removing the single coordinator as the bottleneck. On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni wrote: > Hi Nate, > > Thanks for your reply! > > Is there other way to design this table to meet this requirement? > > > > Best Regards, > > > > 倪项菲*/ **David Ni* > > 中移德电网络科技有限公司 > > Virtue Intelligent Network Ltd, co. > > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei > > Mob: +86 13797007811|Tel: + 86 27 5024 2516 > > > > *发件人:* Nate McCall > *发送时间:* 2018年4月17日 7:12 > *收件人:* Cassandra Users > *主题:* Re: Time serial column family design > > > > > > Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in > (20180416, 20180415, 20180414, 20180413, 20180412….); > > But this cause the cql query is very long,and I don’t know whether there > is limitation for the length of the cql. > > Please give me some advice,thanks in advance. > > > > Using the SELECT ... IN syntax means that: > > - the driver will not be able to route the queries to the nodes which have > the partition > > - a single coordinator must scatter-gather the query and results > > > > Break this up into a series of single statements using the executeAsync > method and gather the results via something like Futures in Guava or > similar. > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Time serial column family design
> > > Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in > (20180416, 20180415, 20180414, 20180413, 20180412….); > > But this cause the cql query is very long,and I don’t know whether there > is limitation for the length of the cql. > > Please give me some advice,thanks in advance. > Using the SELECT ... IN syntax means that: - the driver will not be able to route the queries to the nodes which have the partition - a single coordinator must scatter-gather the query and results Break this up into a series of single statements using the executeAsync method and gather the results via something like Futures in Guava or similar.
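The executeAsync approach above can be sketched with the Python driver: generate the integer day buckets client-side, then issue one single-partition statement per bucket. The driver fan-out is shown as a comment since it assumes an open session; table and column names follow the thread's example:

```python
from datetime import date, timedelta

# Sketch: instead of "create_date IN (20180416, 20180415, ...)", generate
# the yyyymmdd integer buckets and issue one single-partition query each.

def day_buckets(end, days):
    """Last `days` yyyymmdd integers ending at `end`, most recent first."""
    return [int((end - timedelta(n)).strftime("%Y%m%d")) for n in range(days)]

buckets = day_buckets(date(2018, 4, 16), 5)
print(buckets)  # [20180416, 20180415, 20180414, 20180413, 20180412]

# With the DataStax Python driver (session assumed already connected):
#   query = "SELECT * FROM test WHERE vin = %s AND create_date = %s"
#   futures = [session.execute_async(query, (vin, d)) for d in buckets]
#   rows = [row for f in futures for row in f.result()]
```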
Re: Mailing list server IPs
Hi Jacques, Thanks for bringing this up. I took a quick look through the INFRA project and saw a couple of resolved issues that might help: https://issues.apache.org/jira/browse/INFRA-6584?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22mail%20server%20whitelist%22 If those don't do it for you, please open a new issue with INFRA. On Sat, Apr 14, 2018 at 1:19 AM, Jacques-Henri Berthemet < jacques-henri.berthe...@genesys.com> wrote: > I checked with IT and I missed an email on the period where I got the last > bounce. It’s not a very big deal but I’d like to have it fixed if > possible. > > > > Gmail servers are very picky on SMTP traffic and reject a lot of things. > > > > *--* > > *Jacques-Henri Berthemet* > > > > *From:* Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com] > *Sent:* Friday, April 13, 2018 3:15 PM > *To:* user@cassandra.apache.org > *Subject:* Re: Mailing list server IPs > > > > Hi, > > > > I receive similar messages from time to time, and I'm using Gmail ;) I > believe I never missed a mail on the ML and that you can safely ignore this > message > > > > On 13 April 2018 at 15:06, Jacques-Henri Berthemet < > jacques-henri.berthe...@genesys.com> wrote: > > Hi, > > > > I’m getting bounce messages from the ML from time to time, see attached > example. Our IT told me that they need to whitelist all IPs used by > Cassandra ML server. Is there a way to get those IPs? > > > > Sorry if it’s not really related to Cassandra itself but I didn’t find > anything in http://untroubled.org/ezmlm/ezman/ezman5.html commands. > > > > Regards, > > -- > > Jacques-Henri Berthemet > > > > -- Forwarded message -- > From: "user-h...@cassandra.apache.org" > To: Jacques-Henri Berthemet > Cc: > Bcc: > Date: Fri, 6 Apr 2018 20:47:22 + > Subject: Warning from user@cassandra.apache.org > Hi! This is the ezmlm program. I'm managing the > user@cassandra.apache.org mailing list. > > > Messages to you from the user mailing list seem to > have been bouncing. 
I've attached a copy of the first bounce > message I received. > > If this message bounces too, I will send you a probe. If the probe bounces, > I will remove your address from the user mailing list, > without further notice. > > > I've kept a list of which messages from the user mailing list have > bounced from your address. > > Copies of these messages may be in the archive. > To retrieve a set of messages 123-145 (a maximum of 100 per request), > send a short message to: > > > To receive a subject and author list for the last 100 or so messages, > send a short message to: > > > Here are the message numbers: > >60535 >60536 >60548 > > --- Enclosed is a copy of the bounce message I received. > > Return-Path: <> > Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 - > Date: 27 Mar 2018 14:22:11 -0000 > From: mailer-dae...@apache.org > To: user-return-605...@cassandra.apache.org > Subject: failure notice > > > > > - > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: OOM after a while during compacting
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got an OOM last night
> - Concurrent compactors is set to 1 but it still happens; also tried setting throughput between 16 and 128, no changes.

That heap size is way too small for G1GC. Switch back to the defaults with CMS. IME, G1 needs > 20g for *just* the JVM to see improvements (but this also depends on workload and a few other factors). Stick with the CMS defaults unless you have some evidence-based experiment to try.

Also worth noting that with a 1TB gp2 EBS volume, you only have 3k IOPS to play with before you are subject to rate limiting. If you allocate a volume greater than 3.33TB, you get 10K IOPS and the rate limiting goes away (you can see this playing around with the EBS sizing in the AWS calculator: http://calculator.s3.amazonaws.com/index.html). Another common mistake here is accidentally putting the commitlog on the boot volume, which has a super low amount of IOPS given it's 64g (?iirc) by default.
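The gp2 numbers above follow from gp2's baseline of 3 IOPS per provisioned GB, floored at 100 and (per the post) capped at 10K, which the math below makes explicit:

```python
# gp2 EBS baseline IOPS: 3 IOPS per provisioned GB, minimum 100, and
# (per the post above) capped at 10,000 -- the cap is reached at ~3,334 GB,
# i.e. the "greater than 3.33TB" figure.

def gp2_baseline_iops(size_gb):
    return min(max(3 * size_gb, 100), 10_000)

print(gp2_baseline_iops(1_000))  # 1 TB volume -> 3,000 IOPS
print(gp2_baseline_iops(3_334))  # past ~3.33 TB the 10K cap applies
```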
Re: Cassandra 2.1.x seed node update via JMX
This capability was *just* added in CASSANDRA-14190 and only in trunk. Previously (as described in the ticket above), the seed node list is only updated when doing a shadow round, removing an endpoint or restarting (look for callers of o.a.c.gms.Gossiper#buildSeedsList() if you're curious). A rolling restart is the usual SOP for that. On Fri, Mar 23, 2018 at 9:54 AM, Carl Mueller wrote: > We have a cluster that is subject to the one-year gossip bug. > > We'd like to update the seed node list via JMX without restart, since our > foolishly single-seed-node in this forsaken cluster is being autoculled in > AWS. > > Is this possible? It is not marked volatile in the Config of the source > code, so I doubt it. > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Migration of keyspace to another new cluster
> Hi, > We got a requirement to migrate only one keyspace data from one cluster to > other cluster. And we no longer need the old cluster anymore. Can you > suggest what are the best possible ways we can achieve it. > > Regards > Goutham Reddy > Temporarily treat the new cluster as a new datacenter for the current cluster and follow the process for adding a datacenter for that keyspace. When complete remove the old datacenter/cluster similarly.
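A sketch of that "new datacenter" approach in CQL, with the operational steps as comments. The keyspace name, DC names, and replication factors are placeholders; it assumes NetworkTopologyStrategy and that the new cluster's nodes have joined as datacenter 'dc2':

```sql
-- 1. With the new nodes joined as datacenter 'dc2', extend replication
--    for just the keyspace being migrated:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'dc1': 3, 'dc2': 3};

-- 2. On each new node, stream the existing data from the old DC:
--      nodetool rebuild -- dc1
-- 3. Repoint clients at dc2, then drop the old DC from replication:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3};
-- 4. Decommission the dc1 nodes (nodetool decommission, one at a time).
```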
Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?
> We are archiving data in order to make assumptions on it in the future. So, yes, we expect to grow continuously. In the meantime I learned to go for predictable growth per partition rather than unpredictably large partitions. So today we are growing 250,000,000 records per day going into a single table, and heading towards about 100 times that number this year. A partition will grow by one record a day, which should give us good horizontal scalability, but means 250,000,000 to 25,000,000,000 partitions. Hope these numbers should not make me feel uncomfortable :)

There will be some additional tuning to do at around ~200 million partitions per table per node. Specifically bloom filters and index summaries. Depending on partition size and read access patterns, tuning compression settings will have a big effect as well given the volume.
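To get a feel for why bloom filters become a tuning concern at these partition counts, here is a rough, deliberately simplified estimate of per-node bloom filter memory, assuming roughly 10 bits per partition at the default bloom_filter_fp_chance of 0.01 (both numbers are assumptions; real sizing also depends on per-sstable overhead):

```python
def bloom_filter_mib(partitions: int, bits_per_partition: float = 10.0) -> float:
    """Very rough bloom filter memory estimate in MiB.
    Assumes ~10 bits/partition, which roughly corresponds to a
    1% false-positive chance; actual usage varies by fp_chance."""
    return partitions * bits_per_partition / 8 / (1024 * 1024)

# ~200 million partitions per node -> roughly a quarter GiB just for bloom filters
print(round(bloom_filter_mib(200_000_000)))  # -> 238
```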
Re: Hints folder missing in Cassandra
> The environment is built using established images for Cassandra 3.10. Unfortunately the debug log does not indicate any errors before I start seeing the WARN for the missing hints folder. I understand that hints files will be deleted after replay is complete, but I am not sure of the root cause of why the hints folder is getting deleted. When I look at nodetool status or nodetool ring, it indicates that all nodes are up and running in normal state; no node went down. Also, I do not see anything in the debug logs indicating that a node went down. In such a scenario, I am not sure why HintsWriterExecutor would get triggered.

That error code (O_RDONLY) in the log message indicates that the hints folder has had its permission bits set to read only. We've had several issues with some of the tools doing this type of thing when they are run as the root user. Is this specific node one on which you use any of the tools like sstableloader or similar? If so, are you running them as root? Another thought - if it is on a different partition than the data directory, is there free space left on the underlying device holding /var/lib/cassandra/hints?

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Setting min_index_interval to 1?
> > > Another was the crazy idea I started with of setting min_index_interval to > 1. My guess was that this would cause it to read all index entries, and > effectively have them all cached permanently. And it would read them > straight out of the SSTables on every restart. Would this work? Other than > probably causing a really long startup time, are there issues with this? > > I've never tried that. It sounds like you understand the potential impact on memory and startup time. If you have the data in such a way that you can easily experiment, I would like to see a breakdown of the impact on response time vs. memory usage as well as where the point of diminishing returns is on turning this down towards 1 (I think there will be a sweet spot somewhere).
Re: What happens if multiple processes send create table if not exist statement to cassandra?
> Thanks a lot for that explanation Jeff!! I am trying to see if there is > any JIRA ticket that talks about incorporating LWT in scenarios you > mentioned? > https://issues.apache.org/jira/browse/CASSANDRA-10699
Re: Upgrade to 3.11.1 give SSLv2Hello is disabled error
> > We use Oracle jdk1.8.0_152 on all nodes and as I understand oracle use a > dot in the protocol name (TLSv1.2) and I use the same protocol name and > cipher names in the 3.0.14 nodes and the one I try to upgrade to 3.11.1. > I agree with Stefan's assessment and share his confusion. Would you be willing to add the following to the startup options with the explicitly configured "TLSv1.2" and post the results? -Djavax.net.debug=ssl That should provide additional detail on the SSL handshake.
Re: 3.0.15 or 3.11.1
> > Can you please provide some JIRAs for superior fixes and performance improvements which are present in 3.11.1 but are missing in 3.0.15?

For the security conscious, CASSANDRA-11695 allows you to use Cassandra's authentication and authorization to lock down JMX/nodetool access instead of relying on per-node configuration.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: NVMe SSD benchmarking with Cassandra
> In regards to setting read ahead, how is this set for NVMe drives? Also, below are our compression settings for the table… They are the same as in our tests against SAS SSDs, so I don't think the compression settings would be the issue…

Check 'blockdev --report' between the old and the new servers to see if there is a difference. Are there other deltas in the disk layouts between the old and new servers (i.e. LVM, mdadm, etc.)? You can control read ahead via 'blockdev --setra' or via poking the kernel: /sys/block/[YOUR DRIVE]/queue/read_ahead_kb. In both cases, changes are instantaneous so you can do it on a canary and monitor for effect. Also, I'd be curious to know (since you have this benchmark setup) whether you still get the degradation you are currently seeing if you set concurrent_reads and concurrent_writes back to their defaults.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
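One wrinkle worth noting when comparing the two knobs above: 'blockdev --setra' takes a count of 512-byte sectors, while /sys/block/.../queue/read_ahead_kb is in kilobytes, so the same setting shows up as two different numbers. A tiny converter makes the relationship explicit:

```python
def setra_sectors(read_ahead_kb: int) -> int:
    """Convert a read_ahead_kb value to the 512-byte sector count
    that 'blockdev --setra' and '--getra' work in."""
    return read_ahead_kb * 1024 // 512

print(setra_sectors(128))  # read_ahead_kb of 128 corresponds to --setra 256
```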
Re: Cassandra proxy to control read/write throughput
The following presentation describes in detail a technique for using coordinator-only nodes which will give you similar behavior (particularly slides 12 to 14): https://www.slideshare.net/DataStax/optimizing-your-cluster-with-coordinator-nodes-eric-lubow-simplereach-cassandra-summit-2016

On Thu, Oct 26, 2017 at 12:07 PM, AI Rumman wrote:
> Hi,
> I am using different versions of Cassandra in my environment where I have 60 nodes running for different applications. Each application is connecting to its own cluster. I am thinking about abstracting the Cassandra IP from the app drivers. The app will communicate with one proxy IP which will redirect traffic to the appropriate Cassandra cluster. The reason behind this thinking is to merge multiple clusters and control the read/write throughput from the proxy based on the application. If anyone knows about pg_bouncer for PostgreSQL, I am thinking of something similar to that. Has anyone worked on such a project? Can you please share some ideas?
> Thanks.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Understanding Messages in the Debug.log
> The message in the debug log is:
> DEBUG [GossipStage:1] 2017-09-21 09:19:52,627 FailureDetector.java:456 - Ignoring interval time of 2000275419

Did you truncate the log message? There should be a "for [endpoint]" on the end which should help you narrow things down to a set of problem nodes. I agree with Jeff in that this is most likely an NTP sync issue or network flap, though.
Re: system_auth replication factor in Cassandra 2.1
Regardless, if you are not modifying users frequently (with five you most likely are not), make sure to turn the permission cache waaay up. In 2.1 that is just: permissions_validity_in_ms (default is 2000, or 2 seconds). Feel free to set it to 1 day or some such. The corresponding async update parameter (permissions_update_interval_in_ms) can be set to a slightly smaller value. If you really need to, you can drop the cache via the "invalidate" operation on the "org.apache.cassandra.auth:type=PermissionsCache" mbean (on each node) to revoke a user, for example. In later versions, you would have to do the same with:
- roles_validity_in_ms
- credentials_validity_in_ms
and their corresponding 'interval' parameters.
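Turning the 2.1 permission cache up to one day in cassandra.yaml might look like this (illustrative values only, not a recommendation; pick an interval that matches how quickly you need revocations to propagate):

```yaml
# Cache permission lookups for 1 day; refresh asynchronously a bit sooner.
permissions_validity_in_ms: 86400000          # 24 hours
permissions_update_interval_in_ms: 82800000   # 23 hours
```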
Re: Cassandra All host(s) tried for query failed (no host was tried)
If these app instances sit idle for a while, they might just be timing out their sockets. You can tweak socket settings on the driver as described here: https://github.com/datastax/java-driver/tree/3.x/manual/socket_options Perhaps start with explicitly setting keepAlive to true as that may or may not be set depending on whether it's using the native epoll extension or NIO directly (more details about such on the page above). On Thu, Aug 31, 2017 at 3:10 AM, Ivan Iliev wrote: > Hello everyone, > > We are using Cassandra 3.9 for storing quite a lot of data produced from > our tester machines. > > Occasionally, we are seeing issues with apps not being able to communicate > with Cassandra nodes, returning the following errors (captured in > servicemix logs): > >> by: com.datastax.driver.core.exceptions.NoHostAvailableException: All >> host(s) tried for query failed (no host was tried) >> at com.datastax.driver.core.RequestHandler.reportNoMoreHosts( >> RequestHandler.java:218) >> at com.datastax.driver.core.RequestHandler.access$1000( >> RequestHandler.java:43) >> at com.datastax.driver.core.RequestHandler$SpeculativeExecution. >> sendRequest(RequestHandler.java:284) >> at com.datastax.driver.core.RequestHandler.startNewExecution( >> RequestHandler.java:115) >> at com.datastax.driver.core.RequestHandler.sendRequest( >> RequestHandler.java:91) >> at com.datastax.driver.core.SessionManager.executeAsync( >> SessionManager.java:132) >> ... 107 more > > > As a result, apps that try to send data to cassandra get crashed due to > running out of memory and we have to restart the containers in which they > run. > > So far I have not been able to identify what might be the cause for this > as nothing (at least I could not find anything relevant on the timestamps) > in the cassandra debug and system logs. > > Could you share some insight on this ? What to check and where to start > from , in order to troubleshoot this. > > Thanks ! 
> Ivan > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cassandra seems slow when having many read operations
On Sat, Jul 15, 2017 at 6:37 AM, Felipe Esteves <felipe.este...@b2wdigital.com> wrote:
> One point I've noticed is that OpsCenter shows "OS: Disk Latency" max with high values when the problem occurs, but it doesn't show up in direct server monitoring; in those tools the IO and latency of the disks seem ok.

YMMV, but I've seen something like this due to an issue balancing IRQs on older 3-series kernels. Check the output of 'cat /proc/interrupts' and make sure the interrupts for the disks and network driver(s) in particular are not contending. This article explains the issue in detail (as well as how to fix it): http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
Re: Unbalanced cluster
You wouldn't have a build file laying around for that, would you?

On Tue, Jul 11, 2017 at 3:23 PM, Nate McCall wrote:
> On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity wrote:
>> [1] https://github.com/avikivity/shardsim
>
> Avi, that's super handy - thanks for posting.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unbalanced cluster
On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity wrote: > > > > [1] https://github.com/avikivity/shardsim > Avi, that's super handy - thanks for posting.
Re: commitlog_total_space_in_mb tuning
> We're running with 128G memory and a 30G heap size. Maybe it's a good idea to increase commitlog_total_space. On the other hand, even with 8G of commitlog_total_space, replaying the CL after a restart takes more than 5 minutes. In our case, the actual problem is that it's causing lots of read repair timeouts as the repair mutations are dropped, which causes the Cassandra JVM to hang or sometimes crash.

Do you have a mix of a small number of really heavily written-to tables and a larger number of tables with fewer writes? One thing I've had success with when waitingOnSegmentAllocation spiked is setting memtable_flush_period_in_ms on the less busy tables (obviously not all to the same value, so you don't cause a flush storm). This seems to keep the block-and-tackle CL rotation cleaner with fewer tables to flush.
Re: Definition of QUORUM consistency level
> We have CL.TWO. > > > This was actually the original motivation for CL.TWO and CL.THREE if memory serves: https://issues.apache.org/jira/browse/CASSANDRA-2013
Re: Definition of QUORUM consistency level
> > > So, for the quorum, what we really want is that there is one overlap among >> the nodes in write path and read path. It actually was my assumption for a >> long time that we need (N/2 + 1) for write and just need (N/2) for read, >> because it's enough to provide the strong consistency. >> > > You are write about ... > *right (lol!).
Re: Definition of QUORUM consistency level
> So, for the quorum, what we really want is that there is one overlap among > the nodes in write path and read path. It actually was my assumption for a > long time that we need (N/2 + 1) for write and just need (N/2) for read, > because it's enough to provide the strong consistency. > You are write about strong consistency with that calculation, but if I want to issue a QUORUM read just by itself, I would expect a majority of nodes to reply. How it was written might be immaterial to my use case of reading 'from a majority.' -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
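The overlap argument in this thread can be sketched in a few lines: a read of R replicas and a write of W replicas are guaranteed to intersect on at least one replica whenever R + W > N. QUORUM on both sides always satisfies this, and the (N/2 read, N/2 + 1 write) calculation discussed above does as well when N is even:

```python
def quorum(n: int) -> int:
    """Replicas required for QUORUM against replication factor n."""
    return n // 2 + 1

def guaranteed_overlap(r: int, w: int, n: int) -> bool:
    """Reads and writes must share at least one replica when r + w > n."""
    return r + w > n

for n in (3, 4, 5):
    # QUORUM reads with QUORUM writes always overlap...
    assert guaranteed_overlap(quorum(n), quorum(n), n)

# ...and for even n (here 4), reading only n//2 replicas with
# quorum writes still overlaps, which is the calculation above.
assert guaranteed_overlap(4 // 2, quorum(4), 4)
```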
Re: Order by for aggregated values
> > > My application is a real-time application. It monitors devices in the > network and displays the top N devices for various parameters averaged over > a time period. A query may involve anywhere from 10 to 50k devices, and > anywhere from 5 to 2000 intervals. We expect a query to take less than 2 > seconds. > > > > My impression was that Spark is aimed at larger scale analytics. > > > > I am ok with the limitation on “group by”. I am intending to use async > queries and token-aware load balancing to partition the query and execute > it in parallel on each node. > > > This sounds a lot more like a use case for a streaming system (run in parallel with Cassandra). Apache Flink might be one avenue to explore - their Cassandra integration works fine, btw. A lot of folks are doing similar things with Apache Beam as well as it has quite an elegant paradigm for the use case you describe, particularly if you need to combine batching with streaming. (FYI, their "CassandraIO" is about to be merged in master: https://github.com/apache/beam/pull/592#issuecomment-306618338). -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: hanging validation compaction
>> terator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:178)
>> org.apache.cassandra.repair.Validator.rowHash(Validator.java:221)
>> org.apache.cassandra.repair.Validator.add(Validator.java:160)
>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1364)
>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:85)
>> org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:933)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/1371495133.run(Unknown Source)
>> java.lang.Thread.run(Thread.java:745)
>>
>> On Thu, 2017-04-13 at 08:47 +0200, benjamin roth wrote:
>> You should connect to the node with JConsole and see where the compaction thread is stuck.
>>
>> 2017-04-13 8:34 GMT+02:00 Roland Otta:
>> hi, we have the following issue on our 3.10 development cluster. we are doing regular repairs with thelastpickle's fork of Reaper. sometimes the repair (it is a full repair in that case) hangs because of a stuck validation compaction.
>> nodetool compactionstats gives me:
>> a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event 805955242 841258085 bytes 95.80%
>> we have no more progress here for hours.
>> nodetool tpstats shows:
>> ValidationExecutor 1 1 16186 0 0
>> i checked the logs on the affected node and could not find any suspicious errors. anyone that already had this issue and knows how to cope with that? a restart of the node helps to finish the repair ... but i am not sure whether that somehow breaks the full repair.
>> bg, roland

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: UNSUBSCRIBE
To unsubscribe from this list, please send an email to user-unsubscr...@cassandra.apache.org Thanks! On Wed, Apr 12, 2017 at 6:37 AM, Lawrence Turcotte < lawrence.turco...@gmail.com> wrote: > UNSUBSCRIBE >
Re: Unsubscribe
Hi John, Please send an email to user-unsubscr...@cassandra.apache.org to unsubscribe from this list.

On Fri, Apr 7, 2017 at 8:58 AM, John Buczkowski wrote:
> *From:* eugene miretsky [mailto:eugene.miret...@gmail.com]
> *Sent:* Thursday, April 06, 2017 4:36 PM
> *To:* user@cassandra.apache.org
> *Subject:* Why are automatic anti-entropy repairs required when hinted hand-off is enabled?
>
> Hi,
>
> As I see it, if hinted handoff is enabled, the only time data can be inconsistent is when:
> 1. A node is down for longer than the max_hint_window
> 2. The coordinator node crashes before all the hints have been replayed
>
> Why is it still recommended to perform frequent automatic repairs, as well as enable read repair? Can't I just run a repair after one of the nodes is down? The only problem I see with this approach is a long repair job (instead of small incremental repairs). But other than that, are there any other issues/corner-cases?
>
> Cheers,
> Eugene
Re: [Cassandra 3.0.9] Cannot allocate memory
On Thu, Mar 23, 2017 at 11:18 AM, Abhishek Kumar Maheshwari < abhishek.maheshw...@timesinternet.in> wrote: > JVM config is as below: > > > > -Xms16G > > -Xmx16G > > -Xmn3000M > > > I don't think it is the cause, but you need to remove Xmn when using G1GC.
Re: Scrubbing corrupted SStable.
The snapshots are hard links on the file system, so everything is included. You can use the "--no-snapshot" option to disable snapshots.

On Tue, Mar 21, 2017 at 5:01 PM, Pranay akula wrote:
> I am trying to scrub a column family using nodetool scrub. Is it going to create snapshots for the sstables which are corrupted, or for all the sstables it is going to scrub? And to remove the snapshots created, is running nodetool clearsnapshot enough, or do I need to manually delete pre-scrub data from snapshots of that column family?
> I can see a significant increase in data after starting the scrub.
> Thanks, Pranay.

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: ONE has much higher latency than LOCAL_ONE
On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey wrote:
> The cluster is in two DCs, and yes the client is deployed locally to each DC.

First off, what is the goal of using ONE instead of LOCAL_ONE? If it's failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE and falling back to ONE. Are you using the ".withLocalDc" option in the DCAwareRoundRobinPolicy builder? (It's been a while since I've gone through this in detail, though.) If you could provide a snippet that included the complete options passed to the builder, that might be helpful. Also, check for the complete forms of these two logging messages on the app side during startup (the second one is at INFO so adjust if needed):

"Some contact points don't match local data center. Local DC = {}. Non-conforming contact points: {}"
"Using data-center name '{}' for DCAwareRoundRobinPolicy..."

Make sure those line up with the cluster topology and your expectations. Actually, in typing that up, it may be more appropriate to move the conversation over here since this is probably driver specific: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: ONE has much higher latency than LOCAL_ONE
On Wed, Mar 22, 2017 at 1:11 PM, Nate McCall wrote:
> On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey wrote:
> > The cluster is in two DCs, and yes the client is deployed locally to each DC.
>
> First off, what is the goal of using ONE instead of LOCAL_ONE? If it's failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE and falling back to ONE.

Just read your previous thread about this. That's pretty unintuitive and counter to the way I remember that working (though admittedly, it's been a while). Do please open a thread on the driver mailing list; I'm curious about the response.
Re: spikes in blocked native transport requests
See the details on: https://issues.apache.org/jira/browse/CASSANDRA-11363 You may need to add -Dcassandra.max_queued_native_transport_requests=4096 as a startup parameter. YMMV though; I suggest reading through the above to get a complete picture.

On Mon, Mar 20, 2017 at 11:10 PM, Roland Otta wrote:
> well, i checked it now. we have some STW collections of 100 to 200ms every 5 to 60 seconds. i am not sure whether the blocked threads are related to that, but anyway these pauses are too long for low latency applications. so i will check gc tuning first and will check afterwards whether the blocked threads still exist.
>
> On Mon, 2017-03-20 at 08:55 +0100, benjamin roth wrote:
> Did you check STW GCs? You can do that with 'nodetool gcstats', by looking at the gc.log or observing GC related JMX metrics.
>
> 2017-03-20 8:52 GMT+01:00 Roland Otta:
> we have a datacenter which is currently used exclusively for spark batch jobs. in case batch jobs are running against that environment we can see very high peaks in blocked native transport requests (up to 10k / minute). i am concerned because i guess that will slow other queries (in case other applications are going to use that dc as well). i already tried increasing native_transport_max_threads + concurrent_reads without success. during the jobs i can't find any resource limitations on my hardware (iops, disk usage, cpu, ... is fine). am i missing something? any suggestions how to cope with that?
> br// roland

-- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Grouping time series data into blocks of times
I think you would be better served by using a streaming system like Apache Flink (http://flink.apache.org) and checkpointing occasionally to Cassandra. This is a significant increase in complexity, but you are describing a real-time streaming use case with the need for watermarking time windows and Flink has that all built in.
Re: High disk io read load
> - Node A has 512 tokens and Node B 256. So it has double the load (data). > - Node A also has 2 SSDs, Node B only 1 SSD (according to load) > I very rarely see heterogeneous vnode counts in the same cluster. I would almost guarantee you are the only one doing this with MVs as well. That said, since you have different IO hardware, are you sure the system configurations (eg. block size, read ahead, etc) are the same on both machines? Is dstat showing a similar order of magnitude of network traffic in vs. IO for what you would expect? -- ----- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cipher Suite Cassandra 2.1.14 Encryption
Is AES-GCM supported in python by default? I have a vague recollection that it is not (certainly possible my knowledge is outdated as well). On Wed, Dec 21, 2016 at 10:21 AM, Jacob Shadix wrote: > I was testing client encryption w/cqlsh and get the following error when > using TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 as the cipher. Any ideas why? > > Last error: _ssl.c:492: EOF occurred in violation of protocol")}) > -- Jacob Shadix > -- ----- Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: High CPU on nodes
https://issues.apache.org/jira/browse/CASSANDRA-6908 Disable DynamicSnitch by adding the following to cassandra.yaml (it is not in the file by default): dynamic_snitch: false

On Wed, Dec 21, 2016 at 8:40 AM, Anubhav Kale wrote:
> CIL
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Saturday, December 17, 2016 5:18 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: High CPU on nodes
>
> Hi,
>
> What does 'nodetool netstats' look like on those nodes?
>
> *It's not doing any streaming.*
>
> we have 30GB heap
>
> How is the JVM / GC doing? Are you using G1GC or CMS? This setting would be bad for CMS.
>
> *G1. GC is doing fine. I don't see any long pauses beyond 200 ms.*
>
> You can use this tool to understand where the CPU is being used: https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command
>
> I hope that helps,
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-12-17 0:10 GMT+01:00 Anubhav Kale :
> Hello,
> I am trying to fight a high CPU problem on some of our nodes.
> Thread dumps show that it's not GC threads (we have a 30GB heap), and iostat %iowait confirms it's not disk (ranges between 0.3 - 0.9%). One of the ways in which the problem manifests is that the nodes can't compact SSTables, and it happens randomly. We run Cassandra 2.1.13 on Azure Premium Storage (network attached SSDs).
>
> One of the sample threads that was taking high CPU shows:
>
> "pool-13-thread-1" #3352 prio=5 os_prio=0 tid=0x7f2275340bb0 nid=0x1b0b runnable [0x7f33ffaae000]
> java.lang.Thread.State: RUNNABLE
> at java.util.TimSort.gallopRight(TimSort.java:632)
> at java.util.TimSort.mergeLo(TimSort.java:739)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:163)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:200)
> at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:152)
> at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1581)
> at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1739)
>
> Looking at the code, I can't figure out why things like this would require high CPU, and I don't find any JIRAs relating to this either. So, what can I do next to troubleshoot this?
>
> Thanks!
> > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Cassandra Encryption
You should be using a root certificate for signing all the node certificates to create a trust chain. That way nodes won't have to explicitly know about each other, only the root certificate. This post has some details: http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html On Tue, Nov 22, 2016 at 9:07 PM, Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > yes, I am generating separate certificate for each node. > even if I use the same certificate how does it helps? > > On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin > wrote: > >> Hi Jai, >> >> so do you generate separate certificate for each node? Why not use one >> certificate for all nodes? >> >> Best regards, Vladimir Yudovin, >> >> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud >> CassandraLaunch your cluster in minutes.* >> >> >> On Mon, 21 Nov 2016 17:25:11 -0500*Jai Bheemsen Rao Dhanwada >> >* wrote >> >> Hello, >> >> I am setting up encryption on one of my cassandra cluster using the below >> procedure. >> >> server_encryption_options: >> internode_encryption: all >> keystore: /etc/keystore >> keystore_password: x >> truststore: /etc/truststore >> truststore_password: x >> >> http://docs.oracle.com/javase/6/docs/technotes/guides/securi >> ty/jsse/JSSERefGuide.html#CreateKeystore >> >> However, one difficulty with this approach is whenever I am adding a new >> node I had to rolling restart all the C* nodes in the cluster, so that the >> truststore is updated with the new server information. >> >> Is there a way to automatically trigger a reload so that the truststore >> is updated on the existing machines without restart. >> >> Can someone please help ? >> >> >> > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Client-side timeouts after dropping table
If you can get to them in the test env, you want to look in o.a.c.metrics.CommitLog for:
- TotalCommitlogSize: if this hovers near commitlog_total_space_in_mb and never goes down, you are thrashing on segment allocation
- WaitingOnCommit: the time spent waiting on calls to sync; it will start to climb real fast if you can't sync within sync_interval
- WaitingOnSegmentAllocation: how long it took to allocate a new commitlog segment; if it is all over the place, it is IO bound

Try turning all the commit log settings way down for low-IO test infrastructure like this. Maybe a total commit log size of 32mb with 4mb segments (or even lower depending on test data volume) so they basically flush constantly and don't try to hold any tables open. Also lower concurrent_writes substantially while you are at it to add some write throttling.

On Wed, Sep 21, 2016 at 2:14 PM, John Sanda wrote:
> I have seen in various threads on the list that 3.0.x is probably best for prod. Just wondering though if there is anything in particular in 3.7 to be wary of.
>
> I need to check with one of our QA engineers to get specifics on the storage. Here is what I do know. We have a blade center running lots of virtual machines for various testing. Some of those vm's are running Cassandra and the Java web apps I previously mentioned via docker containers. The storage is shared. Beyond that I don't have any more specific details at the moment. I can also tell you that the storage can be quite slow.
>
> I have come across different threads that talk to one degree or another about the flush queue getting full. I have been looking at the code in ColumnFamilyStore.java. Is perDiskFlushExecutors the thread pool I should be interested in? It uses an unbounded queue, so I am not really sure what it means for it to get full. Is there anything I can check or look for to see if writes are getting blocked?
> > On Tue, Sep 20, 2016 at 8:41 PM, Jonathan Haddad > wrote: > >> If you haven't yet deployed to prod I strongly recommend *not* using 3.7. >> >> >> What network storage are you using? Outside of a handful of highly >> experienced experts using EBS in very specific ways, it usually ends in >> failure. >> >> On Tue, Sep 20, 2016 at 3:30 PM John Sanda wrote: >> >>> I am deploying multiple Java web apps that connect to a Cassandra 3.7 >>> instance. Each app creates its own schema at start up. One of the schema >>> changes involves dropping a table. I am seeing frequent client-side >>> timeouts reported by the DataStax driver after the DROP TABLE statement is >>> executed. I don't see this behavior in all environments. I do see it >>> consistently in a QA environment in which Cassandra is running in docker >>> with network storage, so writes are pretty slow from the get go. In my logs >>> I see a lot of tables getting flushed, which I guess are all of the dirty >>> column families in the respective commit log segment. Then I seen a whole >>> bunch of flushes getting queued up. Can I reach a point in which too many >>> table flushes get queued such that writes would be blocked? >>> >>> >>> -- >>> >>> - John >>> >> > > > -- > > - John > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
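As a reference point, the low-IO test settings suggested above would look roughly like this in cassandra.yaml (these numbers are the illustrative ones from this thread, not production values):

```yaml
# cassandra.yaml -- shrink the commit log so segments recycle constantly
# on slow shared storage (test environments only)
commitlog_total_space_in_mb: 32
commitlog_segment_size_in_mb: 4
# throttle writes by shrinking the write stage (stock default is 32)
concurrent_writes: 8
```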
Re: What cipher suites are support in Cassandra 3.7 ?
Your best bet is to use 256bit AES via "TLS_RSA_WITH_AES_256_CBC_SHA" since that is (usually) hardware accelerated on recent CPUs. The security page on the docs site has a lot of good information: http://cassandra.apache.org/doc/latest/operating/security.html The above contains a link to the following that is worth calling out directly based on your question: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/FIPS.html If you want to know more about the implementation, the config eventually is passed through Netty's io.netty.handler.ssl.SslHandler ( https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/transport/Server.java#L367) which is itself well documented regarding connection lifecycle: https://netty.io/4.0/api/io/netty/handler/ssl/SslHandler.html On Sat, Sep 3, 2016 at 10:44 AM, Eric Ho wrote: > > I'm trying to enable SSL (internode + client). > But I need to specify the suites but I don't know which ones are supported by C*.. > Any pointers much appreciated. > thx > > -- > > -eric ho > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
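If you want to pin the suite explicitly, cassandra.yaml accepts a cipher_suites list in the encryption options; a sketch (keystore paths and passwords are placeholders):

```yaml
server_encryption_options:
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: changeit
    truststore: conf/.truststore
    truststore_password: changeit
    cipher_suites: [TLS_RSA_WITH_AES_256_CBC_SHA]
```

client_encryption_options takes the same cipher_suites form. Note that on older Oracle JDKs, 256-bit AES also requires the JCE unlimited-strength policy files mentioned in the FIPS guide linked above.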
Re: Issue in internode encryption in cassandra
> > > I am using internode encryption in cassandra, with self signed CA it works fine. but with other product CA m getting this error "Filtering out TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA as it isnt supported by the socket” > You've specified ECDHE_RSA as the cipher. This is a new-ish cipher based on elliptic curve cryptography and it may not be available in some distributions. Run "openssl ciphers ECDH" on the node and the client to ensure they both support that algorithm (my guess is one or the other won't). This article provides an excellent description of ECDH: https://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html#diffie-hellman-with-elliptic-curves Unless you have a specific requirement, use "TLS_RSA_WITH_AES_256_CBC_SHA." -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
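The suggested check, run on both ends so the mismatched side stands out (requires a reasonably modern OpenSSL build):

```shell
# List the ECDH(E) suites this machine's OpenSSL offers, one per line.
# Run on both the node and the client and compare; an empty list or an
# "Error in cipher list" means the family is unavailable on that side.
openssl ciphers ECDH | tr ':' '\n'
```

Keep in mind the JVM's available cipher suites can still differ from OpenSSL's, so this is a first-pass check rather than a definitive answer.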
Re: Re : Recommended procedure for enabling SSL on a live production cluster
If you migrate to the latest 2.1 first, you can make this a non-issue as 2.1.12 and above support simultaneous SSL and plain on the same port for exactly this use case: https://issues.apache.org/jira/browse/CASSANDRA-10559 On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > hi ; > if possible could someone shed some light on this. I followed a > post from the lastpickle which was very informative, but we had some > concerns when it came to enabling SSL on a live production cluster. > > > http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html > > 1 : We generally remove application traffic from a DC which has ongoing > changes, just not to affect end customers if things go south during the > update. > > 2 : So once DC-A has been restarted after enabling SSL, this would be > missing writes during that period, as the DC-A would be shown as down by > the other DC's. We will not be able to put back application traffic on DC-A > until we run inter-dc repairs, which will happen only when SSL has been > enabled on all DC's. > > 3 : Repeating the procedure for every DC will lead to some missed writes > across all DC's. > > 4 : We could do the rolling restart of a DC-A with application traffic on, > but we are concerned if for any infrastructure related reason we have an > issue, we will have to serve traffic from another DC-B, which might be > missing on writes to the DC-A during that period. > > We have 4 DC's which 50 nodes each. > > > thanks > Sai > > -- Forwarded message -- > From: sai krishnam raju potturi > Date: Mon, Jul 18, 2016 at 11:06 AM > Subject: Re : Recommended procedure for enabling SSL on a live production > cluster > To: user@cassandra.apache.org > > > Hi; > We have a Cassandra cluster ( version 2.0.14 ) spanning across 4 > datacenters with 50 nodes each. We are planning to enable SSL between the > datacenters. 
We are following the standard procedure for enabling SSL ( > http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html) > . We were planning to enable SSL for each datacenter at a time. > > During the rolling restart, it's expected that the nodes in the > datacenter that had the service restarted, will show as down by the nodes > in other datacenters that have not restarted the service. This would lead > to missed writes among various nodes during this procedure. > > What would be the recommended procedure for enabling SSL on a live > production cluster without the chaos. > > thanks > Sai > > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Question about hector api documentation
> I used to be surprised that people still ask about Hector here; and that > questions here on Hector always seem to mirror new Hector questions on > Stack Overflow. The problem (I think), is that places like Edureka! are > still charging people $300 for a Cassandra training class, where they still > actively teach people to use Hector: > > http://www.edureka.co/cassandra-course-curriculum > > I was wondering where these kept coming from... I shut that project down a year ago and had not taken a commit of any substance for three years. +1 on the Java-Driver with either: - the object mapper module: https://github.com/datastax/java-driver/tree/3.0/manual/object_mapper - or Achilles: https://github.com/doanduyhai/Achilles -- ----- Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: Lots of hints, but only on a few nodes
The most immediate work-around would be to nodetool disablehints around the cluster before you load data. This would stop it snowballing from hints at least. On Tue, May 10, 2016 at 7:49 AM, Erik Forsberg wrote: > I have this situation where a few (like, 3-4 out of 84) nodes misbehave. > Very long GC pauses, dropping out of cluster etc. > > This happens while loading data (via CQL), and analyzing metrics it looks > like on these few nodes, a lot of hints are being generated close to the > time when they start to misbehave. > > Since this is Cassandra 2.0.13 which have a less than optimal hints > implementation, largs numbers of hints is a GC troublemaker. > > Again looking at metrics, it looks like hints are being generated for a > large number of nodes, so it doesn't look like the destination nodes are at > fault. So, I'm confused. > > Any Hints (pun intended) on what could cause a few nodes to generate more > hints than the rest of the cluster? > > Regards, > \EF > -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
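A sketch of that workaround (host names and the loop are illustrative; adapt to however you address your 84 nodes):

```shell
# Turn off hint creation cluster-wide before the bulk load, restore after.
HOSTS="node01 node02 node03"   # ...all hosts in the ring
for h in $HOSTS; do nodetool -h "$h" disablehints; done
# ... run the CQL data load ...
for h in $HOSTS; do nodetool -h "$h" enablehints; done
```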
Re: tuning repairs and compaction options
> > Hi, we are running a 9 node cluster under load. The nodes are running in > EC2 on i2.2xlarge instances. Cassandra version is 2.2.4. One node was down > yesterday for more than 3 hours. So we manually started an incremental > repair this morning via nodetool (anti-entropy repair?) > > What we can see is that user CPU on that node goes up to over 95% and also > goes up on all other nodes. Also the number of SSTables is exploding, I > guess due to anticompaction. > You might be seeing https://issues.apache.org/jira/browse/CASSANDRA-10342 (fixed in 2.2.6). > > What are my tuning options to have a more gentle repair behaviour? Which > settings should I look at if I want CPU to stay below 50% for instance. My > worry is always to impact the read/write performance during times when we > do anti-entropy repairs. > +1 on cassandra_range_repair script. -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
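One further knob worth noting: the compaction/anticompaction throughput cap can be lowered live for the duration of the repair, without a restart (values below are examples in MB/s):

```shell
# Throttle compaction while the repair runs, then restore the default.
nodetool setcompactionthroughput 8
# ... repair ...
nodetool setcompactionthroughput 16   # stock default
```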
Re: In memory code and query executions
On Mon, May 2, 2016 at 11:04 AM, Corry Opdenakker wrote: > Hi all, > > Is it possible to execute queries towards an embedded cassandra db whyle > bypassing completely the TCP (or IPC) protocol stack? > tl,dr: it is not for the faint of heart and you must understand *exactly* what you are doing. First I have to ask is there something specific that is not working the way you anticipate? Short answer is yes, though: https://github.com/apache/cassandra/tree/cassandra-2.1/examples/client_only This was removed in > 2.1 because very few people were using it and it was confusing to have there as an "example." I would not call this embedded so much as running a "client-mode proxy" but same idea. Apparantly the embedded cassandra is by default accessed using localhost as > hostname which will result in an IPC optimized connection I assume. > Not quite sure what you mean here? > Is there a way to fully omit the Tcp/ipc stack and execute queries > directly in-memory at the cassandra database? preferrably in a (query > resultset -> to -> appcode) zero-copy approach. > > Again, yes per the link above, but you would need to modify a few things for recent versions. The general approach is there however. You could even go a level below QueryProcessor and invoke methods on StorageProxy directly, bypassing the parse/PS lookup. That all said, you need to understand: - These are all internal APIs and as such can and will change substantially without warning even between point releases - Understanding the internals to use them correctly at this level requires a deep understanding of the code base - You will be bypassing a substantial amount of validation and could easily insert data that will corrupt your table - You can potentially put a lot more pressure on portions of the system that anticipate upstream throttling In sum: it's possible, but put something in production first using standard APIs before you go this deep. 
This is not the level at which you want to write your first app against Cassandra. -- - Nate McCall Austin, TX @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com
Re: nodetool -h fails Connection refused
You need to set LOCAL_JMX=false. It will then read the rest of this stanza: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L284-L288 Using the defaults above as-is, you will need to add JMX authentication. Details are here: https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/secureNodetoolSSL.html A lot of this can be controlled with system properties as well: http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html The default config files for JMX authentication and access included in the JVM also have extensive details in the comments: $JAVA_HOME/jre/lib/management/jmxremote.access $JAVA_HOME/jre/lib/management/jmxremote.password.template On Tue, Apr 19, 2016 at 8:40 PM, Alaa Zubaidi (PDF) wrote: > Hi, > > I am trying to run nodetool remotely. but its not working: > I am running Cassandra 2.2.5 on CentOS 6. > listen_address: is set to > rpc_address: is set to 0.0.0.0 > broadcast_rpc_address: is set to > > I changed the following in cassandra-env.sh > JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=" > -Dcom.sun.management.jmxremote.port=7199 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > > "nodetool -h -p status" results in: > failed to connect to 'hostname' - Connection Exception: 'Connection > refused" > > netstat -nl | grep 7199 > tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN > > ONLY "nodetool -h localhost" works > > Any idea how to fix it? > > Thanks, > Alaa
-- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
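Putting the above together, the relevant cassandra-env.sh fragment might look like this (the IP, port, and file paths are illustrative; the stock script only applies the remote stanza when LOCAL_JMX is anything other than "yes"):

```shell
# cassandra-env.sh
LOCAL_JMX=false

JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.0.12"  # node's reachable address
```

With authenticate=true, populate the password/access files from the JVM templates listed above. The original symptom (only 127.0.0.1:7199 listening) is exactly what the LOCAL_JMX=yes default produces.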
Re: Experience with Kubernetes
> Does anybody here have any experience, positive or negative, with > deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any > immediate need (or experience), but I am curious about the pros and cons. > > The last time I played around with kubernetes+cassandra, you could not specify node allocations across failure boundaries (AZs, Regions, etc). To me, that makes it not interesting outside of development or trivial setups. It does look like they are getting farther along on "ubernetes" which should fix this: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unexplainably large reported partition sizes
> > > Rob, can you remember which bug/jira this was? I have not been able to > find it. > I'm using 2.1.9. > > https://issues.apache.org/jira/browse/CASSANDRA-7953 Rob may have a different one, but I've seen something similar stemming from this issue. Fixed in 2.1.12. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Unexpected high internode network activity
> > > Unfortunately, these numbers still don't match at all. > > And yes, the cluster is in a single DC and since I am using the EC2 > snitch, replicas are AZ aware. > > Are repairs running on the cluster? Other thoughts: - is internode_compression set to 'all' in cassandra.yaml (should be 'all' by default, but worth checking since you are using lz4 on the client)? - are you using server-to-server encryption ? You can compare the output of nodetool netstats on the test cluster with the AWS cluster as well to see if anything sticks out. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Debugging write timeouts on Cassandra 2.2.5
> Testing the same write path using CQL writes instead demonstrates similar behavior. Was this via Java-Driver or the thrift execute_cql_query? If the latter, what happens when you change the rpc_server_type to sync? This line in tpstats is real weird: https://gist.github.com/mheffner/a979ae1a0304480b052a#file-tpstats-out-L22 On Wed, Feb 24, 2016 at 6:04 PM, Mike Heffner wrote: > > Nate, > > So we have run several install tests, bisecting the 2.1.x release line, and we believe that the regression was introduced in version 2.1.5. This is the first release that clearly hits the timeout for us. > > It looks like quite a large release, so our next step will likely be bisecting the major commits to see if we can narrow it down: https://github.com/apache/cassandra/blob/3c0a337ebc90b0d99349d0aa152c92b5b3494d8c/CHANGES.txt. Obviously, any suggestions on potential suspects appreciated. > > These are the memtable settings we've configured diff from the defaults during our testing: > > memtable_allocation_type: offheap_objects > memtable_flush_writers: 8 > > > Cheers, > > Mike > > On Fri, Feb 19, 2016 at 1:46 PM, Nate McCall wrote: >> >> The biggest change which *might* explain your behavior has to do with the changes in memtable flushing between 2.0 and 2.1: >> https://issues.apache.org/jira/browse/CASSANDRA-5549 >> >> However, the tpstats you posted shows no dropped mutations which would make me more certain of this as the cause. >> >> What values do you have right now for each of these (my recommendations for each on a c4.2xl with stock cassandra-env.sh are in parenthesis): >> >> - memtable_flush_writers (2) >> - memtable_heap_space_in_mb (2048) >> - memtable_offheap_space_in_mb (2048) >> - memtable_cleanup_threshold (0.11) >> - memtable_allocation_type (offheap_objects) >> >> The biggest win IMO will be moving to offheap_objects. By default, everything is on heap. Regardless, spending some time tuning these for your workload will pay off. 
>> >> You may also want to be explicit about >> >> - native_transport_max_concurrent_connections >> - native_transport_max_concurrent_connections_per_ip >> >> Depending on the driver, these may now be allowing 32k streams per connection(!) as detailed in v3 of the native protocol: >> https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec#L130-L152 >> >> >> >> On Fri, Feb 19, 2016 at 8:48 AM, Mike Heffner wrote: >>> >>> Anuj, >>> >>> So we originally started testing with Java8 + G1, however we were able to reproduce the same results with the default CMS settings that ship in the cassandra-env.sh from the Deb pkg. We didn't detect any large GC pauses during the runs. >>> >>> Query pattern during our testing was 100% writes, batching (via Thrift mostly) to 5 tables, between 6-1500 rows per batch. >>> >>> Mike >>> >>> On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra wrote: >>>> >>>> Whats the GC overhead? Can you your share your GC collector and settings ? >>>> >>>> >>>> Whats your query pattern? Do you use secondary indexes, batches, in clause etc? >>>> >>>> >>>> Anuj >>>> >>>> >>>> Sent from Yahoo Mail on Android >>>> >>>> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner >>>> wrote: >>>> Alain, >>>> >>>> Thanks for the suggestions. >>>> >>>> Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the metrics across the ring, there were no blocked tasks nor dropped messages. >>>> >>>> Iowait metrics look fine, so it doesn't appear to be blocking on disk. Similarly, there are no long GC pauses. >>>> >>>> We haven't noticed latency on any particular table higher than others or correlated around the occurrence of a timeout. We have noticed with further testing that running cassandra-stress against the ring, while our workload is writing to the same ring, will incur similar 10 second timeouts. If our workload is not writing to the ring, cassandra stress will run without hitting timeouts. 
This seems to imply that our workload pattern is causing something to block cluster-wide, since the stress tool writes to a different keyspace then our workload. >>>> >>>> I mentioned in another reply that we've tracked it to something between 2.0.x and 2.1.x, so we are focusing on narrowing which point release it was introduced in. >>>> >>>> Cheers, >>>> >>>> Mike
Re: Debugging write timeouts on Cassandra 2.2.5
est_timeout_in_ms (10 seconds). CPU across >>>>>> the cluster >>>>>> >>>> is < 10% and EBS write load is < 100 IOPS. Cassandra is running >>>>>> with the >>>>>> >>>> Oracle JDK 8u60 and we're using G1GC and any GC pauses are less >>>>>> than 500ms. >>>>>> >>>> >>>>>> >>>> We run on c4.2xl instances with GP2 EBS attached storage for >>>>>> data and >>>>>> >>>> commitlog directories. The nodes are using EC2 enhanced >>>>>> networking and have >>>>>> >>>> the latest Intel network driver module. We are running on HVM >>>>>> instances >>>>>> >>>> using Ubuntu 14.04.2. >>>>>> >>>> >>>>>> >>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is >>>>>> similar >>>>>> >>>> to the definition here: >>>>>> >>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a >>>>>> >>>> >>>>>> >>>> This is our cassandra.yaml: >>>>>> >>>> >>>>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml >>>>>> >>>> >>>>>> >>>> Like I mentioned we use 8u60 with G1GC and have used many of the >>>>>> GC >>>>>> >>>> settings in Al Tobey's tuning guide. This is our upstart config >>>>>> with JVM and >>>>>> >>>> other CPU settings: >>>>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d >>>>>> >>>> >>>>>> >>>> We've used several of the sysctl settings from Al's guide as >>>>>> well: >>>>>> >>>> https://gist.github.com/mheffner/ea40d58f58a517028152 >>>>>> >>>> >>>>>> >>>> Our client application is able to write using either Thrift >>>>>> batches >>>>>> >>>> using Asytanax driver or CQL async INSERT's using the Datastax >>>>>> Java driver. >>>>>> >>>> >>>>>> >>>> For testing against Thrift (our legacy infra uses this) we write >>>>>> batches >>>>>> >>>> of anywhere from 6 to 1500 rows at a time. Our p99 for batch >>>>>> execution is >>>>>> >>>> around 45ms but our maximum (p100) sits less than 150ms except >>>>>> when it >>>>>> >>>> periodically spikes to the full 10seconds. 
>>>>>> >>>> >>>>>> >>>> Testing the same write path using CQL writes instead demonstrates >>>>>> >>>> similar behavior. Low p99s except for periodic full timeouts. We >>>>>> enabled >>>>>> >>>> tracing for several operations but were unable to get a trace >>>>>> that completed >>>>>> >>>> successfully -- Cassandra started logging many messages as: >>>>>> >>>> >>>>>> >>>> INFO [ScheduledTasks:1] - MessagingService.java:946 - _TRACE >>>>>> messages >>>>>> >>>> were dropped in last 5000 ms: 52499 for internal timeout and 0 >>>>>> for cross >>>>>> >>>> node timeout >>>>>> >>>> >>>>>> >>>> And all the traces contained rows with a "null" source_elapsed >>>>>> row: >>>>>> >>>> >>>>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> We've exhausted as many configuration option permutations that >>>>>> we can >>>>>> >>>> think of. This cluster does not appear to be under any >>>>>> significant load and >>>>>> >>>> latencies seem to largely fall in two bands: low normal or max >>>>>> timeout. This >>>>>> >>>> seems to imply that something is getting stuck and timing out at >>>>>> the max >>>>>> >>>> write timeout. >>>>>> >>>> >>>>>> >>>> Any suggestions on what to look for? We had debug enabled for >>>>>> awhile but >>>>>> >>>> we didn't see any msg that pointed to something obvious. Happy >>>>>> to provide >>>>>> >>>> any more information that may help. >>>>>> >>>> >>>>>> >>>> We are pretty much at the point of sprinkling debug around the >>>>>> code to >>>>>> >>>> track down what could be blocking. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> Thanks, >>>>>> >>>> >>>>>> >>>> Mike >>>>>> >>>> >>>>>> >>>> -- >>>>>> >>>> >>>>>> >>>> Mike Heffner >>>>>> >>>> Librato, Inc. >>>>>> >>>> >>>>>> >>> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> -- >>>>>> >> >>>>>> >> Mike Heffner >>>>>> >> Librato, Inc. 
>>>>>> >> >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > >>>>>> > Mike Heffner >>>>>> > Librato, Inc. >>>>>> > >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Close the World, Open the Net >>>>>> http://www.linux-wizard.net >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> Mike Heffner >>>> Librato, Inc. >>>> >>>> >>> >> >> >> -- >> >> Mike Heffner >> Librato, Inc. >> >> > > > -- > > Mike Heffner > Librato, Inc. > > -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Back to the futex()? :(
I noticed you have authentication enabled. Make sure you set the following: - the replication factor for the system_auth keyspace should equal the number of nodes - permissions_validity_in_ms is a permission cache timeout. If you are not doing dynamic permissions or creating/revoking frequently, turn this WAY up May not be the immediate reason, but the above are definitely not helping if set at defaults. On Sat, Feb 6, 2016 at 6:49 PM, Will Hayworth wrote: > Additionally: this isn't the futex_wait bug (or at least it shouldn't > be?). Amazon says > <https://forums.aws.amazon.com/thread.jspa?messageID=623731> that was > fixed several kernel versions before mine, which > is 4.1.10-17.31.amzn1.x86_64. And the reason my heap is so large is > because, per CASSANDRA-9472, we can't use offheap until 3.4 is released. > > Will > > ___ > Will Hayworth > Developer, Engagement Engine > Atlassian > > My pronoun is "they". <http://pronoun.is/they> > > > > On Sat, Feb 6, 2016 at 3:28 PM, Will Hayworth > wrote: > >> *tl;dr: other than CAS operations, what are the potential sources of lock >> contention in C*?* >> >> Hi all! :) I'm a novice Cassandra and Linux admin who's been preparing a >> small cluster for production, and I've been seeing something weird. For >> background: I'm running 3.2.1 on a cluster of 12 EC2 m4.2xlarges (32 GB >> RAM, 8 HT cores) backed by 3.5 TB GP2 EBS volumes. Until late yesterday, >> that was a cluster of 12 m4.xlarges with 3 TB volumes. I bumped it because >> while backloading historical data I had been seeing awful throughput (20K >> op/s at CL.ONE). I'd read through Al Tobey's *amazing* C* tuning guide >> <https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html> once >> or twice before but this time I was careful and fixed a bunch of defaults >> that just weren't right, in cassandra.yaml/JVM options/block device >> parameters. 
Folks on IRC were super helpful as always (hat tip to Jeff >> Jirsa in particular) and pointed out, for example, that I shouldn't be >> using DTCS for loading historical data--heh. After changing to LTCS, >> unbatching my writes* and reserving a CPU core for interrupts and fixing >> the clocksource to TSC, I finally hit 80K early this morning. Hooray! :) >> >> Now, my question: I'm still seeing a *ton* of blocked processes in the >> vmstats, anything from 2 to 9 per 10 second sample period--and this is >> before EBS is even being hit! I've been trying in vain to figure out what >> this could be--GC seems very quiet, after all. On Al's page's advice, I've >> been running strace and, indeed, I've been seeing *tens of thousands of >> futex() calls* in periods of 10 or 20 seconds. What eludes me is *where* this >> lock contention is coming from. I'm not using LWTs or performing CAS >> operations of which I'm aware. Assuming this isn't a red herring, what >> gives? >> >> Sorry for the essay--I just wanted to err on the side of more >> context--and *thank you* for any advice you'd like to offer, >> Will >> >> P.S. More background if you'd like--I'm running on Amazon Linux 2015.09, >> using jemalloc 3.6, JDK 1.8.0_65-b17. Here <http://pastebin.com/kuhBmHXG> is >> my cassandra.yaml and here <http://pastebin.com/fyXeTfRa> are my JVM >> args. I realized I neglected to adjust memtable_flush_writers as I was >> writing this--so I'll get on that. Aside from that, I'm not sure what to >> do. (Thanks, again, for reading.) >> >> * They were batched for consistency--I'm hoping to return to using them >> when I'm back at normal load, which is tiny compared to backloading, but >> the impact on performance was eye-opening. >> ___ >> Will Hayworth >> Developer, Engagement Engine >> Atlassian >> >> My pronoun is "they". <http://pronoun.is/they> >> >> >> > -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
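A sketch of the two auth-related fixes suggested above (the DC name, node count, and credentials are placeholders; match them to your topology):

```shell
# 1. Make system_auth fully replicated, then repair it onto every node.
cqlsh -u cassandra -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 12};"
nodetool repair system_auth

# 2. In cassandra.yaml, turn the permission cache way up if permissions
#    rarely change, e.g.:
#      permissions_validity_in_ms: 3600000   # one hour
```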
Re: Slow performance after upgrading from 2.0.9 to 2.1.11
On Fri, Jan 29, 2016 at 12:30 PM, Peddi, Praveen wrote: > > Hello, > We have another update on performance on 2.1.11. compression_chunk_size didn’t really help much but We changed concurrent_compactors from default to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11 read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX metric that could affect read latencies is that 2.1.11 is running ReadRepairedBackground and ReadRepairedBlocking too frequently compared to 2.0.9 even though our read_repair_chance is same on both. Could anyone shed some light on why 2.1.11 could be running read repair 10 to 50 times more in spite of same configuration on both clusters? > > dclocal_read_repair_chance=0.10 AND > read_repair_chance=0.00 AND > > Here is the table for read repair metrics for both clusters. > 2.0.9 2.1.11 > ReadRepairedBackground 5MinAvg 0.006 0.1 > 15MinAvg 0.009 0.153 > ReadRepairedBlocking 5MinAvg 0.002 0.55 > 15MinAvg 0.007 0.91 The concurrent_compactors setting is not a surprise. The default in 2.0 was the number of cores and in 2.1 is now: "the smaller of (number of disks, number of cores), with a minimum of 2 and a maximum of 8" https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra.yaml#L567-L568 So in your case this was "8" in 2.0 vs. "2" in 2.1 (assuming these are still the stock-ish c3.2xl mentioned previously?). Regardless, 64 is way too high. Set it back to 8. Note: this got dropped off the "Upgrading" guide for 2.1 in https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt though, so lots of folks miss it. Per said upgrading guide - are you sure the data directory is in the same place between the two versions and you are not pegging the wrong disk/partition? 
The default locations changed for data, cache and commitlog: https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L171-L180 I ask because being really busy on a single disk would cause latency and potentially dropped messages which could eventually cause a DigestMismatchException requiring a blocking read repair. Anything unusual in the node-level IO activity between the two clusters? That said, the difference in nodetool tpstats output during and after on both could be insightful. When we do perf tests internally we usually use a combination of Grafana and Riemann to monitor Cassandra internals, the JVM and the OS. Otherwise, it's guess work. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
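In cassandra.yaml terms, the fix suggested above is simply the following (this setting needs a restart, unlike the throughput cap, which can be changed live via nodetool setcompactionthroughput):

```yaml
# cassandra.yaml -- restore the 2.0-era effective value for these
# 8-core nodes instead of the 64 currently configured
concurrent_compactors: 8
```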
Re: Cassandra Connection Pooling
On Thu, Jan 28, 2016 at 4:31 PM, KAMM, BILL wrote: > Hi, I’m looking for some good info on connection pooling, using JBoss. Is > this something that needs to be configured within JBoss, or is it handled > directly by the Cassandra classes themselves? Thanks. > > > > > This thread was on the Java-Driver list recently - it may answer some of your questions: https://groups.google.com/a/lists.datastax.com/forum/m/#!topic/java-driver-user/-im4eN_yZbA -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Data Modeling: Partition Size and Query Efficiency
> > > In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: compaction_throughput_mb_per_sec
> >> Also, as I increase my node count, I technically also have to increase my >> compaction_throughput which would require a rolling restart across the >> cluster. >> >> > You can set compaction throughput on each node dynamically via nodetool > setcompactionthroughput. > > > Also, the IOPS generated by your workload and how efficiently the JVM handles them are what should drive compaction throughput settings. Raw node count is orthogonal.
Re: compaction_throughput_mb_per_sec
> > > Also, as I increase my node count, I technically also have to increase my > compaction_throughput which would require a rolling restart across the > cluster. > > You can set compaction throughput on each node dynamically via nodetool setcompactionthroughput.
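A sketch of that command in use (the value here is a placeholder; pick a throughput suited to your hardware, and note the setting reverts to the cassandra.yaml value on restart):

```shell
# Adjust compaction throughput on one node at runtime -- no restart needed.
nodetool setcompactionthroughput 32   # MB/s; 0 disables throttling entirely
nodetool getcompactionthroughput      # confirm the live value
```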
Re: Cassandra stalls and dropped messages not due to GC
> > > Forgive me, but what is CMS? > Sorry - ConcurrentMarkSweep garbage collector. > > No. I’ve tried some mitigations since tuning thread pool sizes and GC, but > the problem begins with only an upgrade of Cassandra. No other system > packages, kernels, etc. > > > From what 2.0 version did you upgrade? If it was < 2.0.7, you would need to run 'nodetool upgradesstables' but I'm not sure the issue would manifest that way. Otherwise, double check the DSE release notes and upgrade guide. I've not had any issues like this going from 2.0.x to 2.1.x on vanilla C*.
Re: Cassandra stalls and dropped messages not due to GC
Does tpstats show unusually high counts for blocked flush writers? As Sebastian suggests, running ttop will paint a clearer picture about what is happening within C*. I would however recommend going back to CMS in this case as that is the devil we all know and more folks will be able to offer advice on seeing its output (and it removes a delta). > It’s starting to look to me like it’s possibly related to brief IO spikes > that are smaller than my usual graphing granularity. It feels surprising to > me that these would affect the Gossip threads, but it’s the best current > lead I have with my debugging right now. More to come when I learn it. > Probably not the case since this was a result of an upgrade, but I've seen similar behavior on systems where some kernels had issues with irqbalance doing the right thing and would end up parking most interrupts on CPU0 (like say for the disk and ethernet modules) regardless of the number of cores. Check /proc via 'cat /proc/interrupts' and make sure the interrupts are spread across CPU cores. You can steer them manually at runtime if they are not. Also, did you upgrade anything besides Cassandra?
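A quick check along those lines (the IRQ number in the steering example is hypothetical, and the smp_affinity write needs root):

```shell
# Show per-CPU interrupt counts; if the disk/NIC rows pile up in the CPU0
# column, irqbalance is not doing its job.
cat /proc/interrupts
# Manually steer IRQ 24 to CPU1 (bitmask 2) -- IRQ number is hypothetical:
# echo 2 > /proc/irq/24/smp_affinity
```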
Re: SSTables are not getting removed
> > > memtable_offheap_space_in_mb: 4096 > > memtable_cleanup_threshold: 0.99 > ^ What led to this setting? You are basically telling Cassandra to not flush the highest-traffic memtable until the memtable space is 99% full. With that many tables and keyspaces, you are basically locking up everything on the flush queue, causing substantial back pressure. If you run 'nodetool tpstats' you will probably see a massive number of 'All Time Blocked' for FlushWriter and 'Dropped' for Mutations. Actually, this is probably why you are seeing a lot of small tables: commit log segments are being filled and blocked from flushing due to the above, so they have to attempt to flush repeatedly with whatever is there whenever they get the chance. thrift_framed_transport_size_in_mb: 150 > ^ This is also a super bad idea. Thrift buffers grow as needed to accommodate larger results, but they don't ever shrink. This will lead to a bunch of open connections holding onto large, empty byte arrays. This will show up immediately in a heap dump inspection. > concurrent_compactors: 4 > > compaction_throughput_mb_per_sec: 0 > > endpoint_snitch: GossipingPropertyFileSnitch > > > > This grinds our system to a halt and causes a major GC nearly every second. > > > > So far the only way to get around this is to run a cron job every hour > that does a “nodetool compact”. > What's the output of 'nodetool compactionstats'? CASSANDRA-9882 and CASSANDRA-9592 could be to blame (both fixed in recent versions) or this could just be a side effect of the memory pressure from the above settings. Start back at the default settings (except snitch - GPFS is always a good place to start) and change settings serially and in small increments based on feedback gleaned from monitoring runtimes.
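The diagnostic steps above, as a sketch:

```shell
# Look for flush-queue back pressure: non-zero "All Time Blocked" on the
# flush writers and dropped MUTATION messages are the tell-tale signs.
nodetool tpstats | grep -iE 'FlushWriter|MUTATION|Blocked|Dropped'
# And check what compaction is (or is not) doing:
nodetool compactionstats
```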
Re: memtable flush size with LCS
> do you mean that this property is ignored at memtable flush time, and so > memtables are already allowed to be much larger than sstable_size_in_mb? > Yes, 'sstable_size_in_mb' plays no part in the flush process. Flushing is based solely on runtime activity, and the file size is determined by whatever was in the memtable at that time.
Re: cassandra bootstrapping
> > > Considering that this concerns mostly auto-bootstrap and seed nodes by > default do not do that, > would it be correct to assume that they could be started in parallel and > then the non-seed should be > added with the interval apart. > > As in seeds-start -> wait -> add non-seed -> wait -> add non-seed. > > Would that timeout be only between seeds started and non-seed, or every > non-seed will have to be > started serially with wait in between. > > It would be more like: start each seed one at a time, then start everyone else. Then it is just 2 minutes after the last node joins. In other words, starting a six node cluster would not be that much faster than starting a 100 node cluster if each had 3 seeds (I'm pretty sure it would mostly be network overhead of gossip communication/peer discovery).
Re: memtable flush size with LCS
The sstable_size_in_mb can be thought of as a target for the compaction process moving the file beyond L0. Note: If there are more than 32 SSTables in L0, it will switch over to doing STCS for L0 (you can disable this behavior by passing -Dcassandra.disable_stcs_in_l0=true as a system property). With a lot of overwrites, the settings you want to tune will be gc_grace_seconds in combination with tombstone_threshold, tombstone_compaction_interval and maybe unchecked_tombstone_compaction (there are different opinions about this last one, YMMV). Making these more aggressive and increasing your sstable_size_in_mb will allow for potentially capturing more overwrites in a level which will lead to less fragmentation. However, making the size too large will keep compaction from triggering on further out levels which can then exacerbate problems particularly if you have long-lived TTLs. In general, it is very workload specific, but monitoring the histogram for the number of SSTables used in a read (via org.apache.cassandra.metrics.ColumnFamily.$KEYSPACE.$TABLE.SSTablesPerReadHistogram.95percentile or shown manually in nodetool cfhistograms output) after any change will help you narrow in on a good setting. See http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesLCS for more details. On Tue, Oct 27, 2015 at 3:42 PM, Dan Kinder wrote: > > Hi all, > > The docs indicate that memtables are triggered to flush when data in the commitlog is expiring or based on memtable_flush_period_in_ms. > > But LCS has a specified sstable size; when using LCS are memtables flushed when they hit the desired sstable size (default 160MB) or could L0 sstables be much larger than that? > > Wondering because I have an overwrite workload where larger memtables would be helpful, and if I need to increase my LCS sstable size in order to allow for that. > > -dan
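The monitoring step above, sketched (keyspace/table names are placeholders):

```shell
# Watch the SSTables-per-read distribution after each tuning change; a
# rising 95th percentile means reads are touching more files.
nodetool cfhistograms my_keyspace my_table
```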
Re: cassandra bootstrapping
> I keep on seeing that there should be a 2 minute delay when bootstrapping a cluster, and > I have few questions round that. > > For starters, is there any reasoning why this is 2min and not less or more? > Is this valid mostly for bootstraping an empty cluster ring or for > restarting an existing established cluster? There is a good comment at the top of StorageService#joinTokenRing which explains the process at a high level: https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/StorageService.java#L791-L803 The method itself is long, but readable, and has a series of comments that explain some of the decisions taken and even reference some issues which have been encountered over the years. You can change this value if you really want by passing "cassandra.ring_delay_ms" as a system property at startup.
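For example (a sketch; the default exists to let gossip settle, so shorten it with care):

```shell
# Override the ring delay at startup -- value in milliseconds.
# Typically added to cassandra-env.sh rather than typed by hand:
JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=10000"
```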
Re: Verifying internode SSL
> I've configured internode SSL and set it to be used between datacenters only. Is there a way in the logs to verify SSL is operating between nodes in different DCs or do I need to break out tcpdump? > Even on DC only encryption, you should see the following message in the log: "Starting Encrypted Messaging Service on SSL port 7001" With any Java-based thing using SSL, you can always use the following startup parameter to find out exactly what is going on: -Djavax.net.debug=ssl This page will tell you how to interpret the debug output: http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
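Both checks as a sketch (the log path is an assumption; adjust for your install):

```shell
# Confirm encrypted internode messaging started:
grep 'Starting Encrypted Messaging Service' /var/log/cassandra/system.log
# For deeper inspection, add JSSE debugging (very verbose) to cassandra-env.sh:
# JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl"
```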
Re: CLUSTERING ORDER BY importance with ssd's
> > > If I am selecting a range from the bottom of the partition, does it make > much of a difference (considering I only use ssd's) if the clustering order > is ASC or DESC. > The only impact is that there is an extra seek to the bottom of the partition.
Re: Re : List users getting stuck and not returning results
Set the replication factor for the system_auth keyspace equal to the number of nodes, then issue a repair. On Fri, Oct 2, 2015 at 6:51 AM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > We have 2 clusters running DSE. On one of the clusters we recently added > additional nodes to a datacenter. > > On the cluster where we added nodes, we are getting authentication issues > from client. We are also unable to "list users" on system_auth keyspace. > It's getting stuck. > > InvalidRequestException(why: User has no SELECT permission on <> or any of its parents) -> client side error > > The other clusters perform fine. > > Thanks in advance. >
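A sketch of that fix (the DC names and replication factors are placeholders; match them to your topology):

```shell
# Raise system_auth replication so the auth data is widely available,
# then repair so every replica actually has it.
cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
nodetool repair system_auth
```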
Re: Unable to remove dead node from cluster.
:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80670 at >>>>> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80671 at >>>>> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80671 at >>>>> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80672 at >>>>> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80673 at >>>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) >>>>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.80674 at >>>>> java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_45] >>>>> 2015-09-18_23:21:40.85812 WARN 23:21:40 Not marking nodes down due to >>>>> local pause of 10852378435 > 50 >>>>> >>>>> Any suggestions about how to remove it? 
>>>>> Thanks. >>>>> >>>>> -- >>>>> Dikang
Re: LTCS Strategy Resulting in multiple SSTables
You could try altering the table to use STCS, then force a major compaction via 'nodetool compact', then alter the table back to LCS when it completes. You may very well hit the same issues in process of doing this, however, until you upgrade. On Wed, Sep 16, 2015 at 1:25 PM, Saladi Naidu wrote: > Nate, > Yes we are in process of upgrading to 2.1.9. Meanwhile I am looking for > correcting the problem, do you know any recovery options to reduce the > number of SS Tables. As SStbales are keep on increasing, the read > performance is deteriorating > > Naidu Saladi > > ------ > *From:* Nate McCall > *To:* Cassandra Users ; Saladi Naidu < > naidusp2...@yahoo.com> > *Sent:* Tuesday, September 15, 2015 4:53 PM > > *Subject:* Re: LTCS Strategy Resulting in multiple SSTables > > That's an early 2.1/known buggy version. There have been several issues > fixed since which could cause that behavior. Most likely > https://issues.apache.org/jira/browse/CASSANDRA-9592 ? > > Upgrade to 2.1.9 and see if the problem persists. > > > > On Tue, Sep 15, 2015 at 8:31 AM, Saladi Naidu > wrote: > > We are on 2.1.2 and planning to upgrade to 2.1.9 > > Naidu Saladi > > -- > *From:* Marcus Eriksson > *To:* user@cassandra.apache.org; Saladi Naidu > *Sent:* Tuesday, September 15, 2015 1:53 AM > *Subject:* Re: LTCS Strategy Resulting in multiple SSTables > > if you are on Cassandra 2.2, it is probably this: > https://issues.apache.org/jira/browse/CASSANDRA-10270 > > > > On Tue, Sep 15, 2015 at 4:37 AM, Saladi Naidu > wrote: > > We are using Level Tiered Compaction Strategy on a Column Family. Below > are CFSTATS from two nodes in same cluster, one node has 880 SStables in L0 > whereas one node just has 1 SSTable in L0. In the node where there are > multiple SStables, all of them are small size and created same time stamp. > We ran Compaction, it did not result in much change, node remained with > huge number of SStables. 
Due to this large number of SSTables, Read > performance is being impacted > > In same cluster, under same keyspace, we are observing this discrepancy in > other column families as well. What is going wrong? What is the solution to > fix this > > *---*NODE1*---* > *Table: category_ranking_dedup* > *SSTable count: 1* > *SSTables in each level: [1, 0, 0, 0, 0, > 0, 0, 0, 0]* > *Space used (live): 2012037* > *Space used (total): 2012037* > *Space used by snapshots (total): 0* > *SSTable Compression Ratio: > 0.07677216119569073* > *Memtable cell count: 990* > *Memtable data size: 32082* > *Memtable switch count: 11* > *Local read count: 2842* > *Local read latency: 3.215 ms* > *Local write count: 18309* > *Local write latency: 5.008 ms* > *Pending flushes: 0* > *Bloom filter false positives: 0* > *Bloom filter false ratio: 0.0* > *Bloom filter space used: 816* > *Compacted partition minimum bytes: 87* > *Compacted partition maximum bytes: > 25109160* > *Compacted partition mean bytes: 22844* > *Average live cells per slice (last five > minutes): 338.84588318085855* > *Maximum live cells per slice (last five > minutes): 10002.0* > *Average tombstones per slice (last five > minutes): 36.53307529908515* > *Maximum tombstones per slice (last five > minutes): 36895.0* > > *NODE2--- * > *Table: category_ranking_dedup* > *SSTable count: 808* > *SSTables in each level: [808/4, 0, 0, 0, > 0, 0, 0, 0, 0]* > *Space used (live): 291641980* > *Space used (total): 291641980* > *Space used by snapshots (total): 0* > *SSTable Compression Ratio: > 0.1431106696818256* > *Memtable cell count: 4365293* > *Memtable dat
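The STCS round-trip suggested in the reply above can be sketched as follows (the keyspace name is a placeholder; the table name is taken from the thread, and expect heavy compaction I/O while this runs):

```shell
# Temporarily switch to STCS, force a major compaction, then switch back.
cqlsh -e "ALTER TABLE ks.category_ranking_dedup WITH compaction =
  {'class': 'SizeTieredCompactionStrategy'};"
nodetool compact ks category_ranking_dedup      # major compaction
cqlsh -e "ALTER TABLE ks.category_ranking_dedup WITH compaction =
  {'class': 'LeveledCompactionStrategy'};"
```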
Re: LTCS Strategy Resulting in multiple SSTables
Average live cells per slice (last five > minutes): 416.1780688985929* > *Maximum live cells per slice (last five > minutes): 10002.0* > *Average tombstones per slice (last five > minutes): 45.11547792333818* > *Maximum tombstones per slice (last five > minutes): 36895.0* > > > > > Naidu Saladi > > > >
Re: Currupt sstables when upgrading from 2.1.8 to 2.1.9
You have a/some corrupt SSTables. 2.1.9 is doing strict checking at startup and reacting based on "disk_failure_policy" per the stack trace. For details, see: https://issues.apache.org/jira/browse/CASSANDRA-9686 Either way, you are going to have to run nodetool scrub. I'm not sure if it's better to do this from 2.1.8 or from 2.1.9 with "disk_failure_policy: ignore". It feels like that option got overloaded a bit strangely with the changes in CASSANDRA-9686 and I have not yet tried it with its new meaning. On Tue, Sep 15, 2015 at 5:26 AM, George Sigletos wrote: > Hello, > > I tried to upgrade two of our clusters from 2.1.8 to 2.1.9. In some, but > not all nodes, I got errors about corrupt sstables when restarting. I > downgraded back to 2.1.8 for now. > > Has anybody else faced the same problem? Should sstablescrub fix the > problem? I didn't try that yet. > > Kind regards, > George > > ERROR [SSTableBatchOpen:3] 2015-09-14 10:16:03,296 FileUtils.java:447 - > Exiting forcefully due to file system exception on startup, disk failure > policy "stop" > org.apache.cassandra.io.sstable.CorruptSSTableException: > java.io.EOFException > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown > Source) [na:1.7.0_75] > at java.util.concurrent.FutureTask.run(Unknown Source) > [na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) [na:1.7.0_75] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) [na:1.7.0_75] > at java.lang.Thread.run(Unknown Source) [na:1.7.0_75] > Caused by: java.io.EOFException: null > at java.io.DataInputStream.readUnsignedShort(Unknown Source) > ~[na:1.7.0_75] > at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75] > at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75] > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) > ~[apache-cassandra-2.1.9.jar:2.1.9] >
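The recovery path described above, as a sketch (keyspace/table names are placeholders):

```shell
# Online scrub of a suspect table (node running):
nodetool scrub my_keyspace my_table
# Offline variant if the node cannot start at all (run with Cassandra stopped):
# sstablescrub my_keyspace my_table
```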
Re: cassandra-stress on 3.0 with column widths benchmark.
By default, stress runs stop after throughput has not improved after three runs. This functionality is a little difficult to figure out from the documentation, so take a look at (maybe even with a debugger attached): https://github.com/apache/cassandra/blob/cassandra-2.1/tools/stress/src/org/apache/cassandra/stress/StressAction.java#L111 to see what's going on. In both scenarios, this may have taken roughly the same time to hit saturation, but for different reasons and with different results as you saw. To really find out why, it would be a good idea to enable one of the reporters ( http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2) to send cluster metrics to a monitoring system such as Graphite. At the very least get OpsCenter running and capturing metrics from the cluster. Either way, getting familiar with a visual picture of the runtime now is invaluable for really understanding any future production deployment. On Sun, Sep 13, 2015 at 9:25 PM, Kevin Burton wrote: > I’m trying to benchmark two scenarios… > > 10 columns with 150 bytes each > > vs > > 150 columns with 10 bytes each. > > The total row “size” would be 1500 bytes (ignoring overhead). > > Our app uses 150 columns so I’m trying to see if packing it into a JSON > structure using one column would improve performance. > > I seem to have confirmed my hypothesis. 
> > I’m running two tests: > > ./tools/bin/cassandra-stress write -insert -col n=FIXED\(10\) >> size=FIXED\(150\) | tee cassandra-stress-10-150.log >> > > >> time ./tools/bin/cassandra-stress write -insert -col n=FIXED\(150\) >> size=FIXED\(10\) | tee cassandra-stress-150-10.log > > > this shows that the "op rate” is much much lower when running with 150 > columns: > > root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate" >> cassandra-stress-10-150.log >> op rate : 7632 [WRITE:7632] >> op rate : 11851 [WRITE:11851] >> op rate : 31967 [WRITE:31967] >> op rate : 41798 [WRITE:41798] >> op rate : 51251 [WRITE:51251] >> op rate : 58057 [WRITE:58057] >> op rate : 62977 [WRITE:62977] >> op rate : 65398 [WRITE:65398] >> op rate : 67673 [WRITE:67673] >> op rate : 69198 [WRITE:69198] >> op rate : 70402 [WRITE:70402] >> op rate : 71019 [WRITE:71019] >> op rate : 71574 [WRITE:71574] >> root@util0063 ~/apache-cassandra-3.0.0-beta2 # grep "op rate" >> cassandra-stress-150-10.log >> op rate : 2570 [WRITE:2570] >> op rate : 5144 [WRITE:5144] >> op rate : 10906 [WRITE:10906] >> op rate : 11832 [WRITE:11832] >> op rate : 12471 [WRITE:12471] >> op rate : 12915 [WRITE:12915] >> op rate : 13620 [WRITE:13620] >> op rate : 13456 [WRITE:13456] >> op rate : 13916 [WRITE:13916] >> op rate : 14029 [WRITE:14029] >> op rate : 13915 [WRITE:13915] > > > … what’s WEIRD here is that > > Both tests take about 10 minutes. Yet it’s saying that the op rate for > the second is slower. Why would that be? That doesn’t make much sense… > > -- > > Founder/CEO Spinn3r.com > Location: *San Francisco, CA* > blog: http://burtonator.wordpress.com > … or check out my Google+ profile > <https://plus.google.com/102718274791889610666/posts> > >
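One way to make the two runs directly comparable is to fix the operation count instead of letting stress run to saturation, so wall-clock time becomes a meaningful signal (a sketch; n= is a placeholder workload size):

```shell
# Same total ops for both column shapes:
tools/bin/cassandra-stress write n=1000000 -col n=FIXED\(10\) size=FIXED\(150\)
tools/bin/cassandra-stress write n=1000000 -col n=FIXED\(150\) size=FIXED\(10\)
```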
Re: Question: Gossip Protocol
It is hard coded in Gossiper: https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/gms/Gossiper.java#L83 What requirement are you trying to address by increasing this value? On Mon, Sep 14, 2015 at 8:26 AM, Thouraya TH wrote: > I find this information : > > The gossip process runs every second and exchanges state messages with up > to three other nodes in the cluster. > > here > http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureGossipAbout_c.html > > > Please, i ask if it is possible to change this periode, to three seconds ? > > Kind regards. > > > > > 2015-09-14 14:15 GMT+01:00 Thouraya TH : > >> Hi all, >> >> Please, the gossip procotol in cassandra is running every ... seconds ? >> >> >> Thank you so much for answers. >> Best Regards. >> > >
Re: Should replica placement change after a topology change?
> > > So if you have a topology that would change if you switched from >> SimpleStrategy to NetworkTopologyStrategy plus multiple racks, it sounds >> like a different migration strategy would be needed? >> >> I am imagining: >> >>1. Switch to a different snitch, and the keyspace from SimpleStrategy >>to NTS but keep it all in one rack. So effectively the same topology, but >>with a different snitch. >>2. Set up a new data centre with the desired topology. >>3. Change the keyspace to have replicas in the new DC. >>4. Rebuild all the nodes in the new DC. >>5. Flip all your clients over to the new DC. >>6. Decommission your original DC. >> >> That would work, yes. I would add : > > - 4.5. Repair all nodes. > I can confirm that the above process works (definitely include Rob's repair suggestion, though). It is really the only way we've found to safely go from SimpleSnitch to rack-aware NTS. The same process works/is required for SimpleSnitch to Ec2Snitch fwiw.
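Step 1 of that migration can be sketched as follows (keyspace, DC name and replication factor are placeholders; the RF must match the old SimpleStrategy setting so the effective topology does not change):

```shell
# Re-declare the keyspace with NTS while keeping replica placement unchanged.
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
```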
Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
Thanks for reporting back, Tom. Can you drop a comment on the ticket with a sentence or two describing your specific case and that speculative_retry = NONE was a valid work-around? That will make it easier for the next folks that come along to have a concrete problem/solution in a single comment on that ticket. Glad to hear it worked, though. On Tue, Sep 8, 2015 at 3:38 PM, Tom van den Berge wrote: > Nate, > > I've disabled it, and it's been running for about an hour now without > problems, while before, the problem occurred roughly every few minutes. I > guess it's safe to say that this proves that CASSANDRA-9753 > <https://issues.apache.org/jira/browse/CASSANDRA-9753> is the cause of > the problem. > > I'm very happy to finally know the cause of this problem! Thanks for > pointing me in the right direction. > Tom > > On Tue, Sep 8, 2015 at 9:13 PM, Nate McCall > wrote: > >> Just to be sure: can this bug result in a 0-row result while it should be >>> > 0 ? >>> >> Per Tyler's reference to CASSANDRA-9753 >> <https://issues.apache.org/jira/browse/CASSANDRA-9753>, you would see >> this if the read was routed by speculative retry to the nodes that were not >> yet finished being built. >> >> Does this work as anticipated when you set speculative_retry to NONE? >> >> >> >> >> -- >> ----- >> Nate McCall >> Austin, TX >> @zznate >> >> Co-Founder & Sr. Technical Consultant >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > >
Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
> > Just to be sure: can this bug result in a 0-row result while it should be > > 0 ? > Per Tyler's reference to CASSANDRA-9753 <https://issues.apache.org/jira/browse/CASSANDRA-9753>, you would see this if the read was routed by speculative retry to the nodes that were not yet finished being built. Does this work as anticipated when you set speculative_retry to NONE?
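The suggested check, sketched (keyspace/table names are placeholders):

```shell
# Disable speculative retry on the affected table to rule out CASSANDRA-9753:
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH speculative_retry = 'NONE';"
```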
Re: Re : Restoring nodes in a new datacenter, from snapshots in an existing datacenter
You cannot use the identical token ranges. You have to capture membership information somewhere for each datacenter, and use that token information when bringing up the replacement DC. You can find details on this process here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html That process is straightforward, but it can go south pretty quickly if you miss a step. It's a really good idea to set aside some time to try this out in a staging/test system and build a runbook for the process targeting your specific environment. On Fri, Aug 28, 2015 at 1:12 PM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > > hi; > We have cassandra cluster with Vnodes spanning across 3 data centers. > We take backup of the snapshots from one datacenter. >In a doomsday scenario, we want to restore a downed datacenter, with > snapshots from another datacenter. We have same number of nodes in each > datacenter. > > 1 : We know it requires copying the snapshots and their corresponding > token ranges to the nodes in new datacenter, and running nodetool refresh. > > 2 : The question is, we will now have 2 datacenters, with the same exact > token ranges. Will that cause any problem. > > DC1 : Node-1 : token1..token10 > Node-2 : token11 .token20 > Node-3 : token21 . token30 > Node-4 : token31 . token40 > > DC2 : Node-1 : token1.token10 >Node-2 : token11token20 >Node-3 : token21token30 > Node-4 : token31token40 > > > thanks > Sai > > > >
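A sketch of the token-capture side of that process (keyspace/table and the node address are placeholders):

```shell
# At backup time, record each node's tokens alongside its snapshots:
nodetool ring | grep '<node-ip>'
# On the matching replacement node: set initial_token in cassandra.yaml to
# that comma-separated token list before first start, copy the snapshot
# files into the data directories, then load them:
nodetool refresh my_keyspace my_table
```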
Re: Re : Decommissioned node appears in logs, and is sometimes marked as "UNREACHEABLE" in `nodetool describecluster`
Do they show up in nodetool gossipinfo? Either way, you probably need to invoke Gossiper.unsafeAssassinateEndpoints via JMX as described in step 1 here: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html On Fri, Aug 28, 2015 at 1:32 PM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > hi; > we decommissioned nodes in a datacenter a while back. Those nodes keep > showing up in the logs, and also sometimes marked as UNREACHABLE when > `nodetool describecluster` is run. > > However these nodes do not show up in `nodetool status` and > `nodetool ring`. > > Below are a couple lines from the logs. > > 2015-08-27 04:38:16,180 [GossipStage:473] INFO Gossiper InetAddress / > 10.0.0.1 is now DOWN > 2015-08-27 04:38:16,183 [GossipStage:473] INFO StorageService Removing > tokens [85070591730234615865843651857942052865] for /10.0.0.1 > > thanks > Sai > >
Re: How to get the peer's IP address when writing failed
Unfortunately, the addresses/DC of the replicas are not available on the exception hierarchy within Cassandra. Fwiw, the DS Java Driver (most native protocol drivers actually) manages membership dynamically by acting on cluster health events sent back over the channel by the native transport. Keeping this intelligence down in the driver makes for significantly less complex cluster management in an application. On Wed, Aug 26, 2015 at 3:51 AM, Lu, Boying wrote: > Hi, All, > > > > We have an Cassandra environment with two connected DCs and our > consistency level of writing operation is EACH_QUORUM. > > So if one DC is down, the write will be failed and we get > TokenRangeOfflineException on the client side (we use netfilix java client > libraries). > > > > We want to give more detailed information about this failure. e.g. The IP > addresses of the broken nodes (on the broken DC in our case). > > We checked the TokenRangeOfflineException and its parent class > ConnectionException. The only related method is getHost(). > > But it returns the IP address of the local node (the node that issues the > writing operation) instead of the remote node on the broken DC. > > > > Does anyone know how to get such information when writing failed? > > > > Thanks > > > Boying > >
Re: 'no such object in table'
> LOCAL_JMX=no
>
> if [ "$LOCAL_JMX" = "yes" ]; then
>     JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
> else
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
>     JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
> fi

Retry with the following option added to your JVM_OPTS: -Djava.rmi.server.logCalls=true
This should produce some more information about what is going on.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
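Concretely, that suggestion is one extra line in cassandra-env.sh (a sketch; the file location and the example hostname are assumptions for your install):

```shell
# cassandra-env.sh: enable server-side RMI call logging to diagnose
# "no such object in table" errors on JMX connections.
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.logCalls=true"

# If the node resolves its own hostname to an unreachable address, pinning
# the RMI hostname can also help (substitute the node's reachable IP):
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.0.5"
```

Restart the node after changing these options and watch the system log while reconnecting with nodetool or JConsole.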
Re: AWS multi-region DCs fail to rebuild
> This happens repeatedly when attempting to run the rebuild on just a single node
> in the US DC (pointing at the EU DC). I have not yet tried any other node from the
> US DC.
>
> Is this a bug or a configuration error, perhaps? I know people out there are using
> AWS for Cassandra - how are you replicating across regions?

There have been some edge cases here in the past:
https://issues.apache.org/jira/browse/CASSANDRA-4026

Check the AWS instance metadata (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) to see that what the region and AZ endpoints return is consistent with your keyspace declaration. If nothing really sticks out, you may want to try just using GossipingPropertyFileSnitch (GPFS) and fall back to setting DC and rack by hand.

Per Rob's point about bleeding edge, I'd be super curious whether the existing setup worked as-is on 2.1 or 2.0. I'd be willing to bet you are the first person trying to make EC2Snitch span regions on 2.2.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
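The two checks above can be sketched as follows (the DC/rack names are placeholders, not values from the thread; they must match the names used in your keyspace's replication settings):

```shell
# 1. Verify what the EC2 instance metadata service reports for this node:
curl http://169.254.169.254/latest/meta-data/placement/availability-zone

# 2. Fallback to GPFS with DC/rack set by hand:
#    in cassandra.yaml:
#        endpoint_snitch: GossipingPropertyFileSnitch
#    in conf/cassandra-rackdc.properties (placeholder values):
#        dc=us-east
#        rack=rack1
```

A rolling restart is needed after a snitch change, and the DC names must line up exactly with the keyspace's NetworkTopologyStrategy options.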
Re: AssertionError on PasswordAuthenticator
> Any ideas what might be wrong or which prerequisites need to be met? This
> is the first request for a connection.

Sam makes a good point. Make sure you have the username and password properties set in the configuration file:
https://github.com/apache/incubator-usergrid/blob/master/stack/config/src/main/resources/usergrid-default.properties#L52-L53

See this page for details on configuration:
http://usergrid.readthedocs.org/en/latest/deploy-local.html#install-and-configure-cassandra

For Usergrid-specific questions, feel free to stop by our mailing list or IRC channel, both of which are listed here:
http://usergrid.incubator.apache.org/community/

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
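The properties in question look roughly like this (a sketch; the exact keys and defaults are in the linked usergrid-default.properties, so verify against that file rather than this guess):

```properties
# Credentials Usergrid uses when PasswordAuthenticator is enabled in
# cassandra.yaml (values shown are Cassandra's out-of-the-box superuser).
cassandra.username=cassandra
cassandra.password=cassandra
```

If these are unset while Cassandra requires authentication, the very first connection attempt fails, which matches the symptom described.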
Re: Unbalanced disk load
> I am currently benchmarking Cassandra with three machines, and on each machine I am
> seeing an unbalanced distribution of data among the data directories (1 per disk).
> I am concerned that this affects my write performance. Is there anything I can do to
> make the distribution more even? Would RAID0 be my best option?

Using LeveledCompactionStrategy should provide a much better balance. However, depending on your use case, this may not be the right choice for your workload, in which case RAID0 with a single data directory will be the best option.

> Total size of data is about 2TB, 14B records, all unique. Replication factor of 1.

RF=1 means *no* redundancy, which is a bad idea to run in production (and sort of defeats the purpose of a system like Cassandra). It is also not going to give an accurate picture for a load test, as it eliminates a lot of cross-node traffic that you would see with a higher replication factor.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
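Switching an existing table to LCS is a single schema change (a sketch; the keyspace/table names are placeholders, and 160 MB is the commonly cited default sstable target size):

```sql
ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};
```

Expect a burst of compaction activity after the change while existing sstables are reorganized into levels, so apply it before, not during, the benchmark runs.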
Re: Significant drop in storage load after 2.1.6->2.1.8 upgrade
Perhaps https://issues.apache.org/jira/browse/CASSANDRA-9592 got compactions moving forward for you? This would explain the drop. However, the discussion on https://issues.apache.org/jira/browse/CASSANDRA-9683 seems to be similar to what you saw, and that is currently being investigated.

On Fri, Jul 17, 2015 at 10:24 AM, Mike Heffner wrote:
> Hi all,
>
> I've been upgrading several of our rings from 2.1.6 to 2.1.8 and I've
> noticed that after the upgrade our storage load drops significantly (I've
> seen up to an 80% drop).
>
> I believe most of the data that is dropped is tombstoned (via TTL
> expiration) and I haven't detected any data loss yet. However, can someone
> point me to what changed between 2.1.6 and 2.1.8 that would lead to such a
> significant drop in tombstoned data? Looking at the changelog, there's
> nothing that jumps out at me. This is a CF definition from one of the CFs
> that had a significant drop:
>
> > describe measures_mid_1;
>
> CREATE TABLE "Metrics".measures_mid_1 (
>     key blob,
>     c1 int,
>     c2 blob,
>     c3 blob,
>     PRIMARY KEY (key, c1, c2)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (c1 ASC, c2 ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 0
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> Thanks,
>
> Mike
>
> --
> Mike Heffner
> Librato, Inc.

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
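One way to see whether stalled compactions were holding expired data is to check each sstable's droppable-tombstone estimate before and after the upgrade (a sketch; the data path is an assumption based on the schema above, and sstablemetadata's output format varies by version):

```shell
# Print the estimated droppable tombstone ratio for each sstable of the table.
# With gc_grace_seconds = 0, tombstones become purgeable as soon as they expire,
# so a high ratio here means compaction simply hadn't caught up yet.
for f in /var/lib/cassandra/data/Metrics/measures_mid_1*/*-Data.db; do
  echo "$f"
  sstablemetadata "$f" | grep -i "droppable tombstones"
done
```

If pre-upgrade sstables showed large droppable-tombstone ratios, the post-upgrade drop in load is consistent with compactions finally reclaiming that space rather than data loss.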