Re: com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character
Sorry, misprint:

    // composeQuery() = "INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);"
    PreparedStatement preparedStatement = session.prepare(composeQuery()); // exception happens here!

2015-06-24 11:20 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

Hi, I'm trying to use a bound query and I get a weird error.

The query:

    INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);

The code:

    PreparedStatement preparedStatement = session.prepare(composeQuery()); // composeQuery returns INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);
    BoundStatement boundStatement = new BoundStatement(preparedStatement); // EXCEPTION HERE
    boundStatement.bind(UUID.randomUUID(), RandomStringUtils.random(10), 1);
    session.execute(boundStatement);

If I run INSERT INTO packets (id, fingerprint, mark) VALUES (now(), 'xxx', 1); in cqlsh, it works.

Stacktrace:

    Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79)
        at stress.StressTest.runBound(StressTest.java:89)
        at stress.Main.main(Main.java:29)
    Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:101)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:185)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:160)
        at com.google.common.util.concurrent.Futures$1.apply(Futures.java:720)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:859)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Re: After Restart Nodes had lost data
No, I did not.

On 24 Jun 2015, at 06:05, Jason Wee peich...@gmail.com wrote:

On the node 192.168.2.100, did you run repair after its status was UN?

On Wed, Jun 24, 2015 at 2:46 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:

Dear Alain,

Thank you for your reply. Ok, yes, I did not drain. The cluster was loaded with tons of records, and no new records had been added for a few weeks. Each node had a load of about 160 GB as seen in "nodetool status". I killed the Cassandra daemon and restarted it. After Cassandra was restarted, I could see in "nodetool status" a load of 5 GB!!

I don't use counters. I use RF 3 on 5 nodes. I did not change the replication factor. I have two types of read queries: one uses QUORUM and the other uses ONE as the consistency level. I did not change the topology.

> Are you sure this node had data before you restarted it?

Actually, the full story is:
- I stopped node0 (192.168.2.100), and I restarted it.
- I stopped node1 (192.168.2.101).
- I ran nodetool status and noticed that node0 was UN with a load of 5 GB. I found this really weird because all the other nodes had about 160 GB. I also saw that node1 was DN with a load of about 160 GB.
- I restarted node1.
- I ran nodetool status and noticed that node1 was UN and also had a load of 5 GB; it previously had a load of about 160 GB. Of that I'm sure.
- Then my program could no longer query C*. Neither the QUORUM nor the ONE consistency level statements could read data.

> What does a nodetool status mykeyspace output?

I cannot try this anymore. I flushed the whole cluster, and I am currently reloading everything. I was too much in a hurry. I have a demo tomorrow, and I will manage to have it back before then. After my bad decision to flush the cluster, I realised that I could have bootstrapped my two nodes again. Learning by doing.
> "It’s like the whole cluster is paralysed" -- what does it mean? Please be more precise on this. You should tell us the actions that were taken before this occurred and what is not working now, since a C* cluster in this state could perfectly well keep running. No SPOF.

What did I do before? Well, this cluster was basically idling. I was only making lots of selects on it. It had been loaded for a few weeks. But what I noticed when I restarted node0 is the following:

INFO [InternalResponseStage:1] 2015-06-23 11:45:32,723 ColumnFamilyStore.java:882 - Enqueuing flush of schema_columnfamilies: 131587 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:2] 2015-06-23 11:45:32,723 Memtable.java:346 - Writing Memtable-schema_columnfamilies@917967643(34850 serialized bytes, 585 ops, 0%/0% of on/off-heap limit)
WARN [GossipTasks:1] 2015-06-23 11:45:33,459 FailureDetector.java:251 - Not marking nodes down due to local pause of 25509152054 > 5000000000
INFO [MemtableFlushWriter:1] 2015-06-23 11:45:33,982 Memtable.java:385 - Completed flushing /home/maia/apache-cassandra-DATA/data/system/local-7ad54392bcdd35a684174e047860b377/system-local-ka-11-Data.db (5274 bytes) for commitlog position ReplayPosition(segmentId=1435052707645, position=144120)
INFO [GossipStage:1] 2015-06-23 11:45:33,985 StorageService.java:1642 - Node /192.168.2.101 state jump to normal
INFO [GossipStage:1] 2015-06-23 11:45:33,991 Gossiper.java:987 - Node /192.168.2.102 has restarted, now UP
INFO [SharedPool-Worker-1] 2015-06-23 11:45:33,992 Gossiper.java:954 - InetAddress /192.168.2.102 is now UP
INFO [HANDSHAKE-/192.168.2.102] 2015-06-23 11:45:33,993 OutboundTcpConnection.java:485 - Handshaking version with /192.168.2.102
INFO [GossipStage:1] 2015-06-23 11:45:33,993 StorageService.java:1642 - Node /192.168.2.102 state jump to normal
INFO [GossipStage:1] 2015-06-23 11:45:33,999 Gossiper.java:987 - Node /192.168.2.103 has restarted, now UP
INFO [SharedPool-Worker-1] 2015-06-23 11:45:33,999 Gossiper.java:954 - InetAddress /192.168.2.103 is now UP
INFO [GossipStage:1] 2015-06-23 11:45:34,001 StorageService.java:1642 - Node /192.168.2.103 state jump to normal
INFO [HANDSHAKE-/192.168.2.103] 2015-06-23 11:45:34,020 OutboundTcpConnection.java:485 - Handshaking version with /192.168.2.103
INFO [main] 2015-06-23 11:45:34,021 StorageService.java:1642 - Node zennode0/192.168.2.100 state jump to normal
INFO [GossipStage:1] 2015-06-23 11:45:34,028 StorageService.java:1642 - Node /192.168.2.104 state jump to normal
INFO [main] 2015-06-23 11:45:34,038 CassandraDaemon.java:583 - Waiting for gossip to settle before accepting client requests...
INFO [GossipStage:1] 2015-06-23 11:45:34,039 StorageService.java:1642 - Node /192.168.2.101 state jump to normal
Re: Counters 2.1 Accuracy
IMO, the main concern with C*'s counters is that they are not idempotent. For example, if you increment a counter and get a timeout error, you cannot know whether the increment succeeded. Non-counter writes are idempotent, so you can just retry; but if you retry a counter update, it may be applied twice.

2015-06-23 12:23 GMT+08:00 Mike Trienis mike.trie...@orcsol.com:

Hi All,

I'm fairly new to Cassandra and am planning on using it as a datastore for an Apache Spark cluster. The use case is fairly simple: read the raw data, perform aggregates, and push the rolled-up data back to Cassandra. The data models will use counters pretty heavily, so I'd like to understand what kind of accuracy I should expect from Cassandra 2.1 when incrementing counters.

- http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

The blog post above states that the new counter implementation is safer, although I'm not sure what that means in practice. Will the counters be 99.99% accurate? How often will they be over- or under-counted?

Thanks, Mike.

-- Thanks, Phil Yang
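The retry hazard described above can be sketched without a cluster at all. This toy model (all names are made up; nothing here is driver code) stands in for a server that applies an increment but whose acknowledgement is lost to a timeout:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a non-idempotent counter update: the increment is applied
// server-side, but the ack is lost, so the client only sees a timeout.
class CounterRetryDemo {
    static final AtomicLong serverCounter = new AtomicLong();

    // Simulates a write that lands on the server but times out from the
    // client's point of view.
    static boolean incrementWithLostAck() {
        serverCounter.incrementAndGet();
        return false; // timeout: client cannot tell whether the write landed
    }

    public static void main(String[] args) {
        boolean acked = incrementWithLostAck();
        if (!acked) {
            // Naive retry: safe for idempotent writes, not for counters.
            serverCounter.incrementAndGet();
        }
        // The client intended +1, but the counter now reads 2.
        System.out.println(serverCounter.get());
    }
}
```

With a plain (non-counter) write, the retry would simply rewrite the same cell with the same value, which is why retrying is safe there but not here.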
DTCS - nodetool repair - TTL
Hello all,

We are running C* version 2.0.15. We have 5 nodes with RF=3. We are using DTCS, and on all inserts we have a TTL of 30 days. We have no deletes. We have just one CF.

When I run nodetool repair on a node, I notice a lot of extra SSTables created; I think this is due to the fact that it is reconciling the correct values across different nodes. What I am trying to figure out now is how this will affect performance after the TTL is reached for rows. As far as I understood from Spotify's DTCS write-up (https://labs.spotify.com/tag/dtcs/), DTCS will drop a whole SSTable once its TTL is reached, since it compacts data inserted around the same time into the same SSTable.

Now when repair happens, we have these new SSTables which sit earlier in the timeline and hence will keep tombstones alive for some time. For example, if the machine has been up for 2 weeks and I run repair now for the first time, the new SSTables might contain data from anywhere in the previous weeks. So even though the SSTables created during week 1 will get dropped at the start of week 5, because of repair there will be additional SSTables holding tombstones until they reach their eventual drop state a few weeks later. Am I thinking about this correctly? Does this mean we might still have a lot of tombstones lying around, since compaction is less frequent for older tables?

thanks
anishek
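For reference, a table set up along the lines described above might look like the following sketch (keyspace, table, and column names are invented; only the compaction class and the 30-day TTL mirror the thread):

```sql
-- Illustrative only: DTCS plus a table-level default TTL matching the
-- 30-day per-insert TTL described above.
CREATE TABLE my_ks.events (
    sensor_id text,
    ts        timestamp,
    value     blob,
    PRIMARY KEY (sensor_id, ts)
) WITH compaction = { 'class': 'DateTieredCompactionStrategy' }
  AND default_time_to_live = 2592000;  -- 30 days in seconds
```

Setting the TTL at the table level (rather than per insert) also lets Cassandra know every cell in an SSTable expires together, which is what makes whole-SSTable drops possible.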
com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character
Hi, I'm trying to use a bound query and I get a weird error.

The query:

    INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);

The code:

    PreparedStatement preparedStatement = session.prepare(composeQuery()); // composeQuery returns INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);
    BoundStatement boundStatement = new BoundStatement(preparedStatement); // EXCEPTION HERE
    boundStatement.bind(UUID.randomUUID(), RandomStringUtils.random(10), 1);
    session.execute(boundStatement);

If I run INSERT INTO packets (id, fingerprint, mark) VALUES (now(), 'xxx', 1); in cqlsh, it works.

Stacktrace:

    Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79)
        at stress.StressTest.runBound(StressTest.java:89)
        at stress.Main.main(Main.java:29)
    Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:101)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:185)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:160)
        at com.google.common.util.concurrent.Futures$1.apply(Futures.java:720)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:859)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Re: 10000+ CF support from Cassandra
Any ideas or advice?

On Mon, Jun 22, 2015 at 10:55 AM, Arun Chaitanya chaitan64a...@gmail.com wrote:

Hello All,

We have now settled on the following approach. I want to know if there are any problems that you foresee in the production environment.

Our approach: use off-heap memory, with modifications to the default cassandra.yaml and cassandra-env.sh:

* memory_allocator: JEMallocAllocator (https://issues.apache.org/jira/browse/CASSANDRA-7883)
* memtable_allocation_type: offheap_objects

With the above two, the slab allocation (https://issues.apache.org/jira/browse/CASSANDRA-5935), which requires 1 MB of heap memory per table, is disabled. The memory for table metadata, caches, and memtables is thus allocated natively and does not affect GC performance.

* tombstone_failure_threshold: 1

Without this, C* throws TombstoneOverwhelmingException during startup. This setting looks problematic, so I want to know why just creating tables makes so many tombstones...

* -XX:+UseG1GC

It is good for reducing GC time. Without this, full GCs of over 1 s are observed.

We created 5000 column families with about 1000 entries per column family. The read/write performance seems stable. The problem we saw is with startup time:

    Cassandra Start Time (s)    20     349
    Average CPU Usage (%)       40     49.65
    GC Activity (%)             2.6    0.6

Thanks a lot in advance.

On Tue, Jun 2, 2015 at 11:26 AM, graham sanderson gra...@vast.com wrote:

> I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with this problem]

Just to be clear, I'm not saying this is a great approach. I AM saying that it may be better than having 10000+ CFs, which was the original question (it really depends on the use case, which wasn't well defined)... The map size limit may be a problem, and then there is the CQL vs. Thrift question, which could start a flame war; ideally CQL maps should give you the same flexibility as arbitrary Thrift columns.

On Jun 1, 2015, at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote:

> Sorry for this naive question, but how important is this tuning? Can this have a huge impact in production?

Massive. Here's a graph from when we did some JVM tuning at my previous company: http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance. Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com wrote:

Thanks Jon and Jack,

> I strongly advise against this approach.

Jon, I think so too. But do you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with this problem]
- No more CQL.
- No data types; everything needs to be a blob.
- Limited clustering keys and default clustering order.

> First off, different workloads need different tuning.

Sorry for this naive question, but how important is this tuning? Can this have a huge impact in production?

> You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

We are actually trying a similar approach. But we don't want to expose this to the application layer. We are attempting to hide this and provide an API.

> Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

I meant 10 clusters. We want to split our tables across multiple clusters if the above approach is not possible. [But it seems to be very costly]

Thanks,

On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

How big is each of the tables - are they all fairly small or fairly large? Small as in no more than thousands of rows, or large as in tens of millions or hundreds of millions of rows? Small tables are not ideal for a Cassandra cluster since the rows would be spread out across the nodes, even though it might make more sense for each small table to be on a single node. You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the
Re: com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character
omg!!! It was some weird non-printing character. That is why the C* driver failed to parse it.

2015-06-24 11:35 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

Sorry, misprint:

    // composeQuery() = "INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);"
    PreparedStatement preparedStatement = session.prepare(composeQuery()); // exception happens here!

2015-06-24 11:20 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

Hi, I'm trying to use a bound query and I get a weird error.

The query:

    INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);

The code:

    PreparedStatement preparedStatement = session.prepare(composeQuery()); // composeQuery returns INSERT INTO packets (id, fingerprint, mark) VALUES (?, ?, ?);
    BoundStatement boundStatement = new BoundStatement(preparedStatement); // EXCEPTION HERE
    boundStatement.bind(UUID.randomUUID(), RandomStringUtils.random(10), 1);
    session.execute(boundStatement);

If I run INSERT INTO packets (id, fingerprint, mark) VALUES (now(), 'xxx', 1); in cqlsh, it works.

Stacktrace:

    Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
        at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79)
        at stress.StressTest.runBound(StressTest.java:89)
        at stress.Main.main(Main.java:29)
    Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:37 no viable alternative at character ' '
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:101)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:185)
        at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:160)
        at com.google.common.util.concurrent.Futures$1.apply(Futures.java:720)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:859)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
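A hidden character like this can be found without involving the driver at all. Here is a minimal, self-contained sketch (class and method names are made up) that flags the first control character or non-ASCII space in a CQL string:

```java
// Scans a CQL string for characters that look like whitespace but are not a
// plain space (NBSP, control characters, copy-paste artifacts). The parser
// error "line 1:37 no viable alternative at character ' '" reports exactly
// such a position.
class QuerySanityCheck {
    static int firstSuspiciousChar(String cql) {
        for (int i = 0; i < cql.length(); i++) {
            char c = cql.charAt(i);
            boolean oddSpace = c != ' '
                    && Character.getType(c) == Character.SPACE_SEPARATOR;
            if (oddSpace || Character.isISOControl(c)) {
                return i; // index of the offending character
            }
        }
        return -1; // nothing suspicious found
    }

    public static void main(String[] args) {
        // A no-break space (U+00A0) hides where a normal space should be:
        String bad = "INSERT INTO packets (id, fingerprint,\u00A0mark) VALUES (?, ?, ?)";
        System.out.println(firstSuspiciousChar(bad)); // prints 37, matching "line 1:37"
    }
}
```

Running composeQuery()'s output through a check like this would have pointed straight at the invisible character.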
Re: Any use-case about a migration from SQL Server to Cassandra?
This article from Spotify Labs is a really nice write-up of migrating SQL (Postgres in this case) to Cassandra.

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 23 June 2015 at 20:23, Alex Popescu al...@datastax.com wrote:

On Tue, Jun 23, 2015 at 12:13 PM, Marcos Ortiz mlor...@uci.cu wrote:

> 2- They heavily used C# in a Microsoft-based environment, so I need to know if the .NET driver is ready for production use.

The DataStax C# driver has been used in production for quite a while by numerous users. It is the most up-to-date, feature-rich, and tunable C# driver for Apache Cassandra and DataStax Enterprise. Anyway, if there's anything missing, we are always happy to improve it. (As you can see from my sig, I do work for DataStax, but the above is very true.)

-- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax
Re: Adding Nodes With Inconsistent Data
This is no longer an issue in 2.1: https://issues.apache.org/jira/browse/CASSANDRA-2434

We now make sure the replica we bootstrap from is the one that will no longer own that range.

On Wed, Jun 24, 2015 at 4:58 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

It looks to me that this can indeed happen theoretically (I might be wrong). However:

- Hinted handoff tends to remove this issue; if this is a big worry, you might want to make sure HH is enabled and well tuned.
- Read repairs (synchronous or not) might have mitigated things as well, if you read fresh data. You can set this to higher values.
- After an outage, you should always run a nodetool repair on the node that went down - following the best practices, or because you understand the reasons - or just trust HH if that is enough for you.

So I would say that you can always shoot yourself in the foot, whatever you do; following best practices and understanding the internals is the key, imho. It is a good question, though.

Alain.

2015-06-24 19:43 GMT+02:00 Anuj Wadehra anujw_2...@yahoo.co.in:

Hi,

We faced a scenario where we lost a little data after adding 2 nodes to the cluster. There were intermittent dropped mutations in the cluster. I need to verify my understanding of how this may have happened in order to do root cause analysis.

Scenario: 3 nodes, RF=3, read/write CL=QUORUM.

1. Due to an overloaded cluster, some writes happened on just 2 nodes, node 1 and node 2, while asynchronous mutations were dropped on node 3. So say key K with token T was not written to node 3.
2. I added node 4, and suppose that as per the newly calculated ranges, token T is now supposed to have replicas on node 1, node 3, and node 4. Unfortunately, node 4 bootstrapped from node 3, where key K was missing.
3. After the recommended 2-minute gap, I added node 5, and suppose that as per the new token distribution, token T is now supposed to have replicas on node 3, node 4, and node 5. Again, node 5 bootstrapped from node 3, where the data was missing.

So now key K is lost, and that's how we lost a few rows. Moreover, in step 1 the situation could be worse: we could also have a scenario where some writes happened on just one of three replicas, and Cassandra chooses replicas where this data is missing when streaming ranges to the 2 new nodes.

Am I making sense? We are using C* 2.0.3.

Thanks
Anuj

Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android

-- http://twitter.com/tjake
Re: Adding Nodes With Inconsistent Data
It looks to me that can indeed happen theoretically (I might be wrong). However, - Hinted Handoff tends to remove this issue, if this is big worry, you might want to make sure HH are enabled and well tuned - Read Repairs (synchronous or not) might have mitigate things also, if you read fresh data. You can set this to higher values. - After an outage, you should always run a nodetool repair on the node that went done - following the best practices, or because you understand the reasons - or just trust HH if it is enough to you. So I would say that you can always shoot yourself in your foot, whatever you do, yet following best practices or understanding the internals is the key imho. I would say it is a good question though. Alain. 2015-06-24 19:43 GMT+02:00 Anuj Wadehra anujw_2...@yahoo.co.in: Hi, We faced a scenario where we lost little data after adding 2 nodes in the cluster. There were intermittent dropped mutations in the cluster. Need to verify my understanding how this may have happened to do Root Cause Analysis: Scenario: 3 nodes, RF=3, Read / Write CL= Quorum 1. Due to overloaded cluster, some writes just happened on 2 nodes: node 1 node 2 whike asynchronous mutations dropped on node 3. So say key K with Token T was not written to 3. 2. I added node 4 and suppose as per newly calculated ranges, now token T is supposed to have replicas on node 1, node 3, and node 4. Unfortunately node 4 started bootstrapping from node 3 where key K was missing. 3. After 2 min gap recommended, I added node 5 and as per new token distribution suppose token T now is suppossed to have replicas on node 3, node 4 and node 5. Again node 5 bootstrapped from node 3 where data was misssing. So now key K is lost and thats how we list very few rows. Moreover, in step 1 situation could be worse. we can also have a scenario where some writes just happened on one of three replicas and cassandra chooses replicas where this data is missing for streaming ranges to 2 new nodes. 
Am I making sense? We are using C* 2.0.3. Thanks Anuj Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android
Re: [MASSMAIL]Re: Any use-case about a migration from SQL Server to Cassandra?
I guess it is this one, enjoy it: https://labs.spotify.com/2015/06/23/user-database-switch/ :-)

2015-06-24 22:57 GMT+02:00 Marcos Ortiz mlor...@uci.cu:

Where is the link, Carlos?

On 24/06/15 07:18, Carlos Alonso wrote:

This article from Spotify Labs is a really nice write-up of migrating SQL (Postgres in this case) to Cassandra.

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 23 June 2015 at 20:23, Alex Popescu al...@datastax.com wrote:

On Tue, Jun 23, 2015 at 12:13 PM, Marcos Ortiz mlor...@uci.cu wrote:

> 2- They heavily used C# in a Microsoft-based environment, so I need to know if the .NET driver is ready for production use.

The DataStax C# driver has been used in production for quite a while by numerous users. It is the most up-to-date, feature-rich, and tunable C# driver for Apache Cassandra and DataStax Enterprise. Anyway, if there's anything missing, we are always happy to improve it. (As you can see from my sig, I do work for DataStax, but the above is very true.)

-- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186
Re: [MASSMAIL]Re: Any use-case about a migration from SQL Server to Cassandra?
https://labs.spotify.com/2015/06/23/user-database-switch/

On Wed, Jun 24, 2015 at 5:57 PM, Marcos Ortiz mlor...@uci.cu wrote:

Where is the link, Carlos?

On 24/06/15 07:18, Carlos Alonso wrote:

This article from Spotify Labs is a really nice write-up of migrating SQL (Postgres in this case) to Cassandra.

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 23 June 2015 at 20:23, Alex Popescu al...@datastax.com wrote:

On Tue, Jun 23, 2015 at 12:13 PM, Marcos Ortiz mlor...@uci.cu wrote:

> 2- They heavily used C# in a Microsoft-based environment, so I need to know if the .NET driver is ready for production use.

The DataStax C# driver has been used in production for quite a while by numerous users. It is the most up-to-date, feature-rich, and tunable C# driver for Apache Cassandra and DataStax Enterprise. Anyway, if there's anything missing, we are always happy to improve it. (As you can see from my sig, I do work for DataStax, but the above is very true.)

-- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186

-- Paulo Motta, Chaordic | Platform, www.chaordic.com.br, +55 48 3232.3200
Re: [MASSMAIL]Re: Any use-case about a migration from SQL Server to Cassandra?
Where is the link, Carlos?

On 24/06/15 07:18, Carlos Alonso wrote:

This article from Spotify Labs is a really nice write-up of migrating SQL (Postgres in this case) to Cassandra.

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 23 June 2015 at 20:23, Alex Popescu al...@datastax.com wrote:

On Tue, Jun 23, 2015 at 12:13 PM, Marcos Ortiz mlor...@uci.cu wrote:

> 2- They heavily used C# in a Microsoft-based environment, so I need to know if the .NET driver is ready for production use.

The DataStax C# driver has been used in production for quite a while by numerous users. It is the most up-to-date, feature-rich, and tunable C# driver for Apache Cassandra and DataStax Enterprise. Anyway, if there's anything missing, we are always happy to improve it. (As you can see from my sig, I do work for DataStax, but the above is very true.)

-- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax

-- Marcos Ortiz http://about.me/marcosortiz, Sr. Product Manager (Data Infrastructure) at UCI @marcosluis2186 http://twitter.com/marcosluis2186
Re: 10000+ CF support from Cassandra
By entries, do you mean rows or columns? Please clarify how many columns each of your tables has, and how many rows you are populating for each table. In case I didn't make it clear earlier, limit yourself to low hundreds (like 250) of tables and you should be fine. Thousands of tables is a clear anti-pattern for Cassandra - not recommended. If it works for you, great, but if not, don't say you weren't warned. Disabling of slab allocation is an expert-only feature - its use is generally an anti-pattern, not recommended. -- Jack Krupansky On Sun, Jun 21, 2015 at 10:55 PM, Arun Chaitanya chaitan64a...@gmail.com wrote: Hello All, Now we settled on the following approach. I want to know if there are any problems that you foresee in the production environment. Our Approach: Use Off Heap Memory Modifications to default cassandra.yaml and cassandra-env.sh * memory_allocator: JEMallocAllocator (https://issues.apache.org/jira/browse/CASSANDRA-7883) * memtable_allocation_type: offheap_objects By above two, the slab allocation (https://issues.apache.org/jira/browse/CASSANDRA-5935), which requires 1MB heap memory per table, is disabled. The memory for table metadata, caches and memtable are thus allocated natively and does not affect GC performance. * tombstone_failure_threshold: 1 Without this, C* throws TombstoneOverwhelmingException while in startup. This setting looks problematic so I want to know why just creating tables makes so many tombstones ... * -XX:+UseG1GC It is good for reducing GC time. Without this, full GCs 1s are observed. We created 5000 column families with about 1000 entries per column family. The read/write performance seems to stable. The problem we saw is with startup time. Cassandra Start Time (s) 20 349 Average CPU Usage (%) 40 49.65 GC Actitivy (%) 2.6 0.6 Thanks a lot in advance. On Tue, Jun 2, 2015 at 11:26 AM, graham sanderson gra...@vast.com wrote: I strongly advise against this approach. Jon, I think so too. 
But so you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with this problem] Just to be clear, I’m not saying this is a great approach, I AM saying that it may be better than having 1+ CFs, which was the original question (it really depends on the use case which wasn’t well defined)… map size limit may be a problem, and then there is the CQL vs thrift question which could start a flame war; ideally CQL maps should give you the same flexibility as arbitrary thrift columns On Jun 1, 2015, at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote: Sorry for this naive question but how important is this tuning? Can this have a huge impact in production? Massive. Here's a graph of when we did some JVM tuning at my previous company: http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png About an order of magnitude difference in performance. Jon On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com wrote: Thanks Jon and Jack, I strongly advise against this approach. Jon, I think so too. But so you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with this problem] - No more CQL. - No data types, everything needs to be a blob. - Limited clustering Keys and default clustering order. First off, different workloads need different tuning. Sorry for this naive question but how important is this tuning? Can this have a huge impact in production? You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Casandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. 
The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

We are actually trying a similar approach, but we don't want to expose this to the application layer. We are attempting to hide it and provide an API.

Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

I meant 10 clusters. We want to split our tables across multiple clusters if the above approach is not possible. [But it seems to be very costly]

Thanks,

On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

How big is each of the tables - are they all fairly small or fairly large? Small as in no more than thousands
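The 1MB-of-heap-per-table cost of slab allocation mentioned above (CASSANDRA-5935) is worth making concrete, since it explains why table count translates directly into fixed heap overhead. A back-of-envelope sketch, assuming the ~1MB-per-table figure quoted in the thread:

```java
// Rough heap overhead from memtable slab allocation: ~1MB of heap is
// reserved per table (figure quoted from CASSANDRA-5935 in the thread).
public class SlabOverhead {
    static long slabOverheadMB(int tableCount) {
        return tableCount * 1L; // ~1 MB per table
    }

    public static void main(String[] args) {
        // 5000 column families, as in the experiment above: ~5 GB of
        // heap consumed before any data is stored, versus ~250 MB at
        // the recommended low-hundreds table count.
        System.out.println(slabOverheadMB(5000) + " MB"); // prints 5000 MB
        System.out.println(slabOverheadMB(250) + " MB");  // prints 250 MB
    }
}
```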
InvalidQueryException: Invalid amount of bind variables
Hello. I'm having some problems with the Cassandra driver for Java. Here is a simple Scala project: https://github.com/afiskon/scala-cassandra-example

When I run it I get the following output: http://paste.ubuntu.com/11767987/

As I understand it, this piece of code:

```
private val id = "id"
private val description = "description"

QB.insertInto(table)
  .value(id, dto.id)
  .value(description, dto.descr)
  .getQueryString
```

... generates the query string: INSERT INTO todo_list(id,description) VALUES (1,?)

But I can't figure out why the second value is missing. What am I doing wrong?

-- Best regards, Eax Melanhovich http://eax.me/
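The "Invalid amount of bind variables" error suggests the generated string carries fewer `?` markers than the number of values supplied at execution time: here the id was inlined as `1`, leaving one marker for two values. A minimal sketch of checking for that mismatch (plain Java, no driver dependency; `countMarkers` is a hypothetical helper, not driver API):

```java
// Hypothetical helper: count bind markers ('?') in a CQL string,
// ignoring any '?' that appears inside a single-quoted literal.
public class BindMarkerCheck {
    static int countMarkers(String cql) {
        int n = 0;
        boolean inLiteral = false;
        for (char c : cql.toCharArray()) {
            if (c == '\'') inLiteral = !inLiteral;
            else if (c == '?' && !inLiteral) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The string reported above: the id value was inlined, so only
        // one marker remains even though two values are supplied.
        String generated = "INSERT INTO todo_list(id,description) VALUES (1,?)";
        System.out.println(countMarkers(generated)); // prints 1
    }
}
```

Comparing this count with the number of values you pass alongside the query string reproduces the driver's complaint before it ever reaches the server.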
Read is slower in 2.1.6 than 2.0.14?
Hi, we recently experimented with read performance on both versions and found that reads are slower in 2.1.6. Here is our setup:

1. Machines: 3 physical hosts. Each node has 24 CPU cores, 256 GB memory and 8x600 GB SAS disks with RAID 1.
2. Replication factor is 3, and a billion rows of data were inserted.
3. Key cache capacity is increased to 50 GB on each node.
4. We keep querying the same set of a million partition keys in a loop.

Result: for 2.0.14 we get an average of 6 ms, while for 2.1.6 we only get 18 ms. The key cache hit rate of 0.011 seems pretty low even though the same set of keys was used. Has anybody done similar read performance testing? Could you share your results?

Thanks, Zhiyan
Re: InvalidQueryException: Invalid amount of bind variables
OK, I discovered that passing a Statement instead of a string to the executeAsync method solves the problem: https://github.com/afiskon/scala-cassandra-example/commit/4f3f30597a4df340f739e4ec53ec9ee3d87da495

Still, according to the documentation for the getQueryString method, the described problem should be considered a bug, right?

On Wed, 24 Jun 2015 17:35:22 +0300 Eax Melanhovich m...@eax.me wrote: [...]
Range not found after nodetool decommission
ERROR [OptionalTasks:1] 2015-06-25 08:56:19,156 CassandraDaemon.java:223 - Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.AssertionError: -110036444293069784 not found in

--
Ranger Tsao
Re: 10000+ CF support from Cassandra
Hi Jack,

By entries, I meant rows. Each column family has about 200 columns.

Disabling slab allocation is an expert-only feature - its use is generally an anti-pattern, not recommended.

I understand this and have seen this recommendation in several places. I want to understand the consequences: is it performance, maintenance or scalability that is at stake?

In our use case we have about 3000 column families (of course modelled in an RDBMS). If we were to limit ourselves to 250 column families, do you advise us to use multiple clusters (the problem being cost-ineffectiveness)? If we were to use a single cluster and support 3000 column families, the only idea is to group a few column families and store them in one column family. In this case, grouping is a difficult task, IMO. And if we want an abstraction of the grouping for developers, we need a special connector for Hadoop/Spark systems. So I do not want to enter this territory.

Sorry for such questions, but I am still wondering if I am the only one facing this problem.

Thanks a lot,
Arun

On Wed, Jun 24, 2015 at 10:28 PM, Jack Krupansky jack.krupan...@gmail.com wrote: [...]
Re: 10000+ CF support from Cassandra
I would say that it's mostly a performance issue, tied to memory management, but the main problem is that a large number of tables invites a whole host of cluster management difficulties that require... expert attention, which then means you need an expert to maintain and enhance it.

Cassandra scales in two ways: number of rows and number of nodes, but not number of tables. Both the number of tables and the number of columns per row need to be kept moderate for your cluster to be manageable and perform well.

Adding a tenant ID to your table partition key is the optimal approach to multi-tenancy at this stage with Cassandra. That, and maybe also assigning subsets of the tenants to different tables, as well as having separate clusters if your number of tenants and rows gets too large.

-- Jack Krupansky

On Wed, Jun 24, 2015 at 11:55 AM, Arun Chaitanya chaitan64a...@gmail.com wrote: [...]
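The tenant-ID-in-partition-key model recommended above can be sketched as follows. The schema and all names here are illustrative assumptions, not anything from the thread; a real design would use a CQL composite partition key rather than string concatenation:

```java
// Sketch: map many logical per-tenant tables onto one shared physical
// table by folding tenant ID and logical table name into the partition
// key. Assumed physical schema (illustrative only):
//   CREATE TABLE shared_data (
//       tenant_id text, logical_table text, row_key text,
//       col text, value blob,
//       PRIMARY KEY ((tenant_id, logical_table), row_key, col));
public class TenantMapping {
    // Illustrative composite key rendering; CQL would keep the two
    // components as separate partition-key columns.
    static String partitionKey(String tenantId, String logicalTable) {
        return tenantId + ":" + logicalTable;
    }

    public static void main(String[] args) {
        // Every tenant/table pair gets its own partition, so partitions
        // grow with tenants while the physical table count stays constant.
        System.out.println(partitionKey("tenant42", "orders")); // prints tenant42:orders
    }
}
```

This is what keeps the table count constant while partitions, which Cassandra scales well, absorb the per-tenant growth.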
Adding Nodes With Inconsistent Data
Hi, we faced a scenario where we lost a little data after adding 2 nodes to the cluster. There were intermittent dropped mutations in the cluster. I need to verify my understanding of how this may have happened in order to do a root cause analysis.

Scenario: 3 nodes, RF=3, read/write CL=QUORUM.

1. Due to an overloaded cluster, some writes just happened on 2 nodes, node 1 and node 2, while asynchronous mutations were dropped on node 3. So say key K with token T was not written to node 3.
2. I added node 4, and suppose that as per the newly calculated ranges, token T is now supposed to have replicas on node 1, node 3, and node 4. Unfortunately node 4 started bootstrapping from node 3, where key K was missing.
3. After the recommended 2-minute gap, I added node 5, and as per the new token distribution suppose token T is now supposed to have replicas on node 3, node 4 and node 5. Again node 5 bootstrapped from node 3, where the data was missing.

So now key K is lost, and that's how we lost a few rows. Moreover, in step 1 the situation could be worse: we can also have a scenario where some writes just happened on one of three replicas, and Cassandra chooses the replicas where this data is missing when streaming ranges to the 2 new nodes. Am I making sense? We are using C* 2.0.3.

Thanks
Anuj

Sent from Yahoo Mail on Android
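The scenario above hinges on quorum arithmetic: a QUORUM write at RF=3 is acknowledged once 2 replicas accept it, so one replica can silently miss the mutation, and if both bootstrapping nodes stream from that stale replica the key disappears. A small sketch of that arithmetic, assuming the standard quorum formula floor(RF/2) + 1:

```java
// Quorum size for a given replication factor: floor(RF/2) + 1.
public class QuorumMath {
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf); // 2
        // A write acked by q=2 replicas satisfies QUORUM even though the
        // third replica dropped the mutation. If both new nodes bootstrap
        // the affected range from that stale replica, a later QUORUM read
        // may touch only stale copies of key K.
        System.out.println("RF=" + rf + " quorum=" + q); // prints RF=3 quorum=2
    }
}
```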