Re: large range read in Cassandra
For the benefit of others, I ended up finding out that the CQL library I was using at the time (https://github.com/gocql/gocql) leaves the page size defaulted to no paging, so Cassandra was trying to pull all rows of the partition into memory at once. Setting the page size to a reasonable number seems to have done the trick. On Tue, Nov 25, 2014 at 2:54 PM, Dan Kinder dkin...@turnitin.com wrote: Thanks, very helpful Rob, I'll watch for that. On Tue, Nov 25, 2014 at 11:45 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Nov 25, 2014 at 10:45 AM, Dan Kinder dkin...@turnitin.com wrote: To be clear, I expect this range query to take a long time and perform relatively heavy I/O. What I expected Cassandra to do was use auto-paging (https://issues.apache.org/jira/browse/CASSANDRA-4415, http://stackoverflow.com/questions/17664438/iterating-through-cassandra-wide-row-with-cql3) so that we aren't literally pulling the entire thing in. Am I misunderstanding this use case? Could you clarify why exactly it would slow way down? It seems like with each read it should be doing a simple range read from one or two sstables. If you're paging through a single partition, that's likely to be fine. When you said "range reads ... over rows" my impression was you were talking about attempting to page through millions of partitions. With that confusion cleared up, the likely explanation for lack of availability in your case is heap pressure/GC time. Look for GCs around that time. Also, if you're using authentication, make sure that your authentication keyspace has a replication factor greater than 1. =Rob -- Dan Kinder Senior Software Engineer Turnitin – www.turnitin.com dkin...@turnitin.com
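As a rough sketch of the fix described above, setting an explicit page size on the query keeps the server from materializing the whole partition at once; the keyspace, table, and column names below are placeholders, not the original poster's schema:

package main

import (
    "fmt"
    "log"

    "github.com/gocql/gocql"
)

func main() {
    // Assumed host/keyspace; adjust to your cluster.
    cluster := gocql.NewCluster("127.0.0.1")
    cluster.Keyspace = "mykeyspace"
    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    // Ask the coordinator for 5000 rows at a time instead of the whole partition;
    // the driver fetches the next page transparently as we iterate.
    q := session.Query(`SELECT col1, col2 FROM wide_table WHERE pk = ?`, "some-key").PageSize(5000)

    iter := q.Iter()
    var col1, col2 string
    for iter.Scan(&col1, &col2) {
        fmt.Println(col1, col2) // process one row at a time
    }
    if err := iter.Close(); err != nil {
        log.Fatal(err)
    }
}

In the gocql versions I've used, Session.SetPageSize can also set the same thing as a session-wide default instead of per query.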
Re: Upgrading from 1.2 to 2.1 questions
I would not use 2.1.2 for production yet. It doesn't seem stable enough based on the feedback I see here. The newest 2.0.12 may be a better option. On Feb 2, 2015 8:43 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
Re: Upgrading from 1.2 to 2.1 questions
Sure, but the question is really about going from 1.2 to 2.0 ... On 2015-02-02 13:59:27 +, Kai Wang said: I would not use 2.1.2 for production yet. It doesn't seem stable enough based on the feedback I see here. The newest 2.0.12 may be a better option. On Feb 2, 2015 8:43 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
Re: Any problem mounting a keyspace directory in ram memory?
Hi Colin, Yes, we don't want to use the C* in-memory option, we just want to mount the keyspace data directory in RAM instead of leaving it on the spinning disks. My question is more related to the technical side of mounting the keyspace data folder in RAM than to checking whether Cassandra has some in-memory feature. My intention is to understand if mounting a keyspace data directory in RAM could cause any technical problems... From our point of view it shouldn't, as we are just moving the directory to be stored in RAM instead of on the spinning disks. Thanks so much. On 02/02/2015, at 05:15, Colin co...@clark.ws wrote: Until the in-memory option stores data off heap, I would strongly recommend staying away from this option. This was a marketing-driven hack in my opinion. -- Colin Clark +1 612 859 6129 Skype colin.p.clark On Feb 2, 2015, at 5:31 AM, Jan cne...@yahoo.com wrote: Hi Gabriel; I don't think Apache Cassandra supports in-memory keyspaces. However, DataStax Enterprise does support it. Quoting from DataStax: DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data, mostly composed of overwrites, such as an application for mirroring stock exchange data. Only the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, the table you design for use in-memory should have the following characteristics: Store a small amount of data Experience a workload that is mostly overwrites Be heavily trafficked Using the in-memory option | DataStax Enterprise 4.0 Documentation (www.datastax.com) Hope this helps. Jan C* Architect On Sunday, February 1, 2015 1:32 PM, Gabriel Menegatti gabr...@s1mbi0se.com.br wrote: Hi guys, has anyone here already mounted a specific keyspace directory in RAM using tmpfs? Do you see any problem doing so, except for the fact that the data can be lost? Thanks in advance. Regards, Gabriel.
Re: Upgrading from 1.2 to 2.1 questions
Our minor version is 1.2.15 ... I am not looking forward to the experience, and would like to gather as much information as possible. This presents an opportunity to also review the data structures we use and possibly move them out of Cassandra. Oleg On 2015-02-02 13:42:52 +, Sibbald, Charles said: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
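As general background for anyone gathering the same information (independent of the Thrift client question), the per-node sequence for each major hop (1.2 -> 2.0, then 2.0 -> 2.1) usually looks roughly like this; package and service names vary by install method:

# On one node at a time, after the whole cluster is healthy on the latest 1.2.x:
nodetool drain               # flush memtables and stop accepting writes
sudo service cassandra stop
# ...install the 2.0.x binaries/packages and merge cassandra.yaml changes...
sudo service cassandra start
nodetool upgradesstables     # rewrite sstables in the new on-disk format
# repeat on the next node; only start the 2.0 -> 2.1 hop once the whole ring is on 2.0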
Upgrading from 1.2 to 2.1 questions
Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
Re: Upgrading from 1.2 to 2.1 questions
Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
RE: FW: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled
Hi, Holmberg, I tried your suggestion and ran the following command: keytool -exportcert -keystore path-to-my-keystore-file -storepass my-keystore-password -storetype JKS -file path-to-output-file and I got the following error: keytool error: java.lang.Exception: Alias mykey does not exist Do you know how to fix this issue? Thanks Boying From: Adam Holmberg [mailto:adam.holmb...@datastax.com] Sent: January 31, 2015 1:12 To: user@cassandra.apache.org Subject: Re: FW: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled Assuming the truststore you are referencing is the same one the server is using, it's probably in the wrong format. You will need to export the cert into a PEM format for use in the (Python) cqlsh client. If exporting from the java keystore format, use keytool -exportcert <source keystore, pass, etc> -rfc -file <output file> If you have the crt file, you should be able to accomplish the same using openssl: openssl x509 -in <in crt> -inform DER -out <output file> -outform PEM Then, you should refer to that PEM file in your command. Alternatively, you can specify a path to the file (along with other options) in your cqlshrc file. References: How cqlsh picks up ssl options: https://github.com/apache/cassandra/blob/cassandra-2.1/pylib/cqlshlib/sslhandling.py Example cqlshrc file: https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cqlshrc.sample Adam Holmberg On Wed, Jan 28, 2015 at 1:08 AM, Lu, Boying boying...@emc.com wrote: Hi, All, Does anyone know the answer? Thanks a lot Boying From: Lu, Boying Sent: January 6, 2015 11:21 To: user@cassandra.apache.org Subject: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled Hi, All, I turned on the client_encryption_options like this: client_encryption_options: enabled: true keystore: path-to-my-keystore-file keystore_password: my-keystore-password truststore: path-to-my-truststore-file truststore_password: my-truststore-password … I can use the following cassandra-cli command to access the DB: cassandra-cli -ts path-to-my-truststore-file -tspw my-truststore-password -tf org.apache.cassandra.thrift.SSLTransportFactory But when I tried to access the DB with cqlsh like this: SSL_CERTFILE=path-to-my-truststore cqlsh -t cqlshlib.ssl.ssl_transport_factory I got the following error: Connection error: Could not connect to localhost:9160: [Errno 0] _ssl.c:332: error::lib(0):func(0):reason(0) I guess the reason may be that I didn't provide the truststore password. But cqlsh doesn't provide such an option. Does anyone know how to resolve this issue? Thanks Boying
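For what it's worth, "Alias mykey does not exist" usually means the key entry was stored under a different alias; keytool only looks for "mykey" when no -alias is given. A quick illustrative sequence, reusing the placeholder paths from the thread (the "node1" alias is an assumption, substitute whatever the listing shows):

# List the keystore to see which aliases it actually contains
keytool -list -keystore path-to-my-keystore-file -storepass my-keystore-password

# Then export that alias explicitly, in PEM (-rfc) form for cqlsh
keytool -exportcert -alias node1 -keystore path-to-my-keystore-file -storepass my-keystore-password -rfc -file path-to-output-file.pem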
Re: Any problem mounting a keyspace directory in ram memory?
Hi Jan, Thanks for your reply, but the C* in-memory option just supports 1 GB keyspaces at the moment, which is not enough for us. My question is more related to the technical side of mounting the keyspace data folder in RAM than to checking whether Cassandra has some in-memory feature. My intention is to understand if mounting a keyspace data directory in RAM could cause any technical problems... From our point of view it shouldn't, as we are just moving the directory to be stored in RAM instead of on the spinning disks. Thanks so much. On 02/02/2015, at 02:31, Jan cne...@yahoo.com wrote: Hi Gabriel; I don't think Apache Cassandra supports in-memory keyspaces. However, DataStax Enterprise does support it. Quoting from DataStax: DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data, mostly composed of overwrites, such as an application for mirroring stock exchange data. Only the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, the table you design for use in-memory should have the following characteristics: Store a small amount of data Experience a workload that is mostly overwrites Be heavily trafficked Using the in-memory option | DataStax Enterprise 4.0 Documentation (www.datastax.com) Hope this helps. Jan C* Architect On Sunday, February 1, 2015 1:32 PM, Gabriel Menegatti gabr...@s1mbi0se.com.br wrote: Hi guys, has anyone here already mounted a specific keyspace directory in RAM using tmpfs? Do you see any problem doing so, except for the fact that the data can be lost? Thanks in advance. Regards, Gabriel.
Re: Any problem mounting a keyspace directory in ram memory?
At least I cannot think of any reason why it wouldn't work. As you said, you might lose the data, but if you can live with that then why not. Hannu On 02.02.2015, at 14:21, Gabriel Menegatti gabr...@s1mbi0se.com.br wrote: Hi Colin, Yes, we don't want to use the C* in-memory option, we just want to mount the keyspace data directory in RAM instead of leaving it on the spinning disks. My question is more related to the technical side of mounting the keyspace data folder in RAM than to checking whether Cassandra has some in-memory feature. My intention is to understand if mounting a keyspace data directory in RAM could cause any technical problems... From our point of view it shouldn't, as we are just moving the directory to be stored in RAM instead of on the spinning disks. Thanks so much. On 02/02/2015, at 05:15, Colin co...@clark.ws wrote: Until the in-memory option stores data off heap, I would strongly recommend staying away from this option. This was a marketing-driven hack in my opinion. -- Colin Clark +1 612 859 6129 Skype colin.p.clark On Feb 2, 2015, at 5:31 AM, Jan cne...@yahoo.com wrote: Hi Gabriel; I don't think Apache Cassandra supports in-memory keyspaces. However, DataStax Enterprise does support it. Quoting from DataStax: DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data, mostly composed of overwrites, such as an application for mirroring stock exchange data. Only the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, the table you design for use in-memory should have the following characteristics: Store a small amount of data Experience a workload that is mostly overwrites Be heavily trafficked Using the in-memory option | DataStax Enterprise 4.0 Documentation: http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html Hope this helps. Jan C* Architect On Sunday, February 1, 2015 1:32 PM, Gabriel Menegatti gabr...@s1mbi0se.com.br wrote: Hi guys, has anyone here already mounted a specific keyspace directory in RAM using tmpfs? Do you see any problem doing so, except for the fact that the data can be lost? Thanks in advance. Regards, Gabriel.
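For reference, a minimal sketch of the tmpfs approach on Linux; the size and paths are placeholders, and Cassandra should be stopped on the node first since mounting over a directory hides whatever is already in it:

# stop cassandra on this node first
sudo mount -t tmpfs -o size=16g tmpfs /var/lib/cassandra/data/mykeyspace
df -h /var/lib/cassandra/data/mykeyspace   # verify the mount
# start cassandra again; the contents vanish on reboot, so the node must
# repopulate from replicas (repair/bootstrap) or the data must be disposable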
Re: How to deal with too many sstables
I just ran nodetool repair. The nodes with many sstables are the newest in my cluster. Before adding these nodes to my cluster, there was no automatic compaction because my cluster is a write-only cluster. Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/ 2015-02-03 12:16 GMT+08:00 Flavien Charlon flavien.char...@gmail.com: Did you run incremental repair? Incremental repair is broken in 2.1 and tends to create way too many SSTables. On 2 February 2015 at 18:05, 曹志富 cao.zh...@gmail.com wrote: Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
Re: How to deal with too many sstables
You are right. I have already changed cold_reads_to_omit to 0.0. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/ 2015-02-03 14:15 GMT+08:00 Roland Etzenhammer r.etzenham...@t-online.de: Hi, maybe you are running into an issue that I also had on my test cluster. Since there were almost no reads on it, Cassandra did not run any minor compactions at all. The solution for me (in this case) was: ALTER TABLE tablename WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '4', 'max_threshold': '32', 'cold_reads_to_omit': 0.0}; where cold_reads_to_omit is the trick. Anyway, as Eric and Marcus among others suggest, do not run 2.1.2 for production as it has many issues. I'm looking forward to testing 2.1.3 when it arrives. Cheers, Roland On 03.02.2015 at 03:05, 曹志富 wrote: Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
Re: How to deal with too many sstables
https://issues.apache.org/jira/browse/CASSANDRA-8635 On Tue, Feb 3, 2015 at 5:47 AM, 曹志富 cao.zh...@gmail.com wrote: I just ran nodetool repair. The nodes with many sstables are the newest in my cluster. Before adding these nodes to my cluster, there was no automatic compaction because my cluster is a write-only cluster. Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/ 2015-02-03 12:16 GMT+08:00 Flavien Charlon flavien.char...@gmail.com: Did you run incremental repair? Incremental repair is broken in 2.1 and tends to create way too many SSTables. On 2 February 2015 at 18:05, 曹志富 cao.zh...@gmail.com wrote: Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
Re: How to deal with too many sstables
Did you run incremental repair? Incremental repair is broken in 2.1 and tends to create way too many SSTables. On 2 February 2015 at 18:05, 曹志富 cao.zh...@gmail.com wrote: Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
Re: Upgrading from 1.2 to 2.1 questions
Using Pycassa (https://github.com/pycassa/pycassa) I had no trouble with clients writing/reading from 1.2.x to 2.0.x (can't recall the minor versions off the top of my head right now). Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Tel: 1649 www.pythian.com On Mon, Feb 2, 2015 at 3:21 PM, Oleg Dulin oleg.du...@gmail.com wrote: Sure, but the question is really about going from 1.2 to 2.0 ... On 2015-02-02 13:59:27 +, Kai Wang said: I would not use 2.1.2 for production yet. It doesn't seem stable enough based on the feedback I see here. The newest 2.0.12 may be a better option. On Feb 2, 2015 8:43 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
Re: Upgrading from 1.2 to 2.1 questions
What about Java clients that were built for 1.2 -- how do they work with 2.0? On 2015-02-02 14:32:53 +, Carlos Rolo said: Using Pycassa (https://github.com/pycassa/pycassa) I had no trouble with clients writing/reading from 1.2.x to 2.0.x (can't recall the minor versions off the top of my head right now). Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Tel: 1649 www.pythian.com On Mon, Feb 2, 2015 at 3:21 PM, Oleg Dulin oleg.du...@gmail.com wrote: Sure, but the question is really about going from 1.2 to 2.0 ... On 2015-02-02 13:59:27 +, Kai Wang said: I would not use 2.1.2 for production yet. It doesn't seem stable enough based on the feedback I see here. The newest 2.0.12 may be a better option. On Feb 2, 2015 8:43 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards Charles On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol making it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg -- Regards, Oleg Dulin http://www.olegdulin.com
Help on modeling a table
Hi All, We are working on an application logging project and this is one of the search tables, as below:
CREATE TABLE logentries (
    logentrytimestamputcguid timeuuid PRIMARY KEY,
    context text,
    date_to_hour bigint,
    durationinseconds float,
    eventtimestamputc timestamp,
    ipaddress inet,
    logentrytimestamputc timestamp,
    loglevel int,
    logmessagestring text,
    logsequence int,
    message text,
    modulename text,
    productname text,
    searchitems map<text, text>,
    servername text,
    sessionname text,
    stacktrace text,
    threadname text,
    timefinishutc timestamp,
    timestartutc timestamp,
    urihostname text,
    uripathvalue text,
    uriquerystring text,
    useragentstring text,
    username text
);
I have some queries on the design of this table: 1) Is a timeuuid a good candidate for the partition key, given that we would be querying other fields with the stargate-core full-text project? This table is actually used for searches like username like '*john', and using this present model the performance is very slow. Please advise. Regards Asit
Re: Help on modeling a table
A leading wildcard is one of the slowest things you can do with Lucene, and not a recommended practice, so either accept that it is slow or don't do it. That said, there is a trick you can do with a reverse wildcard filter, but that's an expert-level feature and not recommended for average developers. -- Jack Krupansky On Mon, Feb 2, 2015 at 10:33 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: HI All We are working on a application logging project and this is one of the search tables as below : CREATE TABLE logentries ( logentrytimestamputcguid timeuuid PRIMARY KEY, context text, date_to_hour bigint, durationinseconds float, eventtimestamputc timestamp, ipaddress inet, logentrytimestamputc timestamp, loglevel int, logmessagestring text, logsequence int, message text, modulename text, productname text, searchitems maptext, text, servername text, sessionname text, stacktrace text, threadname text, timefinishutc timestamp, timestartutc timestamp, urihostname text, uripathvalue text, uriquerystring text, useragentstring text, username text ); I have some queries on the design of this table : 1) Does a timeuuid is a good candidate for partition key as we would be querying other fields with stargate-core full text project This table is actually be used for search like username like '*john' likewise and uing this present model the performance is very slow . Please advise Regards Asit
Re: Help on modeling a table
I'll try your recommendations and would update on the same Thanks so much Cheers Asit On Mon, Feb 2, 2015, 9:56 PM Eric Stevens migh...@gmail.com wrote: Just a minor observation: those field names are extremely long. You store a copy of every field name with every value with only a couple of exceptions: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html Your partition key column name (logentrytimestamputcguid) is just kept in the schema, so the length of that name doesn't impact your storage costs. Also clustering keys (you have none) you pay to store the *value* (not the name) before all other non clustering columns. Generally it's a good idea to prefer short column names over long ones. It increases application and diagnostic complexity, but we try to keep our column names under 4 bytes. This storage overhead for column names is reduced if you use sstable compression, but at the cost of an increase in CPU time. On Mon, Feb 2, 2015 at 8:33 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: HI All We are working on a application logging project and this is one of the search tables as below : CREATE TABLE logentries ( logentrytimestamputcguid timeuuid PRIMARY KEY, context text, date_to_hour bigint, durationinseconds float, eventtimestamputc timestamp, ipaddress inet, logentrytimestamputc timestamp, loglevel int, logmessagestring text, logsequence int, message text, modulename text, productname text, searchitems maptext, text, servername text, sessionname text, stacktrace text, threadname text, timefinishutc timestamp, timestartutc timestamp, urihostname text, uripathvalue text, uriquerystring text, useragentstring text, username text ); I have some queries on the design of this table : 1) Does a timeuuid is a good candidate for partition key as we would be querying other fields with stargate-core full text project This table is actually be used for search like username like '*john' likewise and uing this present model the performance is very slow . Please advise Regards Asit
Re: Help on modeling a table
Just a minor observation: those field names are extremely long. You store a copy of every field name with every value with only a couple of exceptions: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html Your partition key column name (logentrytimestamputcguid) is just kept in the schema, so the length of that name doesn't impact your storage costs. Also clustering keys (you have none) you pay to store the *value* (not the name) before all other non clustering columns. Generally it's a good idea to prefer short column names over long ones. It increases application and diagnostic complexity, but we try to keep our column names under 4 bytes. This storage overhead for column names is reduced if you use sstable compression, but at the cost of an increase in CPU time. On Mon, Feb 2, 2015 at 8:33 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: HI All We are working on a application logging project and this is one of the search tables as below : CREATE TABLE logentries ( logentrytimestamputcguid timeuuid PRIMARY KEY, context text, date_to_hour bigint, durationinseconds float, eventtimestamputc timestamp, ipaddress inet, logentrytimestamputc timestamp, loglevel int, logmessagestring text, logsequence int, message text, modulename text, productname text, searchitems maptext, text, servername text, sessionname text, stacktrace text, threadname text, timefinishutc timestamp, timestartutc timestamp, urihostname text, uripathvalue text, uriquerystring text, useragentstring text, username text ); I have some queries on the design of this table : 1) Does a timeuuid is a good candidate for partition key as we would be querying other fields with stargate-core full text project This table is actually be used for search like username like '*john' likewise and uing this present model the performance is very slow . Please advise Regards Asit
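To make the storage point above concrete, a hypothetical trimmed-down version of the same table with abbreviated non-key column names might look like the following; the short names are chosen purely for illustration, not as a drop-in replacement schema:

CREATE TABLE logentries (
    id  timeuuid PRIMARY KEY,  -- partition key name is only stored in the schema
    ts  timestamp,             -- was eventtimestamputc
    lvl int,                   -- was loglevel
    msg text,                  -- was message
    mod text,                  -- was modulename
    usr text                   -- was username
);

The pre-3.0 storage engine repeats each non-key column name alongside every stored value, which is why shortening the names (or enabling sstable compression, at some CPU cost) reduces on-disk overhead.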
Re: Cassandra on Ceph
Colin, I'm not familiar with Ceph, but it sounds like it's a more sophisticated version of a SAN. Be aware that running Cassandra on absolutely anything other than local disks is an anti-pattern. It will have a profound negative impact on performance, scalability, and reliability of your cluster. On Sun, Feb 1, 2015 at 8:13 PM, Colin Taylor colin.tay...@gmail.com wrote: Oops - Nonetheless in on my environments - Nonetheless in *one of* my environments On 2 February 2015 at 16:12, Colin Taylor colin.tay...@gmail.com wrote: Thanks all for you input. I'm aware of the overlap, I'm aware I need to turn Ceph replication off, I'm aware this isn't ideal. Nonetheless in on my environments instead of raw disk to install C* on, I'm likely to just have Ceph storage. This is a fully managed environment (excepting for C*) and that's their standard. cheers Colin On 2 February 2015 at 14:42, Daniel Compton daniel.compton.li...@gmail.com wrote: As Jan has already mentioned, Ceph and Cassandra do almost all of the same things. Replicated self healing data storage on commodity hardware without a SPOF describes both of these systems. If you did manage to get it running it would be a nightmare to reason about what's happening at the disk and network level. You're going to get write amplification by your replication factor of both Cassandra, and Ceph unless you turn one of them down. This impacts disk I/O, disk space, CPU, and network bandwidth. If you turned down Ceph replication I think it would be possible for all of the replicated data for some chunk to be stored on one node and be at risk of loss. E.g. 1x Ceph, 3x Cassandra replication could store all 3 Cassandra replicas on the same Ceph node. 3x Ceph, 1x Cassandra would be safer, but presumably slower. Lastly Cassandra is designed around running against local disks, you will lose a lot of the advantages of this running it on Ceph. Daniel. On Mon, 2 Feb 2015 at 1:11 am Baskar Duraikannu baskar.duraika...@outlook.com wrote: What is the reason for running Cassandra on Ceph? I have both running in my environment but doing different things - Cassandra as transactional store and Ceph as block storage for storing files. -- From: Jan cne...@yahoo.com Sent: 2/1/2015 2:53 AM To: user@cassandra.apache.org Subject: Re: Cassandra on Ceph Colin; Ceph is a block based storage architecture based on RADOS. It comes with its own replication rebalancing along with a map of the storage layer. Some more details similarities: a)Ceph stores a client’s data as objects within storage pools. (think of C* partitions) b) Using the CRUSH algorithm, Ceph calculates which placement group should contain the object, (C* primary keys vnode data distribution) c) and further calculates which Ceph OSD Daemon should store the placement group (C* node locality) d) The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically (C* big table storage architecture). Summary: C* comes with everything that Ceph provides (with the exception of block storage). There is no value add that Ceph brings to the table that C* does not already provide. I seriously doubt if C* could even work out of the box with yet another level of replication rebalancing. Hope this helps Jan/ C* Architect On Saturday, January 31, 2015 7:28 PM, Colin Taylor colin.tay...@gmail.com wrote: I may be forced to run Cassandra on top of Ceph. Does anyone have experience / tips with this. Or alternatively, strong reasons why this won't work. cheers Colin
How to deal with too many sstables
Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
Re: How to deal with too many sstables
Hi, maybe you are running into an issue that I also had on my test cluster. Since there were almost no reads on it, Cassandra did not run any minor compactions at all. The solution for me (in this case) was: ALTER TABLE tablename WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '4', 'max_threshold': '32', 'cold_reads_to_omit': 0.0}; where cold_reads_to_omit is the trick. Anyway, as Eric and Marcus among others suggest, do not run 2.1.2 for production as it has many issues. I'm looking forward to testing 2.1.3 when it arrives. Cheers, Roland On 03.02.2015 at 03:05, 曹志富 wrote: Hi all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me a solution to deal with this situation? Thanks. -- 曹志富 Mobile: 18611121927 Email: caozf.zh...@gmail.com Weibo: http://weibo.com/boliza/
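If it helps to verify that compactions are actually catching up after changing cold_reads_to_omit, nodetool can show it; the keyspace and table names below are placeholders:

# Are compactions running now, and how many are pending?
nodetool compactionstats

# Watch the SSTable count come down for the affected table
nodetool cfstats mykeyspace.mytable | grep "SSTable count"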
Re: Question about use scenario with fulltext search
Yes, the stargate-core project uses native Lucene libraries, but yes, it would be dependent on the stargate-core developer. I find it very easy and am doing more analysis on this. Regards Asit On Mon, Feb 2, 2015 at 12:50 PM, Colin colpcl...@gmail.com wrote: I use Solr and Cassandra but not together. I write what I want indexed into Solr (and only unstructured data), and related data into either Cassandra or Oracle. I use the same key across all three DBs. When I need full text search etc, I read the data from Solr, grab the keys, and go get the data from the other DBs. This avoids conflation of concerns and isolates failures, but is dependent upon multiple writes. I use a message bus and services based approach. In my experience, at scale this approach works better and avoids vendor lock-in. -- Colin Clark +1 612 859 6129 Skype colin.p.clark On Feb 2, 2015, at 7:25 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: I tried Elasticsearch but pulling the data from Cassandra is a big pain. The river pulls up all the data every time, with no incremental approach. It's a great product but I had to change my writing approach, which I am just doing in Cassandra from a .NET client. Also, you have to create a separate infrastructure for Elasticsearch. Again, this is what I found with limited analysis of Elasticsearch. Regards Asit On Mon, Feb 2, 2015 at 11:43 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: Also, there is a project called Stargate-Core which gives the ability to query with wildcard characters. The location is https://github.com/tuplejump/stargate-core/releases/tag/0.9.9 and it supports the 2.0.11 version of Cassandra. Elasticsearch is another product, but pumping the data from Cassandra is a bad option in Elasticsearch. You have to design your writes such that you write to both. I am using Stargate-Core personally; it is very easy to implement and use. Hope this adds a cent to your evaluation on this topic. Regards Asit On Sun, Feb 1, 2015 at 10:45 PM, Mark Reddy mark.l.re...@gmail.com wrote: If you have a list of usernames stored in your cassandra database, how could you find all usernames starting with 'Jo'? Cassandra does not support full text search on its own; if you are looking into DataStax Enterprise Cassandra, there is an integration with Solr that gives you this functionality. Personally, for projects I work on that use Cassandra and require full text search, the necessary data is indexed into Elasticsearch. Or ... if this is not possible, what are you using cassandra for? If you are looking for use cases, here is a comprehensive set from companies spanning many industries: http://planetcassandra.org/apache-cassandra-use-cases/ Regards, Mark On 1 February 2015 at 16:05, anton anto...@gmx.de wrote: Hi, I was just reading about cassandra and playing a little with it (using django www.djangoproject.com on the web server). One thing that I realized now is that fulltext search as in a normal sql statement (example): select name from users where name like 'Jo%'; simply does not work because this functionality does not exist. After reading and googling and reading ... I still do not understand how I could use a db without this functionality (if I do not want to restrict myself to numerical data). So my question is: If you have a list of usernames stored in your cassandra database, how could you find all usernames starting with 'Jo'? Or ... if this is not possible, what are you using cassandra for? Actually I still did not get the point of how I could use cassandra :-( Anton
Re: Question about use scenario with fulltext search
You can also try Stratio Cassandra, which is based in Cassandra 2.1.2, the latest version of Apache Cassandra: https://github.com/Stratio/stratio-cassandra It provides an open sourced implementation of the secondary indexes of Cassandra, which allows you to perform full-text queries, distributed relevance search, etc. It was presented in the last Cassandra Summit Europe: http://www.slideshare.net/dhiguero/advanced-search-and-topk-queries-in-cassandracassandrasummiteurope2014 https://www.youtube.com/watch?v=Hg5s-hXy_-M 2015-02-02 9:04 GMT+01:00 Asit KAUSHIK asitkaushikno...@gmail.com: Yes but the stargate-core project is using native lucene libraries but yes it would be dependent on the stargate-core developer. I find that very easy and doing more analysis on this. Regards Asit On Mon, Feb 2, 2015 at 12:50 PM, Colin colpcl...@gmail.com wrote: I use solr and cassandra but not together. I write what I want indexed into solr (and only unstructured data), and related data into either cassandra or oracle. I use the same key across all three db's. When I need full text search etc, I read the data from solr, grab the keys, and go get the data from the other db's. This avoids conflation of concerns, isolates failures, but is dependent upon multiple writes. I use a message bus and services based approach. In my experience, at scale this approach works better and avoids vendor lock in. -- *Colin Clark* +1 612 859 6129 Skype colin.p.clark On Feb 2, 2015, at 7:25 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: I tried elasticsearch but pulling up the data from Cassandra is a big pain. The river pulls up all the the data everytime and no incremental approach. Its a great product but i had to change my writing approach which i am just doing in Cassandra from .net client . Also you have to create a separate infrastructure for elasticsearch. Agin this is what i found with limited analysis on elasticsearch Regards Asit On Mon, Feb 2, 2015 at 11:43 AM, Asit KAUSHIK asitkaushikno...@gmail.com wrote: Also there is a project as Stargate-Core which gives the utility of querying with wildcard characters. the location is https://github.com/tuplejump/stargate-core/releases/tag/0.9.9 it supports the 2.0.11 version of cassandra.. Also elasticsearch is another product but pumping the data from Cassandra is a bad option in elasticsearch. You have to design you write such that you write on both. But i am using the Stargate-Core personally its very easy to implement and use Hope this add a cent to you evaluation on this topic Regards Asit On Sun, Feb 1, 2015 at 10:45 PM, Mark Reddy mark.l.re...@gmail.com wrote: If you have a list of usernames stored in your cassandra database, how could you find all usernames starting with 'Jo'? Cassandra does not support full text search on its own, if you are looking into DataStax enterprise Cassandra there is an integration with Slor that gives you this functionality. Personally for projects I work on that use Cassandra and require full text search, the necessary data is indexed into Elasticsearch. Or ... if this is not possible, what are you using cassandra for? If you are looking for use cases here is a comprehensive set from companies spanning many industries: http://planetcassandra.org/apache-cassandra-use-cases/ Regards, Mark On 1 February 2015 at 16:05, anton anto...@gmx.de wrote: Hi, I was just reading about cassandra and playing a little with it (using django www.djangoproject.com on the web server). 
One thing that I realized now is that fulltext search as in a normal sql statement (example): select name from users where name like 'Jo%'; Simply does not work because this functionality does not exist. After reading and googeling and reading ... I still do not understand how I could use a db without this functionality (If I do not want to restrict myself on numerical data). So my question is: If you have a list of usernames stored in your cassandra database, how could you find all usernames starting with 'Jo'? Or ... if this is not possible, what are you using cassandra for? Actually I still did not get the point of how I could use cassandra :-( Anton -- Andrés de la Peña http://www.stratio.com/ Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón, Madrid Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*
RE: FW: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled
Thanks a lot ;) I’ll try your suggestions. From: Adam Holmberg [mailto:adam.holmb...@datastax.com] Sent: 2015年1月31日 1:12 To: user@cassandra.apache.org Subject: Re: FW: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled Assuming the truststore you are referencing is the same one the server is using, it's probably in the wrong format. You will need to export the cert into a PEM format for use in the (Python) cqlsh client. If exporting from the java keystore format, use keytool -exportcert source keystore, pass, etc -rfc -file output file If you have the crt file, you should be able to accomplish the same using openssl: openssl x509 -in in crt -inform DER -out output file -outform PEM Then, you should refer to that PEM file in your command. Alternatively, you can specify a path to the file (along with other options) in your cqlshrc file. References: How cqlsh picks up ssl optionshttps://github.com/apache/cassandra/blob/cassandra-2.1/pylib/cqlshlib/sslhandling.py Example cqlshrc filehttps://github.com/apache/cassandra/blob/cassandra-2.1/conf/cqlshrc.sample Adam Holmberg On Wed, Jan 28, 2015 at 1:08 AM, Lu, Boying boying...@emc.commailto:boying...@emc.com wrote: Hi, All, Does anyone know the answer? Thanks a lot Boying From: Lu, Boying Sent: 2015年1月6日 11:21 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: How to use cqlsh to access Cassandra DB if the client_encryption_options is enabled Hi, All, I turned on the dbclient_encryption_options like this: client_encryption_options: enabled: true keystore: path-to-my-keystore-file keystore_password: my-keystore-password truststore: path-to-my-truststore-file truststore_password: my-truststore-password … I can use following cassandra-cli command to access DB: cassandra-cli -ts path-to-my-truststore-file –tspw my-truststore-password –tf org.apache.cassandra.thrift.SSLTransportFactory But when I tried to access DB by cqlsh like this: SSL_CERTFILE=path-to-my-truststore cqlsh –t cqlishlib.ssl.ssl_transport_factory I got following error: Connection error: Could not connect to localhost:9160: [Errno 0] _ssl.c:332: error::lib(0):func(0):reason(0) I guess the reason maybe is that I didn’t provide the trustore password. But cqlsh doesn’t provide such option. Does anyone know how to resolve this issue? Thanks Boying
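For completeness, the cqlshrc route Adam mentions would look roughly like this; a sketch based on the cqlshrc.sample and sslhandling.py links above, where the certfile path is a placeholder for the PEM file exported with keytool -rfc:

; relevant section of ~/.cassandra/cqlshrc (or the cqlshrc path you pass to cqlsh)
[ssl]
certfile = /path/to/exported-cert.pem
; whether cqlsh should verify the server certificate
validate = true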