Re: CQL3 and column slices
The short answer is yes, we are looking into adding streaming of results to solve that problem (https://issues.apache.org/jira/browse/CASSANDRA-4415). -- Sylvain

On Tue, Jul 24, 2012 at 6:51 PM, Josep Blanquer blanq...@rightscale.com wrote: Thanks Sylvain, The main argument for this is pagination. Let me try to explain the use cases, and compare them to an RDBMS for better illustration:

1- Right now, Cassandra doesn't stream results, so large resultsets are a royal pain in the neck to deal with. I.e., if I have a range_slice, or even a slice query that cuts across 1 million columns, I have to completely consume it all in the client receiving the response. That is, I'll need to store 1 million results in the client no matter what, and that can be quite prohibitive.

2- In an effort to alleviate that, one can be smarter in the client and play the pagination game, i.e., start slicing at some column and get the next N results, then start the slice at the last column seen and get N more, etc. That results in many more queries from the smart client, but at least it allows you to handle large result sets. (That's what the need for the CQL query in my original email was about.)

3- There's another important factor related to this problem, in my opinion: the LIMIT clause in Cassandra (in both CQL and Thrift) is a required field. What I mean by required is that Cassandra requires an explicit count to operate underneath.
So it is really different from RDBMS semantics, where no LIMIT means you get all the results (instead of the high, yet still bounded, count of 10K or 20K max resultset rows Cassandra enforces by default). I cannot tell you how many problems we've had with developers forgetting about these default counts in queries and then realizing that some results had been truncated because of that. In my mind, LIMIT should only be used to restrict results; queries with no LIMIT should always return all results (much like an RDBMS), otherwise the query looks the same but is semantically different.

So, all in all, I think the main problem/use case I'm facing is that Cassandra cannot stream resultsets. If it did, I believe the need for my pagination use case would basically disappear, since it would be the transport/client that throttles how many results are stored in the client buffer at any point in time. At the same time, I believe that with a streaming protocol you could simply change Cassandra internals to have infinite default limits, since there would be no reason to stop scanning (unless an explicit LIMIT clause was specified by the client). That would give you not only the SQL-equivalent syntax, but also the equivalent semantics of most current DBs. I hope that makes sense.

That being said, are there any plans for streaming results? I believe that without that (and especially with the new CQL restrictions) it becomes much more difficult to use Cassandra with wide rows and large resultsets (which, in my mind, is one of its sweet spots). If that doesn't happen, it would either a) force clients to be built in a much more complex and inefficient way to handle wide rows, or b) force users to adopt different, less efficient data models for their data. Both seem like bad propositions to me, as they wouldn't take advantage of Cassandra's power, therefore diminishing its value. Cheers, Josep M.
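The pagination game described in point 2 can be sketched generically: slice N columns at a time, restart each slice at the last column seen, and drop the overlapping first column of every subsequent page. This is a minimal illustration, not a real driver API; `fetch_slice` and `ROW_DATA` are hypothetical stand-ins for a Thrift/CQL slice call and a wide row.

```python
def fetch_slice(row, start, count):
    """Hypothetical stand-in for a slice query: returns up to `count`
    (name, value) pairs with name >= start, in column order."""
    columns = sorted(ROW_DATA[row].items())
    return [(n, v) for n, v in columns if n >= start][:count]

def paginate(row, page_size=2):
    """Yield every column of a wide row, one slice of page_size at a time."""
    start = ""
    first_page = True
    while True:
        # After the first page, fetch one extra column: the slice start
        # is the last column already seen, which we must skip.
        page = fetch_slice(row, start, page_size + (0 if first_page else 1))
        if not first_page:
            page = page[1:]
        if not page:
            return
        for name, value in page:
            yield name, value
        start = page[-1][0]  # next slice starts at the last column seen
        first_page = False

# Toy wide row with 7 columns to paginate over.
ROW_DATA = {"wide_row": {"c%03d" % i: i for i in range(7)}}
result = list(paginate("wide_row", page_size=2))
```

Note the extra round trips this forces on the client: with streaming on the server side, the same traversal would be a single query.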
On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer blanq...@rightscale.com wrote: is there some way to express that in CQL3? something logically equivalent to SELECT * FROM bug_test WHERE a:b:c:d:e 1:1:1:1:2?? No, there isn't. Not currently at least. But feel free of course to open a ticket/request on https://issues.apache.org/jira/browse/CASSANDRA. I note that I would be curious to know the concrete use case you have for this type of query. It would also help as an argument to add such facilities more quickly (or at all). Typically, "we should support it in CQL3 because it was possible with Thrift" is definitely an argument, but a much weaker one without concrete examples of why it might be useful in the first place. -- Sylvain
Connection issue in Cassandra
Hi, I have created a 2 node cluster and use it with an application. My application is unable to connect to the database. Please find the logs below:

NoConnectionAvailable at / ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection after 30 seconds Request Method: GET Request URL: http://172.16.100.131/ Django Version: 1.4 Exception Type: NoConnectionAvailable Exception Value: ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection after 30 seconds Exception Location: /usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py in get, line 738 Python Executable: /usr/local/bin/python Python Version: 2.6.4 Python Path: ['/var/www/bs_ping', '/usr/local/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg', '/usr/local/lib/python2.6/site-packages/amqplib-0.6.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/BeautifulSoup-3.1.0.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/python_dateutil-1.4.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/feedparser-4.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/python_twitter-0.6-py2.6.egg', '/usr/local/lib/python2.6/site-packages/simplejson-2.0.9-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/txAMQP-0.3-py2.6.egg', '/usr/local/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/zope.interface-3.5.2-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/UnicodeUtils-0.3.2-py2.6.egg', '/usr/local/lib/python2.6/site-packages/pytz-2009p-py2.6.egg', '/usr/local/lib/python2.6/site-packages/ScriptUtils-0.5.5-py2.6.egg', '/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/python_memcached-1.44-py2.6.egg', '/usr/local/lib/python2.6/site-packages/coverage-3.2b1-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/flup-1.0.3.dev_20091027-py2.6.egg', '/usr/local/lib/python2.6/site-packages/oauth-1.0.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/pyOpenSSL-0.10-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg', '/usr/local/lib/python2.6/site-packages/wadofstuff_django_serializers-1.1.0-py2.6.egg', '/usr/local/lib/python2.6/site-packages/jsonpickle-0.4.0-py2.6.egg', '/usr/local/lib/python2.6/site-packages/django_compressor-1.1.2-py2.6.egg', '/usr/local/lib/python2.6/site-packages/django_appconf-0.5-py2.6.egg', '/usr/local/lib/python26.zip', '/usr/local/lib/python2.6', '/usr/local/lib/python2.6/plat-linux2', '/usr/local/lib/python2.6/lib-tk', '/usr/local/lib/python2.6/lib-old', '/usr/local/lib/python2.6/lib-dynload', '/usr/local/lib/python2.6/site-packages', '/usr/local/lib/python2.6/site-packages/PIL', '/var/www/bs_ping/', '/var/www'] Server time: Wed, 25 Jul 2012 13:17:33 +0500 -- Thanks Regards Adeel Akbar
Is there any way to limit the off-heap memory usage for Cassandra 1.1.X?
Is there any way to limit the off-heap memory usage for Cassandra 1.1.X? Thx -- Thomas Spengler
Fwd: {kundera-discuss} Kundera 2.0.7 Released
-- Forwarded message -- From: Amry amresh1...@gmail.com Date: Wed, Jul 25, 2012 at 4:41 PM Subject: {kundera-discuss} Kundera 2.0.7 Released To: kundera-disc...@googlegroups.com

Hi All, We are happy to announce the release of Kundera 2.0.7. Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and relational databases.

Major changes in this release:
* HBase 0.92.1 migration
* Hadoop 1.0.2 migration
* Cassandra 1.1.2 migration
* MongoDB 2.0.4 migration
* JPA EntityTransaction commit and rollback
* JTA Transactions integration over web server
* Kundera-REST API
* Support for Counter column in Cassandra
* Inverted wide-row indexing support for Cassandra
* Login Authentication support for Cassandra and MongoDB
* Filters and filter lists for HBase
* Deprecated Lucene based indexing for HBase
* Datastore specific configuration files for specifying:
  - Replication factor
  - Placement strategy
  - Consistency level per operation
  - Counter column family configuration
  - Inverted indexing switch
  - Zookeeper host and port
  - HBase column family configurations
  - MongoDB servers list, read preference and socket timeout, etc.

To download, use or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera The latest released tag version is kundera-2.0.7. Kundera maven libraries are now available at: https://oss.sonatype.org/content/repositories/releases/com/impetus Sample code and examples for using Kundera can be found here: http://github.com/impetus-opensource/Kundera-Examples and https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests Thank you all for your contributions! Regards, Kundera Team.
[Feedback needed] New MacOSX preferences pane plugin to control a local Cassandra Server
Hi, I have written a preferences pane plugin to control a Cassandra Server installed on MacOSX. It is based on the UI of the MySQL preferences pane. I'd like to have your feedback about it. - Is it useful for you? - What do you think could be improved / added? You can find it on my github at https://github.com/remysaissy/cassandra-macosx-prefspane No need to build from sources, there is a release of course :). Feel free to add issues on the github or even send some code. Thanks a lot! Regards, -- Rémy Saissy Photos: http://picasaweb.google.com/remy.saissy Blog: http://blog.remysaissy.com
filtered value count
Hi All, Can anyone suggest how to retrieve a result count after multiple filter operations performed across multiple column families, something similar to a join query result count in a normal RDBMS? I guess it's not possible in Cassandra, but still asking. Is there any other alternative to do the same? Regards, Abhijit
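Since Cassandra has no joins, a cross-column-family "filtered count" has to be assembled client-side: fetch the matching row keys from each column family, intersect the key sets, and count. Below is a minimal sketch of that idea; the `users`/`orders` data and the predicates are entirely hypothetical, with each column family modeled as a dict of row key to columns.

```python
# Toy stand-ins for two column families (row_key -> columns).
users = {"u1": {"country": "ES"}, "u2": {"country": "US"},
         "u3": {"country": "ES"}}
orders = {"u1": {"total": 120}, "u3": {"total": 40}, "u4": {"total": 99}}

def matching_keys(cf, predicate):
    """Row keys whose columns satisfy the filter predicate."""
    return {key for key, cols in cf.items() if predicate(cols)}

# Equivalent in spirit to an RDBMS join + count with two WHERE filters,
# done by intersecting the key sets client-side.
count = len(matching_keys(users, lambda c: c["country"] == "ES")
            & matching_keys(orders, lambda c: c["total"] > 50))
```

This works for small filtered sets but pulls all matching keys to the client; for large data sets a batch approach (e.g. Hadoop over Cassandra) is the usual alternative.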
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
Are you actually seeing any problems from this? High virtual memory usage on its own really doesn't mean anything. See http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler thomas.speng...@toptarif.de wrote: No one has any idea? We tried updating to 1.1.2, with DiskAccessMode standard, indexAccessMode standard, row_cache_size_in_mb: 0, key_cache_size_in_mb: 0. Our next try will be to change SerializingCacheProvider to ConcurrentLinkedHashCacheProvider. Any other proposals are welcome.

On 07/04/2012 02:13 PM, Thomas Spengler wrote: Hi @all, since our upgrade from Cassandra 1.0.3 to 1.1.0, the virtual memory usage of the Cassandra nodes has exploded. Our setup is:
* 5 CentOS 5.8 nodes
* each with 4 CPUs and 8 GB RAM
* each node holds about 100 GB of data
* each JVM uses 2 GB RAM
* DiskAccessMode is standard, indexAccessMode is standard
The memory usage grows until the whole memory is used. Just for information, with Cassandra 1.0.3 we used
* DiskAccessMode standard, indexAccessMode mmap
* and the RAM usage was ~4GB
Can anyone help? With Regards -- Thomas Spengler Chief Technology Officer TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin Tel.: (030) 2000912 0 | Fax: (030) 2000912 100 thomas.speng...@toptarif.de | www.toptarif.de Amtsgericht Charlottenburg, HRB 113287 B Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor

-- Tyler Hobbs DataStax http://datastax.com/
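The distinction behind the FAQ entry is virtual size versus resident size: mmap'd SSTables count toward the former but only the pages actually in RAM count toward the latter. On Linux this can be checked by reading a process's /proc status; the sketch below inspects the current process (for a Cassandra node you would point it at the JVM's pid), and is illustrative rather than Cassandra-specific.

```python
# VmSize counts every mapped page (including mmap'd files such as
# SSTables); VmRSS counts only pages actually resident in RAM.
def memory_kb(pid="self"):
    sizes = {}
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                field, value = line.split(":")
                sizes[field] = int(value.split()[0])  # value is in kB
    return sizes["VmSize"], sizes["VmRSS"]

vsz, rss = memory_kb()
# A large vsz with a modest rss is normal under mmap disk access mode;
# only rss reflects physical memory actually in use.
```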
Re: Connection issue in Cassandra
That's a pretty old version of pycassa; it was released before 0.7.0 came out. I suggest upgrading. It's possible this was caused by an old bug, but in general, this indicates that you have more threads trying to use the ConnectionPool concurrently than there are connections.

On Wed, Jul 25, 2012 at 3:30 AM, Adeel Akbar adeel.ak...@panasiangroup.com wrote: Hi, I have created 2 node cluster and use with application. My application unable to connect with database. Please find below logs; NoConnectionAvailable at / ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection after 30 seconds [...] -- Thanks Regards Adeel Akbar

-- Tyler Hobbs DataStax http://datastax.com/
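Tyler's diagnosis, more concurrent threads than pool_size plus max_overflow, can be illustrated with a minimal bounded pool. This is a hypothetical sketch of the limit semantics, not pycassa's actual pool.py:

```python
import threading

class NoConnectionAvailable(Exception):
    pass

class BoundedPool:
    """Sketch of a pool with pycassa-like limits: at most
    pool_size + max_overflow connections may be checked out at once;
    further checkouts fail after `timeout` seconds."""
    def __init__(self, pool_size, max_overflow, timeout):
        self._slots = threading.Semaphore(pool_size + max_overflow)
        self._timeout = timeout

    def get(self):
        # Semaphore.acquire accepts a timeout (Python 3.2+).
        if not self._slots.acquire(timeout=self._timeout):
            raise NoConnectionAvailable(
                "pool limit reached, unable to obtain connection")
        return object()  # stand-in for a real connection

    def put(self, conn):
        self._slots.release()

pool = BoundedPool(pool_size=2, max_overflow=2, timeout=0.1)
held = [pool.get() for _ in range(4)]   # 2 + 2 checkouts succeed
try:
    pool.get()                          # the 5th checkout times out
    exhausted = False
except NoConnectionAvailable:
    exhausted = True
```

The fixes are the same as for the real pool: return connections promptly, or raise the pool size/overflow to match the number of concurrent worker threads.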
Re: Cassandra Throughput
Barring unusual circumstances, no. Typically, the throughput should be pretty similar for both consistency levels, with slightly higher latency for QUORUM. On Tue, Jul 24, 2012 at 5:32 PM, Code Box codeith...@gmail.com wrote: Can it happen that the throughput with CL=QUORUM is throughput with CL=1 on a three node cluster with replication factor of 3 ? -- Tyler Hobbs DataStax http://datastax.com/
Re: Bringing a dead node back up after fixing hardware issues
Hi Brandon, Increasing the CL is tricky for us right now, as our RF in that datacenter is 2 and the CL is set to ONE. If we change the CL to LOCAL_QUORUM, then if a node goes down we will have trouble. I will try to increase the RF to 3 in that datacenter and set the CL to LOCAL_QUORUM if nothing else works out.

Increasing the RF and using LOCAL_QUORUM is the right thing in this case. By choosing CL.ONE, you are agreeing that read misses are acceptable. If they are not, then adjusting your RF/CL is the only path.

Alright, let's assume I want to go down this route. I have RF=2 in the datacenter, and I believe I need at least RF=3 to set the consistency level to LOCAL_QUORUM and hide node failures. But if I increase the RF to 3 now, won't it trigger more read misses until repair completes? Given this is a production cluster which cannot afford downtime, how can we do this? Thanks, Eran
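The RF/CL arithmetic behind this thread can be made explicit: a quorum is a strict majority of the (local) replicas, so the number of replica failures a consistency level tolerates is RF minus the replicas it must reach. A small sketch of that calculation:

```python
def quorum(rf):
    """Replicas a (LOCAL_)QUORUM operation must reach: a strict majority."""
    return rf // 2 + 1

def failures_tolerated(rf, cl_replicas):
    """Replicas that can be down while operations at the CL still succeed."""
    return rf - cl_replicas

# RF=2: quorum is 2, so LOCAL_QUORUM survives zero failures -- Eran's problem.
rf2_tolerance = failures_tolerated(2, quorum(2))
# RF=3: quorum is still 2, so one node per datacenter can be down.
rf3_tolerance = failures_tolerated(3, quorum(3))
```

This is why RF=3 is the usual floor for quorum-based operation: it is the smallest RF at which a quorum leaves any slack for a failed node.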