Re: CQL3 and column slices

2012-07-25 Thread Sylvain Lebresne
The short answer is yes, we are looking into adding streaming of
results to solve that problem
(https://issues.apache.org/jira/browse/CASSANDRA-4415).

--
Sylvain

On Tue, Jul 24, 2012 at 6:51 PM, Josep Blanquer blanq...@rightscale.com wrote:
 Thanks Sylvain,

  The main argument for this is pagination. Let me try to explain the use
 cases, and compare it to RDBMS for better illustration:
  1- Right now, Cassandra doesn't stream the requests, so large resultsets
 are a royal pain in the neck to deal with. I.e., if I have a range_slice, or
 even a slice query that cuts across 1 million columns...I have to completely
 eat it all in the client receiving the response. That is, I'll need to
 store 1 million results in the client no matter what, and that can be quite
 prohibitive.
  2- In an effort to alleviate that, one can be smarter in the client and
 play the pagination game, i.e., start slicing at some column and get the
 next N results, then start the slice at the last column seen and get N
 more, etc. That results in many more queries from the smart client, but at
 least it allows you to handle large result sets. (That is what the CQL
 query in my original email was for.)
  3- There's another important factor related to this problem in my opinion:
 the LIMIT clause in Cassandra (in both CQL and Thrift) is a required field.
 What I mean by required is that Cassandra requires an explicit count to
 operate underneath. So it is really different from RDBMS semantics, where no
 LIMIT means you'll get all the results (instead of the high, yet still
 bounded, count of 10K or 20K max resultset rows that Cassandra enforces by
 default). I cannot tell you how many problems we've had with developers
 forgetting about these default counts in queries, and realizing that some
 had results truncated because of that. In my mind, LIMIT should only be
 used to restrict results; queries with no LIMIT should always return all
 results (much like an RDBMS). Otherwise the query looks the same but is
 semantically different.
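
The pagination game from point 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a real client API: `fetch_slice(start, count)` is a hypothetical callable standing in for whatever slice query the client issues, assumed to return up to `count` (name, value) pairs in column order, starting at `start` inclusive. `make_fetcher` is an in-memory stand-in for a wide row, used only so the sketch runs on its own.

```python
def paginate(fetch_slice, page_size, start=None):
    """Yield (name, value) pairs one page at a time.

    fetch_slice(start, count) is a hypothetical callable: it returns up to
    `count` (name, value) pairs with name >= start, in column order
    (start=None means "from the first column").
    """
    first_page = True
    while True:
        # After the first page the start column is inclusive, so ask for one
        # extra column and drop the one that was already returned.
        count = page_size if first_page else page_size + 1
        page = fetch_slice(start, count)
        if not first_page:
            page = page[1:]
        if not page:
            return
        for name, value in page:
            yield name, value
        if len(page) < page_size:
            return  # short page: nothing left to read
        start = page[-1][0]
        first_page = False


def make_fetcher(columns):
    """In-memory stand-in for a wide row (illustration only)."""
    names = sorted(columns)

    def fetch_slice(start, count):
        selected = [n for n in names if start is None or n >= start]
        return [(n, columns[n]) for n in selected[:count]]

    return fetch_slice


if __name__ == "__main__":
    row = dict((i, i * i) for i in range(10))
    for name, value in paginate(make_fetcher(row), page_size=3):
        print(name, value)
```

The client never holds more than one page (plus one column) in memory, at the cost of one query per page, which is the trade-off described above.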

 So, all in all I think that the main problem/use case I'm facing is that
 Cassandra cannot stream resultsets. If it did, I believe that the need for
 my pagination use case would basically disappear, since it'd be the
 transport/client that would throttle how many results are stored in the
 client buffer at any point in time. At the same time, I believe that with a
 streaming protocol you could simply change Cassandra internals to have
 infinite default limits, since there would be no reason to stop
 scanning (unless an explicit LIMIT clause was specified by the client). That
 would give you not only the SQL-equivalent syntax, but also the equivalent
 semantics of most current DBs.

 I hope that makes sense. That being said, are there any plans for streaming
 results? I believe that without that (and especially with the new CQL
 restrictions) it becomes much more difficult to use Cassandra with wide rows
 and large resultsets (which, in my mind, is one of its sweet spots). I
 believe that if that doesn't happen it would a) force the clients to be
 built in a much more complex and inefficient way to handle wide rows or b)
 will force users to use different, less efficient datamodels for their data.
 Both seem bad propositions to me, as they wouldn't be taking advantage of
 Cassandra's power, therefore diminishing its value.

  Cheers,

  Josep M.


 On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer
 blanq...@rightscale.com wrote:
  is there some way to express that in CQL3? something logically
  equivalent to
 
  SELECT *  FROM bug_test WHERE a:b:c:d:e  1:1:1:1:2??

 No, there isn't. Not currently at least. But feel free of course to
 open a ticket/request on
 https://issues.apache.org/jira/browse/CASSANDRA.

 I note that I would be curious to know the concrete use case you have
 for such type of queries. It would also help as an argument to add
 such facilities more quickly (or at all). Typically, "we should
 support it in CQL3 because it was possible with thrift" is
 definitely an argument, but a much weaker one without concrete
 examples of why it might be useful in the first place.

 --
 Sylvain




Connection issue in Cassandra

2012-07-25 Thread Adeel Akbar

Hi,

I have created a 2-node cluster and use it with my application. My
application is unable to connect to the database. Please find the logs below:



 NoConnectionAvailable at /

ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection 
after 30 seconds

Request Method: GET
Request URL: http://172.16.100.131/
Django Version: 1.4
Exception Type: NoConnectionAvailable
Exception Value:

ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection 
after 30 seconds

Exception Location: 
/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py 
in get, line 738

Python Executable:  /usr/local/bin/python
Python Version: 2.6.4
Python Path:

['/var/www/bs_ping',
 '/usr/local/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/amqplib-0.6.1-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/BeautifulSoup-3.1.0.1-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/python_dateutil-1.4.1-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/feedparser-4.1-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/python_twitter-0.6-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/simplejson-2.0.9-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/txAMQP-0.3-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/zope.interface-3.5.2-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/UnicodeUtils-0.3.2-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/pytz-2009p-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/ScriptUtils-0.5.5-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/python_memcached-1.44-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/coverage-3.2b1-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/flup-1.0.3.dev_20091027-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/oauth-1.0.1-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/pyOpenSSL-0.10-py2.6-linux-i686.egg',
 '/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/wadofstuff_django_serializers-1.1.0-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/jsonpickle-0.4.0-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/django_compressor-1.1.2-py2.6.egg',
 '/usr/local/lib/python2.6/site-packages/django_appconf-0.5-py2.6.egg',
 '/usr/local/lib/python26.zip',
 '/usr/local/lib/python2.6',
 '/usr/local/lib/python2.6/plat-linux2',
 '/usr/local/lib/python2.6/lib-tk',
 '/usr/local/lib/python2.6/lib-old',
 '/usr/local/lib/python2.6/lib-dynload',
 '/usr/local/lib/python2.6/site-packages',
 '/usr/local/lib/python2.6/site-packages/PIL',
 '/var/www/bs_ping/',
 '/var/www']

Server time: Wed, 25 Jul 2012 13:17:33 +0500


--


Thanks & Regards

*Adeel Akbar*



Is there any way to limit the off heap memory usage for cassandra 1.1.X ?

2012-07-25 Thread Thomas Spengler
Is there any way to limit the off-heap memory usage for Cassandra 1.1.X?

Thx


-- 
Thomas Spengler


Fwd: {kundera-discuss} Kundera 2.0.7 Released

2012-07-25 Thread Vivek Mishra
-- Forwarded message --
From: Amry amresh1...@gmail.com
Date: Wed, Jul 25, 2012 at 4:41 PM
Subject: {kundera-discuss} Kundera 2.0.7 Released
To: kundera-disc...@googlegroups.com


Hi All,

We are happy to announce the release of Kundera 2.0.7.

Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL
datastores. The idea behind Kundera is to make working with NoSQL Databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB
and relational databases.

Major Changes in this release:
---
* HBase 0.92.1 migration
* Hadoop 1.0.2 migration
* Cassandra 1.1.2 migration
* MongoDB 2.0.4 migration
* JPA EntityTransaction commit and rollback
* JTA Transactions integration over web server
* Kundera-REST API
* Support for Counter column in Cassandra
* Inverted wide-row indexing support for Cassandra
* Login Authentication support for Cassandra and MongoDB
* Filters and filters list for HBase
* Deprecated Lucene based indexing for HBase.
* Datastore-specific configuration files for specifying:
  - Replication factor
  - Placement strategy
  - Consistency level per operation
  - Counter column family configuration
  - Inverted indexing switch
  - Zookeeper host and port
  - HBase column family configurations
  - MongoDB servers list, read preference and socket timeout, etc.


To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera
Latest released tag version is kundera-2.0.7. Kundera maven libraries are
now available at:
https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample code and examples for using Kundera can be found here:
http://github.com/impetus-opensource/Kundera-Examples
and
https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Thank you all for your contributions!

Regards,
Kundera Team.


[Feedback needed] New MacOSX preferences pane plugin to control a local Cassandra Server

2012-07-25 Thread Rémy Saissy
Hi,
I have written a preferences pane plugin to control a Cassandra Server
installed on MacOSX.
It is based on the UI of the MySQL preferences pane.

I'd like to have your feedback about it.
 - Is it useful for you?
 - What do you think could be improved / added?

You can find it on my github at
https://github.com/remysaissy/cassandra-macosx-prefspane
No need to build from sources, there is a release of course :).

Feel free to add issues on the github or even send some code.
Thanks a lot!
Regards,

--
Rémy Saissy
Photos: http://picasaweb.google.com/remy.saissy
Blog: http://blog.remysaissy.com


filtered value count

2012-07-25 Thread Abhijit Chanda
Hi All,

Can anyone suggest how to retrieve the result count after multiple
filtering operations performed on multiple column families, similar to
the result count of a join query in a normal RDBMS? I guess it's not
possible in Cassandra, but I am still asking. Or is there any alternative
way to do the same?

Regards,
Abhijit


Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-07-25 Thread Tyler Hobbs
Are you actually seeing any problems from this? High virtual memory usage
on its own really doesn't mean anything. See
http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler 
thomas.speng...@toptarif.de wrote:

 No one has any idea?

 We tried:

 update to 1.1.2
 DiskAccessMode standard, indexAccessMode standard
 row_cache_size_in_mb: 0
 key_cache_size_in_mb: 0


 Our next try will be to change

 SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

 Any other proposals are welcome.

 On 07/04/2012 02:13 PM, Thomas Spengler wrote:
  Hi @all,
 
  since our upgrade from Cassandra 1.0.3 to 1.1.0 the virtual memory usage
  of the Cassandra nodes has exploded.
 
  Our setup is:
  * 5 CentOS 5.8 nodes
  * each with 4 CPUs and 8 GB RAM
  * each node holds about 100 GB of data
  * each JVM uses 2 GB RAM
  * DiskAccessMode is standard, indexAccessMode is standard
 
  The memory usage grows until all of the memory is used.
 
  Just for information, when we were on Cassandra 1.0.3, we used
  * DiskAccessMode is standard, indexAccessMode is mmap
  * and the RAM usage was ~4 GB
 
 
  can anyone help?
 
 
  With Regards
 


 --
 Thomas Spengler
 Chief Technology Officer
 

 TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
 Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
 thomas.speng...@toptarif.de | www.toptarif.de

 Amtsgericht Charlottenburg, HRB 113287 B
 Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
 -





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Connection issue in Cassandra

2012-07-25 Thread Tyler Hobbs
That's a pretty old version of pycassa; it was released before 0.7.0 came
out.  I suggest upgrading.

It's possible this was caused by an old bug, but in general, this indicates
that you have more threads trying to use the ConnectionPool concurrently
than there are connections.
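
To illustrate that general case, here is a small self-contained Python sketch. It is a toy model, not pycassa's implementation: `TinyPool` and `run_workers` are hypothetical names, and the pool is just a bounded queue. It shows what happens when more threads ask for a connection than the pool can hand out and the holders do not release before the timeout.

```python
import queue
import threading
import time


class TinyPool:
    """Toy bounded connection pool (illustration only, not pycassa)."""

    def __init__(self, size, timeout):
        self._timeout = timeout
        self._slots = queue.Queue(maxsize=size)
        for i in range(size):
            self._slots.put("conn-%d" % i)

    def get(self):
        # Block until a connection is free or the timeout expires.
        try:
            return self._slots.get(timeout=self._timeout)
        except queue.Empty:
            raise RuntimeError(
                "unable to obtain connection after %.1f seconds" % self._timeout
            )

    def put(self, conn):
        self._slots.put(conn)


def run_workers(pool, n_workers, hold_seconds):
    """Run n_workers threads that each hold a connection for hold_seconds.

    Returns the list of error messages from threads that timed out.
    """
    failures = []
    lock = threading.Lock()

    def worker():
        try:
            conn = pool.get()
        except RuntimeError as exc:
            with lock:
                failures.append(str(exc))
            return
        try:
            time.sleep(hold_seconds)  # simulate a slow request
        finally:
            pool.put(conn)  # always hand the connection back

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


if __name__ == "__main__":
    # 4 threads, 2 connections, holders outlast the timeout: 2 threads fail.
    print(run_workers(TinyPool(size=2, timeout=0.2), 4, hold_seconds=1.0))
```

In this model the failures go away either when the pool is at least as large as the number of concurrent threads or when each thread returns its connection promptly.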

On Wed, Jul 25, 2012 at 3:30 AM, Adeel Akbar
adeel.ak...@panasiangroup.com wrote:

  Hi,

 I have created a 2-node cluster and use it with my application. My
 application is unable to connect to the database. Please find the logs below:

  NoConnectionAvailable at /

 ConnectionPool limit of size 2 overflow 2 reached, unable to obtain 
 connection after 30 seconds

 Request Method: GET
 Request URL: http://172.16.100.131/
 Django Version: 1.4
 Exception Type: NoConnectionAvailable
 Exception Value:

 ConnectionPool limit of size 2 overflow 2 reached, unable to obtain
 connection after 30 seconds

 Exception Location:
 /usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py
 in get, line 738
 Python Executable: /usr/local/bin/python
 Python Version: 2.6.4
 Python Path:

 ['/var/www/bs_ping',
  '/usr/local/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/amqplib-0.6.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/BeautifulSoup-3.1.0.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/python_dateutil-1.4.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/feedparser-4.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/python_twitter-0.6-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/simplejson-2.0.9-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/txAMQP-0.3-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/zope.interface-3.5.2-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/UnicodeUtils-0.3.2-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/pytz-2009p-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/ScriptUtils-0.5.5-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/python_memcached-1.44-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/coverage-3.2b1-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/flup-1.0.3.dev_20091027-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/oauth-1.0.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/pyOpenSSL-0.10-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/wadofstuff_django_serializers-1.1.0-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/jsonpickle-0.4.0-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/django_compressor-1.1.2-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/django_appconf-0.5-py2.6.egg',
  '/usr/local/lib/python26.zip',
  '/usr/local/lib/python2.6',
  '/usr/local/lib/python2.6/plat-linux2',
  '/usr/local/lib/python2.6/lib-tk',
  '/usr/local/lib/python2.6/lib-old',
  '/usr/local/lib/python2.6/lib-dynload',
  '/usr/local/lib/python2.6/site-packages',
  '/usr/local/lib/python2.6/site-packages/PIL',
  '/var/www/bs_ping/',
  '/var/www']

   Server time: Wed, 25 Jul 2012 13:17:33 +0500
 --


 Thanks & Regards

 *Adeel Akbar*




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cassandra Throughput

2012-07-25 Thread Tyler Hobbs
Barring unusual circumstances, no.  Typically, the throughput should be
pretty similar for both consistency levels, with slightly higher latency
for QUORUM.

On Tue, Jul 24, 2012 at 5:32 PM, Code Box codeith...@gmail.com wrote:

 Can it happen that the throughput with CL=QUORUM is  throughput with CL=1
 on a three node cluster with replication factor of 3 ?




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Bringing a dead node back up after fixing hardware issues

2012-07-25 Thread Eran Chinthaka Withana
Hi Brandon,

  Increasing CL is tricky for us for now, as our RF on that datacenter is 2
  and CL is set to ONE. If we make the CL to be LOCAL_QUORUM, then, if a
  node goes down we will have trouble. I will try to increase the RF to 3 in
  that data center and set the CL to LOCAL_QUORUM if nothing works out.

 Increasing the RF and using LOCAL_QUORUM is the right thing in
 this case.  By choosing CL.ONE, you are agreeing that read misses are
 acceptable.  If they are not, then adjusting your RF/CL is the only
 path.


Alright, let's assume I want to go this route. I have RF=2 in the data
center and I believe I need at least RF=3 to set the CL to
LOCAL_QUORUM and hide the node failures. But if I increase the RF to 3 now,
won't it trigger more read misses until repair completes? Given this
is a production cluster which cannot afford downtime, how can we do this?
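
The RF/CL arithmetic behind this can be checked with a few lines of Python. A quorum is floor(RF / 2) + 1 replicas; the helper names below are made up for illustration, and the sketch only covers the quorum size, not the repair question.

```python
def quorum(rf):
    """Replicas that must respond for a QUORUM / LOCAL_QUORUM request."""
    return rf // 2 + 1


def tolerated_failures(rf):
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)


for rf in (2, 3):
    print("RF=%d quorum=%d tolerated failures=%d"
          % (rf, quorum(rf), tolerated_failures(rf)))
# RF=2 quorum=2 tolerated failures=0
# RF=3 quorum=2 tolerated failures=1
```

With RF=2 a quorum needs both replicas, so a single node failure breaks LOCAL_QUORUM; with RF=3 one replica in the data center can be down.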

Thanks,
Eran