monitoring cassandra with JMX
Hi, I have just written a little note on how to monitor Cassandra: http://vineetdaniel.me/2011/03/26/monitoring-cassandra-with-jmx/ I hope it helps the community. Regards Vineet Daniel Cell : +918106217121 Websites : Blog http://vineetdaniel.blogspot.com | LinkedIn http://in.linkedin.com/in/vineetdaniel | Twitter https://twitter.com/vineetdaniel
Re: Central monitoring of Cassandra cluster
...I am in the process of using Nagios to monitor three servers. Will post updates shortly. Regards Vineet Daniel On Fri, Mar 25, 2011 at 12:14 AM, mcasandra mohitanch...@gmail.com wrote: Can someone share if they have centralized monitoring for all Cassandra servers? With many nodes it becomes difficult to monitor them individually unless we can look at the data in one place. I am looking at solutions where this can be done. Looking at Cacti currently, but not sure how to integrate it with JMX.
Re: Internal error processing get in get after insert ttl -
I got a similar error, but while inserting. I am using 0.7 beta 1 and I found the following in the logs:
ERROR 13:59:44,555 Internal error processing insert
java.lang.AssertionError: invalid response count 1
    at org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:87)
    at org.apache.cassandra.service.WriteResponseHandler.init(WriteResponseHandler.java:47)
    at org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:113)
    at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:198)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:474)
    at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:390)
    at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2896)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2499)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Regards Vineet Daniel On Thu, Sep 23, 2010 at 12:48 PM, Sylvain Lebresne sylv...@yakaz.com wrote: You should not have anything special to do. Could you check the Cassandra logs and give us the stack trace of the error? -- Sylvain On Thu, Sep 23, 2010 at 8:36 AM, Michal Augustýn augustyn.mic...@gmail.com wrote: Hello, I tried to use the Column.Ttl property but I was not successful.
My simple test:
1) insert column with ttl = 3
2) get column - all is OK
3) wait for 2 seconds
4) get column - all is OK
5) wait again for 2 seconds (so the column should have disappeared)
6) get column - I got Thrift.TApplicationException of type 6 with message "Internal error processing get"
Do I have to change some Cassandra configuration in order to get TTL working? Or am I doing anything in a bad way? Thank you! Augi
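The TTL behaviour the test expects can be sketched with a small in-memory model (illustrative only, not Cassandra's implementation; `TTLStore` and its methods are hypothetical names):

```python
import time

class TTLStore:
    """Toy column store mimicking per-column TTL expiry."""
    def __init__(self):
        self._data = {}  # column name -> (value, absolute expiry time or None)

    def insert(self, name, value, ttl=None):
        expires_at = time.time() + ttl if ttl is not None else None
        self._data[name] = (value, expires_at)

    def get(self, name):
        value, expires_at = self._data[name]
        if expires_at is not None and time.time() >= expires_at:
            raise KeyError(name)  # an expired column should behave as absent
        return value

store = TTLStore()
store.insert("col", "v", ttl=1)
print(store.get("col"))      # still visible before the TTL elapses
time.sleep(1.2)
try:
    store.get("col")
except KeyError:
    print("expired")         # after the TTL, the get should report "not found", not an internal error
```

The point of the sketch is the expected contract: a get after expiry should surface a clean "not found", which is why the internal-error exception in the test above looks like a bug rather than intended behaviour.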
Re: Internal error processing get in get after insert ttl -
Hi, I was using 'access_logs' as the column family name; I changed it to Accesslogs and it worked. Maybe Cassandra doesn't like underscores and lowercase letters. Regards Vineet Daniel On Thu, Sep 23, 2010 at 2:06 PM, vineet daniel vineetdan...@gmail.com wrote: [snip: same error and stack trace as in the previous message] On Thu, Sep 23, 2010 at 12:48 PM, Sylvain Lebresne sylv...@yakaz.com wrote: You should not have anything special to do. Could you check the Cassandra logs and give us the stack trace of the error? -- Sylvain On Thu, Sep 23, 2010 at 8:36 AM, Michal Augustýn augustyn.mic...@gmail.com wrote: [snip: original TTL question quoted above]
Re: Schema question
Hi Morten, the simplest approach that comes to my mind (without considering any other use cases, just read and unread messages) is to use two CFs, 'read' and 'unread': put all new messages in 'unread', and once the user reads one of them, move it to 'read' and mark the original for deletion. Regards Vineet Daniel On Mon, Sep 20, 2010 at 3:35 PM, aaron morton aa...@thelastpickle.com wrote: Here is a discussion about implementing Twitter with Cassandra http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/ An example of the same on github http://github.com/ericflo/twissandra If you have not done so already, check out the articles page on the wiki http://wiki.apache.org/cassandra/ArticlesAndPresentations Aaron On 20 Sep 2010, at 21:57, Morten Wegelbye Nissen wrote: Hello List, No matter where you read, you almost everywhere read that the noSQL data schema is completely different from the relational way - and after a little insight into Cassandra everyone can second that. But I miss seeing some real-life examples of how a real system can be modelled. Let's take the example of a system where users can send messages to each other. (Completely imaginary, no one would use Cassandra for a mail system :) ) If one should create such a system, what CFs would be used? And how would you, for example, find all unread messages? ./Morten
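The two-CF scheme suggested above can be sketched with plain dicts standing in for the 'read' and 'unread' column families (the helper names are hypothetical, and the dict `pop` stands in for the delete/tombstone step):

```python
# Toy model: each "column family" is a dict of row key -> {column name: value}.
cfs = {"unread": {}, "read": {}}

def deliver(user, msg_id, msg):
    cfs["unread"].setdefault(user, {})[msg_id] = msg

def mark_read(user, msg_id):
    msg = cfs["unread"][user].pop(msg_id)       # delete from 'unread' (a tombstone in Cassandra)
    cfs["read"].setdefault(user, {})[msg_id] = msg

deliver("morten", "m1", "hello")
deliver("morten", "m2", "world")
mark_read("morten", "m1")
print(sorted(cfs["unread"]["morten"]))  # ['m2']
print(sorted(cfs["read"]["morten"]))    # ['m1']
```

Finding all unread messages for a user is then a single row fetch from the 'unread' CF, which is the pattern the thread is after: one CF per query you need to serve.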
Re: 0.7 memory usage problem
Hi Peter, I actually checked after 15-20 minutes of observing the monitor and logs; when everything had calmed down it was still showing this many processes. Shouldn't it reduce the number of threads once the server is idle or almost idle? As I am not a Java guy, the only thing I can think of is that maybe creating processes/threads again will consume more memory than keeping idle threads. Regards Vineet Daniel On Sat, Sep 18, 2010 at 10:50 PM, Peter Schuller peter.schul...@infidyne.com wrote: Even I would like to add something here, and correct me if I am wrong: I downloaded 0.7 beta and ran it. Just by chance I checked 'top' to see how the new version was doing, and there were 64 processes running even though Cassandra was on a single node with default configuration options (ran it as-is, as soon as I downloaded it). No inserts done, no selects done, nothing. I don't think this is normal. I presume those are threads. It adds up; various stages have multiple threads in Cassandra, and the JVM itself has a number of threads (e.g. GC threads, compiler threads). A 'jstack' on a freshly started trunk Cassandra for me, grepping for 'prio', yields 92 threads, including the following.
Attach Listener daemon prio=9 tid=0x000805476800 nid=0x80554e3c0 waiting on condition [0x]
Timer-1 prio=5 tid=0x0009103ed800 nid=0x910777280 in Object.wait() [0x7a1a4000]
LB-TARGET:1 prio=5 tid=0x0009103ee800 nid=0x910777b40 waiting on condition [0x7a2a5000]
LB-OPERATIONS:1 prio=5 tid=0x0009103f nid=0x910778400 waiting on condition [0x7a3a6000]
ACCEPT-localhost/127.0.0.1 prio=5 tid=0x0009103f0800 nid=0x910778cc0 runnable [0x7a4a7000]
Timer-0 prio=5 tid=0x0009103f1800 nid=0x9103d4ac0 in Object.wait() [0x7a5a8000]
GC inspection prio=5 tid=0x0009103f2000 nid=0x91051c540 in Object.wait() [0x7a6a9000]
CompactionExecutor:1 prio=1 tid=0x000911805800 nid=0x9111fb3c0 waiting on condition [0x7a7aa000]
PERIODIC-COMMIT-LOG-SYNCER prio=5 tid=0x0009103f3000 nid=0x91051ce00 waiting on condition [0x7a8ab000]
COMMIT-LOG-WRITER prio=5 tid=0x0009103f3800 nid=0x91051d6c0 waiting on condition [0x7a9ac000]
MISC_STAGE:1 prio=5 tid=0x0009103f4800 nid=0x91051df80 waiting on condition [0x7aaad000]
MIGRATION_STAGE:1 prio=5 tid=0x0009103f5000 nid=0x91051e840 waiting on condition [0x7abae000]
AE_SERVICE_STAGE:1 prio=5 tid=0x0009103f6000 nid=0x91051f100 waiting on condition [0x7acaf000]
GOSSIP_STAGE:1 prio=5 tid=0x0009103f6800 nid=0x91051f9c0 waiting on condition [0x7adb]
STREAM_STAGE:1 prio=5 tid=0x0009103f7800 nid=0x910520280 waiting on condition [0x7aeb1000]
RESPONSE_STAGE:4 prio=5 tid=0x0009103f8000 nid=0x910520b40 waiting on condition [0x7afb2000]
RESPONSE_STAGE:3 prio=5 tid=0x0009103f9000 nid=0x910521400 waiting on condition [0x7b0b3000]
RESPONSE_STAGE:2 prio=5 tid=0x0009103f9800 nid=0x910521cc0 waiting on condition [0x7b1b4000]
RESPONSE_STAGE:1 prio=5 tid=0x0009103fa800 nid=0x9103c8900 waiting on condition [0x7b2b5000]
READ_STAGE:8 prio=5 tid=0x000910505000 nid=0x9103ce380 waiting on condition [0x7b3b6000]
READ_STAGE:7 prio=5 tid=0x000910505800 nid=0x9103cec40 waiting on condition [0x7b4b7000]
READ_STAGE:6 prio=5 tid=0x000910506800 nid=0x9103cf500 waiting on condition [0x7b5b8000]
READ_STAGE:5 prio=5 tid=0x000910507000 nid=0x9103cfdc0 waiting on condition [0x7b6b9000]
READ_STAGE:4 prio=5 tid=0x000801cbf000 nid=0x9103d0680 waiting on condition [0x7b7ba000]
READ_STAGE:3 prio=5 tid=0x000801cbf800 nid=0x9103d0f40 waiting on condition [0x7b8bb000]
READ_STAGE:2 prio=5 tid=0x000801cc0800 nid=0x9103d1800 waiting on condition [0x7b9bc000]
READ_STAGE:1 prio=5 tid=0x000801cc1000 nid=0x9103d20c0 waiting on condition [0x7babd000]
MUTATION_STAGE:32 prio=5 tid=0x000801cc2000 nid=0x9103d2980 waiting on condition [0x7bbbe000]
MUTATION_STAGE:31 prio=5 tid=0x000801cc2800 nid=0x9103d3240 waiting on condition [0x7bcbf000]
MUTATION_STAGE:30 prio=5 tid=0x000801cc3800 nid=0x9103d3b00 waiting on condition [0x7bdc]
MUTATION_STAGE:29 prio=5 tid=0x000801cc4000 nid=0x9103d43c0 waiting on condition [0x7bec1000]
MUTATION_STAGE:28 prio=5 tid=0x000801cc5000 nid=0x9103c21c0 waiting on condition [0x7bfc2000]
MUTATION_STAGE:27 prio=5 tid=0x000801cc5800 nid=0x9103c2a80 waiting on condition [0x7c0c3000]
MUTATION_STAGE:26 prio=5 tid=0x000801cc6800 nid=0x9103c3340 waiting on condition [0x7c1c4000]
MUTATION_STAGE:25 prio=5 tid
Re: Bootstrapping stays stuck
Hi Gurpreet, what is the output of nodetool -h hostname/IP streams (to see what is going on between the nodes)? If you don't see anything happening, try switching off the firewall or iptables. Regards Vineet Daniel On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: I tried this again, and it happened yet again. This time, while the transfer messages seemed to be in order, I also noticed that the source logs talk about having 9 dropped messages in the last 1000 ms. The only activity on the whole cluster is this bootstrapping; there is no read/write traffic going on. /G On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh gurpreet.si...@gmail.com wrote: I am using cassandra 0.6.5. On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Hi, I have a Cassandra cluster of 4 machines, and I am trying to bootstrap 2 more machines, one at a time. For both of these machines, the bootstrapping stays stuck after the streaming is done. When the nodes come up for bootstrapping, I see all the relevant messages about getting a new token and assuming load from a particular host. I see a couple of nodes anticompacting data to send, and at a later point the node that is bootstrapping prints the right streaming messages. However, once the streaming is over, the node just doesn't do anything. Previously while bootstrapping, I have seen that after the streaming is done, the node restarts and becomes part of the ring by itself. I don't see this behaviour with either of the nodes I tried today. I even restarted all the nodes in the cluster and tried bootstrapping one of the nodes again, but it was again stuck after streaming. It seems to have copied the relevant load as well. Any ideas as to what could be going on here? /G
Re: Capping the memory limit in cassandra
Hi, when is this happening? I mean, is Cassandra idle, or is the application inserting/reading values from it? Are you running any map/reduce job at that time? Regards Vineet Daniel On Tue, Sep 7, 2010 at 11:08 PM, Dathan Pattishall datha...@gmail.com wrote: For this java process
/opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8181 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=/opt/cassandra/bin/../conf -cp /opt/cassandra/bin/../conf:/opt/cassandra/bin/../build/classes:/opt/cassandra/bin/../lib/antlr-3.1.3.jar:/opt/cassandra/bin/../lib/apache-cassandra-0.6.4.jar:/opt/cassandra/bin/../lib/clhm-production.jar:/opt/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/cassandra/bin/../lib/commons-collections-3.2.1.jar:/opt/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/cassandra/bin/../lib/google-collections-1.0.jar:/opt/cassandra/bin/../lib/hadoop-core-0.20.1.jar:/opt/cassandra/bin/../lib/high-scale-lib.jar:/opt/cassandra/bin/../lib/ivy-2.1.0.jar:/opt/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/cassandra/bin/../lib/jline-0.9.94.jar:/opt/cassandra/bin/../lib/json-simple-1.1.jar:/opt/cassandra/bin/../lib/libthrift-r917130.jar:/opt/cassandra/bin/../lib/log4j-1.2.14.jar:/opt/cassandra/bin/../lib/slf4j-api-1.5.8.jar:/opt/cassandra/bin/../lib/slf4j-log4j12-1.5.8.jar org.apache.cassandra.thrift.CassandraDaemon
I set the max memory size to 7G, yet Cassandra is taking:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6992 root 18 0 32.8g 29g 21g S 19 93.9 5685:14 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:M
Why is that, and how do I cap the memory used by Cassandra? Is this a bug or a mistake on my part?
servers for cassandra
Hi, I am just curious to know if there is any hosting company that provides servers at a very low cost, so I can run Cassandra over a WAN. I have a Cassandra setup on my LAN and want to test it in real conditions; taking dedicated servers just for testing purposes is not at all feasible for me, not even the pay-as-you-go types. I'd really appreciate it if anybody can share information on such hosting providers. Vineet Daniel
Re: 4k keyspaces... Maybe we're doing it wrong?
If I am correct, then you need to restart Cassandra whenever you add a new keyspace. That's another concern. Vineet Daniel On Fri, Sep 3, 2010 at 2:58 PM, Mike Peters cassan...@softwareprojects.com wrote: Very interesting, thank you. So it sounds like, other than being able to quickly truncate customer keyspaces, with Cassandra there's no real benefit in keeping each customer's data in a separate keyspace. We'll suffer on the memory side with all the switching between keyspaces, and we're better off storing all customer data under the same keyspace? On 9/2/2010 11:29 PM, Aaron Morton wrote: Create one big happy love-in keyspace. Use the key structure to identify the different clients' data. There is more support coming for multi-tenant systems, but a lot of the memory configuration is per keyspace/column family, so you cannot run that many keyspaces. This page has some more information http://wiki.apache.org/cassandra/MultiTenant Aaron On 03 Sep, 2010, at 01:25 PM, Mike Peters cassan...@softwareprojects.com wrote: Hi, We're in the process of migrating 4,000 MySQL client databases to Cassandra. All database schemas are identical. With MySQL, we used to provision a separate 'database' per client, to make it easier to shard and move things around. Does it make sense to migrate the 4,000 MySQL databases to 4,000 keyspaces in Cassandra? Or should we stick with a single keyspace? My concerns are: #1. Will every single node end up with 4k folders under /cassandra/data/? #2. Performance: will Cassandra work better with a single keyspace + lots of keys, or thousands of keyspaces? Granted, it's 'cleaner' to have a separate keyspace per client, but maybe that's not the best approach with Cassandra. Thoughts?
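Aaron's "use the key structure" suggestion can be sketched as composing a per-client prefix into every row key so all tenants share one keyspace (the `row_key` helper is a hypothetical illustration, not a Cassandra API):

```python
def row_key(client_id, key):
    """Compose a per-client row key so all tenants share one keyspace/CF."""
    return f"{client_id}:{key}"

print(row_key(42, "orders"))  # 42:orders
# With an order-preserving partitioner, one client's keys sort together,
# so per-client range scans (and bulk removal) stay practical:
print(row_key(42, "zzz") < row_key(43, "aaa"))  # True
```

The trade-off raised in the thread still applies: a shared keyspace avoids thousands of per-keyspace memory footprints, but you lose the cheap per-customer truncate that separate keyspaces gave.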
Re: Looking for something like like of mysql.
You can try using different CFs for different result sets, or an inverted index. But looking at the number of inserts that you have, it will become complicated. The first thing you need to do is stop thinking in terms of an RDBMS, as Cassandra is not at all like them. ___ Regards Vineet Daniel +918106217121 ___ Let your email find you On Thu, Sep 2, 2010 at 10:00 PM, Mike Peters cassan...@softwareprojects.com wrote: Cassandra doesn't support ad hoc queries like what you're describing. I recommend looking at Lucandra http://github.com/tjake/Lucandra On 9/2/2010 12:27 PM, Anuj Kabra wrote: I am working with cassandra-0.6.4 on a mail retrieval problem. We have the metadata of each mail (sender, recipient, timestamp, subject) and the location of the mail file stored in a Cassandra DB. Every day about 25,000 records will be entered into this DB. We have not finalised the data model yet, but are starting with a simple one having only one column family: ColumnFamily name=MailMetadata CompareWith=UTF8Type which has the user_id of the recipient as key, and columns for sender_id, timestamp of the mail, subject, and location of the mail file. Now our use case is to get the locations of all mail files sent by a user matching a given subject (which can be a part of the original subject of the mail). According to my knowledge so far, we can get all the rows of a user by using user_id as the key. After that I need to iterate over all the rows I get and see which mail fits the given condition (matching a subject in this case), which is computationally very heavy as we would get thousands of rows. So we are looking for something like the LIKE of MySQL provided via Thrift. I also need to know if I am going the right way. Help is much appreciated.
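The inverted-index suggestion can be sketched in a few lines: index subject words at write time so that lookup by word is a single key fetch instead of a scan over thousands of rows (all names here are hypothetical; in Cassandra the dict would be a CF keyed by word, with mail-file locations as column names):

```python
from collections import defaultdict

# Toy inverted index: subject word -> set of mail-file locations.
index = defaultdict(set)

def add_mail(location, subject):
    # Done once per insert, so reads never have to scan subjects.
    for word in subject.lower().split():
        index[word].add(location)

def search(word):
    return sorted(index.get(word.lower(), set()))

add_mail("/mail/1.eml", "Project status update")
add_mail("/mail/2.eml", "Re: project kickoff")
print(search("project"))   # ['/mail/1.eml', '/mail/2.eml']
print(search("kickoff"))   # ['/mail/2.eml']
```

This only supports whole-word matches, not arbitrary substrings like SQL LIKE '%...%', which is why the thread points to a full-text engine (Lucandra) for anything richer.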
Re: Data Modeling Conundrum
Query: why are you sorting? AFAIK Cassandra sorts the keys by itself if you are using ordered partitioning. And how do you store data pertaining to a single user who has several GUIDs attached? Vineet Daniel On Sat, May 8, 2010 at 9:01 AM, William Ashley wash...@gmail.com wrote: List, I have a case where visitors to a site are tracked via a persistent cookie containing a guid. This cookie is created and set when missing. Some of these visitors are logged in, meaning a userId may also be available. What I'm looking to do is have a way to associate each userId with all of the guids it has been seen with. Conceptually, this would identify the unique (device, browser) pairs for each userId. The catch is that I want to be able to retrieve the most-recently-seen N guids for a userId. One possible solution to this problem in SQL looks like this (made up on the fly):
# Table schema
CREATE TABLE UserGuid (
  userId INT,
  guid VARCHAR,
  when TIMESTAMP,
  PRIMARY KEY( userId, guid ),
  INDEX( userId, when )
);
# For each request with guid G and userId U at time T
INSERT INTO UserGuid ( userId, guid, when ) VALUES ( U, G, T ) ON DUPLICATE KEY UPDATE SET when = T;
# To get most recent N guids for userId U
SELECT guid FROM UserGuid WHERE userId = U ORDER BY when DESC LIMIT N;
Hopefully I've sufficiently explained what I'm trying to do. Now on to solving this problem in Cassandra. I've been trying to find a way that allows both of the above operations to be performed efficiently. Updates are a breeze with a structure like this:
// Row key is userId
12345 : {
  // Column name is guid
  '256fb890-5a4b-11df-a08a-0800200c9a66' : {
    // Column timestamp is last time guid was seen
    timestamp : 387587235233
  }
}
but getting the last N recently seen guids requires pulling all columns and sorting by timestamp.
Retrievals can be done efficiently with a structure taking advantage of column sorting:
// Row key is userId
12345 : {
  // Column name is last time guid was seen
  387587235233 : {
    // Column value is guid
    value: '256fb890-5a4b-11df-a08a-0800200c9a66'
  }
}
where we use a slice get on the row with limit N (and reverse order). However, updates involve pulling all columns to de-duplicate guid values. Neither solution is ideal, and so I present this to you fine gentlemen who have more experience modeling data in Cassandra than I. I would much prefer to avoid any solutions that require pulling an indeterminate amount of data for either operation. For the time being I am using the first method and only pulling the first M columns, sorting, and taking the top N (M ≥ N). One thing I was thinking would be nice (if possible) is to have a column family where columns are sorted either by their timestamp or by the time the column was created/updated (which may be equivalent to not sorting at all, but I have not looked at the implementation). I appreciate any feedback or suggestions you might have. - William
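William's first structure (columns keyed by guid, sorted client-side) can be sketched with a plain dict standing in for the row; the function names are hypothetical:

```python
# Method one: row maps guid -> last-seen timestamp. Updates are one write
# with no read; recency queries pull the columns and sort client-side.
def touch(row, guid, ts):
    """Record that guid was seen at time ts."""
    row[guid] = max(ts, row.get(guid, 0))

def recent(row, n):
    """Most recently seen n guids, sorted by descending timestamp."""
    return [g for g, _ in sorted(row.items(), key=lambda kv: -kv[1])[:n]]

row = {}
touch(row, "guid-a", 100)
touch(row, "guid-b", 200)
touch(row, "guid-a", 300)   # de-duplicated: guid-a just gets a newer timestamp
print(recent(row, 2))       # ['guid-a', 'guid-b']
```

This makes the trade-off concrete: `touch` is O(1) and needs no read, but `recent` touches every column, which is exactly the indeterminate read the poster wants to avoid and why the second (timestamp-keyed) structure inverts the cost onto updates instead.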
release date for 0.7 ?
Hi, what is the expected release date for 0.7, and what will its feature set be? Vineet Daniel
Re: bloom filter
Thanks David and Peter. Is there any way to view the contents of this file? Vineet Daniel On Fri, May 7, 2010 at 4:24 PM, David Strauss da...@fourkitchens.com wrote: On 2010-05-07 10:51, vineet daniel wrote: What is the benefit of creating a bloom filter when Cassandra writes data? How does it help? http://wiki.apache.org/cassandra/ArchitectureOverview -- David Strauss | da...@fourkitchens.com Four Kitchens | http://fourkitchens.com | +1 512 454 6659 [office] | +1 512 870 8453 [direct]
Re: bloom filter
1. Peter said 'without going to disk', so does that mean bloom filters reside in memory - always, or just when a request to that particular CF is made? 2. "It is also important for identifying which SSTable files to look inside even when a key is present." - David, can you please throw some more light on your point, like what the implications are, why we need to identify, etc.? Vineet Daniel On Fri, May 7, 2010 at 4:28 PM, David Strauss da...@fourkitchens.com wrote: On 2010-05-07 10:55, Peter Schüller wrote: What is the benefit of creating a bloom filter when Cassandra writes data? How does it help? It allows Cassandra to answer requests for non-existent keys without going to disk, except in cases where the bloom filter gives a false positive. See: http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html It is also important for identifying which SSTable files to look inside even when a key is present. -- David Strauss | da...@fourkitchens.com Four Kitchens | http://fourkitchens.com | +1 512 454 6659 [office] | +1 512 870 8453 [direct]
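The property under discussion (no false negatives, occasional false positives, answered entirely from memory) can be shown with a minimal Bloom filter sketch; this is an illustration of the general technique, not Cassandra's actual implementation or parameters:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: never misses an added key; may rarely claim
    a key that was never added (a false positive)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)   # the whole filter lives in memory

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter()
bf.add("row-1")
print(bf.might_contain("row-1"))        # True: an added key is always found
print(bf.might_contain("no-such-key"))  # False with high probability
```

This is why a per-SSTable filter helps even when a key exists somewhere: a negative answer lets Cassandra skip that SSTable's index and data files entirely, so a read touches only the files that might actually hold the key.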
Re: Cassandra Streaming Service
For more details have a look here: http://wiki.apache.org/cassandra/Streaming Vineet Daniel On Wed, May 5, 2010 at 9:34 PM, Weijun Li weiju...@gmail.com wrote: Thank you Jonathan! Good to know. On Tue, May 4, 2010 at 9:13 PM, Jonathan Ellis jbel...@gmail.com wrote: The streaming service is what moves data around for load balancing, bootstrap, and decommission operations. On Tue, May 4, 2010 at 8:08 PM, Weijun Li weiju...@gmail.com wrote: A dumb question: what is the use of Cassandra's streaming service? Any use case or example? Thanks, -Weijun -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
why is streaming done in 32 MB chunks ?
Hi, just out of curiosity I want to know why streaming is done in 32 MB chunks and not 16 or 64 MB chunks. Is there any specific reason behind 32 MB, or is it just like that? Vineet Daniel
Re: How do you deal with the Bloom filter false positive rate and the remove problem in distributed databases?
Only major compactions can clean out obsolete tombstones. On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami kazuki.aran...@gmail.com wrote: Let me rephrase my question. How does Cassandra deal with bloom filter's false positives on deleted records? The same way it deals with tombstones that it encounters otherwise (part of a row slice, or in a memtable). All the bloom filter does is keep you from having to check rows that don't have any data at all for a given key. Tombstones are not the same as no data at all, we do need to propagate tombstones during replication. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Design Query
As you haven't specified all the details pertaining to filters and your data layout (structure), at a very high level what I can suggest is that you need to create a separate CF for each filter. On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan rakes...@gmail.com wrote: I am evaluating Cassandra to implement activity streams. We currently have over 100 feeds with total entries exceeding 32000, implemented using redis (~320 entries / feed). Would like to hear from the community on how to use Cassandra to solve the following cases: 1. Ability to fetch entries by applying a few filters (like show me only likes from a given user). This would include range queries to support pagination. So this would mean indices on a few columns like the feed id, feed type etc. 2. We have around 3 machines with 4GB RAM for this purpose and are thinking of having replication factor 2. Would 4GB * 3 be enough for Cassandra for this kind of data? I read that Cassandra does not keep all the data in memory, but want to be sure that we have the right server config to handle this data using Cassandra. Thanks, Rakesh
how to fetch latest data
Hi, in a Cassandra cluster, if we update any key/value and then perform a fetch query on that same key, we get old/stale data. This can be because of read repair. Is there any way to fetch the latest updated data from the cluster, as the old data has no significance and showing it to the client is even more irrelevant? Regards Vineet Daniel
Re: how to fetch latest data
If R + W > N, where R, W, and N are respectively the read replica count, the write replica count, and the replication factor, all client reads will see the most recent write. On Tue, May 4, 2010 at 4:39 PM, vineet daniel vineetdan...@gmail.com wrote: [snip: question quoted above]
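The R + W > N rule is just a quorum-overlap check and can be expressed in one line (the function name is a hypothetical illustration):

```python
def reads_see_latest_write(r, w, n):
    """R + W > N guarantees every read replica set overlaps every write replica set,
    so at least one replica in any read holds the most recent write."""
    return r + w > n

print(reads_see_latest_write(2, 2, 3))  # True  - e.g. QUORUM reads and writes at RF=3
print(reads_see_latest_write(1, 1, 3))  # False - ONE/ONE can return stale data until read repair catches up
```

So for the stale-read problem above, raising the read and/or write consistency level until the inequality holds is the standard fix, at the cost of higher latency per operation.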
can we have duplicate keys ?
Hi, can anyone please tell me if we can have duplicate keys in a Super Column Family? If not, how can we represent this Article and Category mapping:
clientOne.insert(:ArticleCategory, 12, {ArticleID = 123})
clientOne.insert(:ArticleCategory, 12, {ArticleID = 124})
clientOne.insert(:ArticleCategory, 12, {ArticleID = 125})
clientOne.insert(:ArticleCategory, 12, {ArticleID = 126})
Here 12 is the key for a category named 'sample', and all four articles are part of this key, i.e. of 'sample'. Is this right, or do I need to do something else?
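One way to model this without duplicate row keys is to keep a single row per category and let each article id become a column name under that row; inserts to the same key then merge rather than conflict. A toy sketch (all names hypothetical, dicts standing in for the CF):

```python
# One row per category: the row key appears once, and each article id becomes
# a column name, so no duplicate row keys are needed.
article_category = {}

def map_article(category_key, article_id):
    row = article_category.setdefault(category_key, {})
    row[f"ArticleID:{article_id}"] = ""   # column value unused; the name carries the data

for aid in (123, 124, 125, 126):
    map_article(12, aid)
print(sorted(article_category[12]))
# ['ArticleID:123', 'ArticleID:124', 'ArticleID:125', 'ArticleID:126']
```

Fetching every article in category 12 is then one row read, which matches how repeated inserts under the same Cassandra row key actually behave: they accumulate as columns.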
compare cassandra read n write results
Hi, a little while ago I tried Cassandra's read and write operations and timed them. I am using Pandra for communication with Cassandra. The system is CentOS 5 with 2 GB RAM and a dual core. I inserted 10 rows in around 30 secs and read the same in 25 seconds. If any of you have run similar tests, can you please share, or tell me whether this can be improved or not? I am using the default configuration of Cassandra and it's a single-node setup. Thanks Vineet Daniel
Re: compare cassandra read n write results
I don't think it would be a good idea not to use Pandra for benchmarks, as we are going to use Pandra for our application. Secondly, it will give the Pandra guys some push to enhance the performance of their library. On Mon, Apr 12, 2010 at 6:05 PM, Jordan Pittier jordan.pitt...@gmail.com wrote: Hi, if you really want to benchmark your box, you should consider not using Pandra or any library built upon Thrift. They all come with a (small) overhead. I also realized when I made my first benchmark that most of my box's resources were used by the benchmarking tool itself and not by Cassandra. I recommend using 2 boxes if possible: one for running the benchmark tool against the other, which will run Cassandra (both boxes have to be in the same LAN). On Mon, Apr 12, 2010 at 1:55 PM, vineet daniel vineetdan...@gmail.com wrote: [snip: original benchmark message quoted above]
Re: compare cassandra read n write results
Actually, to be honest, I don't know how to insert 100 rows without PHP or Pandra. If you could help me out I will surely try it and share the results with you guys. On Mon, Apr 12, 2010 at 7:25 PM, Paul Prescod pres...@gmail.com wrote: How will they know whether the performance problem is caused by Cassandra or Pandra if you do not have raw Cassandra performance numbers for your setup? On Mon, Apr 12, 2010 at 5:51 AM, vineet daniel vineetdan...@gmail.com wrote: I don't think it would be a good idea not to use Pandra for benchmarks, as we are going to use Pandra for our application. Secondly, it will give the Pandra guys some push to enhance the performance of their library.
Re: How to perform queries on Cassandra?
How do you handle identical usernames? Otherwise it seems fine to me. On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote: Hi, As far as I can see it, the Cassandra API currently supports criteria on: Token – Key – Super Column Name (if applicable) – Column Names. I guess Token is not usually used for day-to-day queries, so Key and Column Names are normally used for querying. For the user name and password case, I guess it can be done like this: define a CF named UserAuth with type Super, where the Key is the user name and the password can be the super column name. So, when you receive the user name and password from the UI (or any other method), it can be queried via multiget_slice or get_range_slices; if anything is returned, it means that the user name and password match. As for not using the super column name and putting the password as the column name: the column name is usually not used for these kinds of discretionary values (actually, I don't see any definitive documents on how to use Column Names and Super Columns; flexibility is the good part of Cassandra, or is it bad if abused? :P) Not sure whether this is the best way, but I guess it will work. Regards, Dop *From:* Lucifer Dignified [mailto:vineetdan...@gmail.com] *Sent:* Sunday, April 11, 2010 5:33 PM *To:* user@cassandra.apache.org *Subject:* Re: How to perform queries on Cassandra? Hi Benjamin, I'll try to make it more clear to you. We have a user table with fields 'id', 'username', and 'password'. Now if we use the ideal way to store key/value, like: username : vineetdaniel timestamp, password : password timestamp; second user: username : seconduser timestamp, password : password timestamp; and so on. What I assume here is that, as we cannot search on values (as confirmed by the guys on the Cassandra forums), we are not able to perform robust 'where' queries. Now what I propose is this: rather than using static values for column names, use the values themselves and a unique key as identifier.
So the above example, put in my way, would be: vineetdaniel : vineetdaniel timestamp, password : password timestamp; second user: seconduser : seconduser timestamp, password : password timestamp. By using the above methodology we can simply search on the keys themselves rather than going into using different CFs. But to add further, this cannot be used for every situation. I am still exploring this, and will soon be updating the group and my blog with information pertaining to this. As Cassandra is new, I think every idea or experience should be shared with the community. I hope my example is clear this time. Should you have any queries, feel free to revert. On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote: Sorry, I don't understand your example. On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified vineetdan...@gmail.com wrote: Benjamin, I quite agree with you, but what about the case of duplicate usernames, say if I am not using unique names such as email ids? If we have duplicates among usernames we cannot use them for the key, so what should be the solution? I think keeping an incremental numeric id as the key and keeping the column name and value the same in the column family. Example: User1 has password 123456. Cassandra structure: 1 as key; user1 – column name, value – user1; 123456 – column name, value – 123456. I'm thinking of doing it this way for my application; this way I can run different sorts of queries too. Any feedback on this is welcome. On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black b...@b3k.us wrote: You would have a Column Family, not a column, for that; let's call it the Users CF. You'd use username as the row key and have a column called 'password'. For your example query, you'd retrieve row key 'usr2', column 'password'. The general pattern is that you create CFs to act as indices for each query you want to perform. There is no equivalent to a relational store's ability to perform arbitrary queries. You must structure things to permit the queries of interest.
b On Sat, Apr 10, 2010 at 8:34 PM, dir dir sikerasa...@gmail.com wrote: I have already read the API specification. Honestly, I do not understand how to use it, because there are no examples. For example, I have a column like this:
UserName  Password
usr1      abc
usr2      xyz
usr3      opm
Suppose I want to query a user's password using SQL in an RDBMS: Select Password From Users Where UserName = 'usr2'; Now I want to get the password using the OODBMS db4o object query in Java: ObjectSet<Users> queryResult = db.query(new Predicate<Users>() { public boolean match(Users myUser) { return myUser.getUserName().equals("usr2"); } }); After we get the Users instance in the queryResult, hence we
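Benjamin's answer above (a Users CF keyed by username, with a 'password' column) maps directly onto dir dir's SQL example. A minimal sketch, modeling the column family as a nested dict (row key → {column name: value}); a real cluster would be queried through a client library, and the names here simply follow the thread:

```python
# Users CF from the thread: row key is the username,
# one column named 'password' holds the password value.
users_cf = {
    "usr1": {"password": "abc"},
    "usr2": {"password": "xyz"},
    "usr3": {"password": "opm"},
}

def get_password(username):
    """Equivalent of: Select Password From Users Where UserName = ?"""
    row = users_cf.get(username)
    return row["password"] if row else None

print(get_password("usr2"))  # -> xyz
```

The point of the pattern is that the row key itself carries the "where" condition: there is no arbitrary query engine, so the data must be laid out so a key lookup answers the question.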
Re: How to perform queries on Cassandra?
It's not a problem, it's a scenario which we need to handle. And all I am trying to do is to achieve what is not there in the API, i.e. a workaround. On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black b...@b3k.us wrote: A system that permits multiple people to have the same username has a serious problem. On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel vineetdan...@gmail.com wrote: How do you handle identical usernames? Otherwise it seems fine to me. On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote: [snip: message quoted in full earlier in the thread]
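Dop Sun's Super Column Family approach (quoted in full earlier in the thread) can also be sketched. This is a hedged model, not real client code: the super CF is represented as nested dicts (row key → {super column name → subcolumns}), with the username as the row key and the password as the super column name, so a non-empty slice result means the credentials match. The sample usernames and subcolumn contents are illustrative only.

```python
# UserAuth super CF from Dop Sun's suggestion: key is the username,
# the super column *name* is the password; subcolumn contents are
# incidental (hypothetical placeholders here).
user_auth_scf = {
    "vineetdaniel": {"secret1": {"created": "t1"}},
    "seconduser":   {"secret2": {"created": "t2"}},
}

def credentials_match(username, password):
    """multiget_slice analogue: anything returned means a match."""
    return password in user_auth_scf.get(username, {})

print(credentials_match("vineetdaniel", "secret1"))  # True
print(credentials_match("vineetdaniel", "wrong"))    # False
```

This works because the lookup is purely on key and super column name, the two things Cassandra can slice on; the password is never compared as a value.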
Re: How to perform queries on Cassandra?
Well, my initial idea is to use the value as the column name, keeping the key as an incremental integer. The discussion after each mail has drifted from this point, so I will put it again: we want to store user information. We keep 1, 2, 3, 4 and so on as keys, AND values as column names, i.e. rather than using the column name 'first name' I'd be using 'vineet' as the column name; rather than using 'last name' as the column name I'd be using 'daniel'. This way I can directly read column names as values. This is just a thought that has come to my mind while trying to design my db for Cassandra. On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black b...@b3k.us wrote: Row keys must be unique. If your usernames are not unique and you want to be able to query on them, you either need to figure out a way to make them unique or treat the username rows themselves as indices, which refer to a set of actually unique identifiers for users. On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel vineetdan...@gmail.com wrote: [snip: messages quoted in full earlier in the thread]
Re: How to perform queries on Cassandra?
I assume that using the key I can get all the columns like an array. Now I'd be using PHP to extract each array key=value pair in that array; I just want to avoid that, i.e. I can directly print the column names. If you guys think it's not a good idea I can drop it; anyway, I'm new to it and a lot of things are coming to mind. As far as Cassandra and column families / super columns are concerned, I am pretty clear. On Mon, Apr 12, 2010 at 12:23 AM, Benjamin Black b...@b3k.us wrote: I have no idea what problem you are trying to solve. You are misunderstanding a number of things about the Cassandra data model and about how we are explaining it is best used. On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel vineetdan...@gmail.com wrote: [snip: message quoted in full earlier in the thread]
Re: How to perform queries on Cassandra?
I am dropping the idea, don't want to irritate you guys more. I've got your points. On Mon, Apr 12, 2010 at 12:41 AM, Benjamin Black b...@b3k.us wrote: Just to be clear: do you understand we are saying you need to use multiple CFs to achieve the goal, not a single one? The Users CF would be indexed on a unique integer, as you are saying you intend. There is no point in having values as column names here, other than making things incredibly confusing. Assume instead that you have a column called 'username' and a column called 'password'. In your model, where usernames may be the same for different users, you would have data that looked like this:
0: {'username':'usr1', 'password':'woop'}
1: {'username':'usr2', 'password':'foo'}
2: {'username':'usr2', 'password':'bar'}
The UsernameIndex CF would be indexed on usernames, giving a map from a username to the unique identifiers in the Users CF with that username:
'usr1': {0:0}
'usr2': {1:0, 2:0}
Note that since we don't care about the values in the UsernameIndex, they are just set to 0. You can stash data here if you like, but it can mean more overhead in maintaining data synchronization between the raw data and the index data. To perform your query on username 'usr2', you get 'usr2' from the UsernameIndex CF, which gives you a set of ids, and you then get those ids (1 and 2) from the Users CF. b On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel vineetdan...@gmail.com wrote: [snip: message quoted in full earlier in the thread]
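Benjamin Black's two-CF example above can be modeled end to end. This is a sketch using dicts in place of column families (the data is taken verbatim from his message; `users_by_name` is a hypothetical helper name, and in a real deployment each step would be a client call against its CF):

```python
# Users CF: unique integer id -> columns, verbatim from the thread.
users_cf = {
    0: {"username": "usr1", "password": "woop"},
    1: {"username": "usr2", "password": "foo"},
    2: {"username": "usr2", "password": "bar"},
}

# UsernameIndex CF: username -> {user id: 0}; the 0 values are
# dummies, since only the column names (the ids) matter.
username_index_cf = {
    "usr1": {0: 0},
    "usr2": {1: 0, 2: 0},
}

def users_by_name(username):
    """Two-step query: index lookup, then multiget on the Users CF."""
    ids = username_index_cf.get(username, {})
    return [users_cf[i] for i in sorted(ids)]

for u in users_by_name("usr2"):
    print(u["username"], u["password"])
```

The trade-off Benjamin notes shows up directly: every write to the Users CF must also update UsernameIndex, which is the synchronization overhead of maintaining your own index CFs.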