monitoring cassandra with JMX

2011-05-24 Thread vineet daniel
Hi

I have just written a little note on how to monitor cassandra...
http://vineetdaniel.me/2011/03/26/monitoring-cassandra-with-jmx/

I hope it helps the community.
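(As a small illustration of the kind of polling such monitoring boils down to -
this is a hedged sketch, not taken from the note above: it connects to a Cassandra
JVM over JMX and reads a standard JVM MBean attribute. The JMX port and the
org.apache.cassandra.* MBean/attribute names differ between versions, so browse
them with jconsole first.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxPoll {
    public static void main(String[] args) throws Exception {
        // Assumes the Cassandra JVM exposes JMX on localhost:8080; the default
        // port (and whether auth/SSL are required) differs between versions.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // java.lang:type=Memory exists in every JVM; Cassandra's own MBeans
            // live under org.apache.cassandra.* and are read the same way
            // (browse them in jconsole first to find the exact names).
            CompositeData heap = (CompositeData) mbs.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.println("heap used = " + heap.get("used") + " bytes");
        } finally {
            connector.close();
        }
    }
}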



Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vineetdaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel


Re: Central monitoring of Cassandra cluster

2011-03-24 Thread vineet daniel
...in process of using nagios to monitor three servers. Will post updates
shortly.
Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vineetdaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel




On Fri, Mar 25, 2011 at 12:14 AM, mcasandra mohitanch...@gmail.com wrote:

 Can someone share if they have centralized monitoring for all cassandra
 servers. With many nodes it becomes difficult to monitor them individually
 unless we can look at data in one place. I am looking at solutions where
 this can be done. Looking at Cacti currently but not sure how to integrate
 it with JMX.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Central-monitoring-of-Cassandra-cluster-tp6205275p6205275.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Internal error processing get in get after insert ttl -

2010-09-23 Thread vineet daniel
I got a similar error, but while inserting. I am using 0.7 Beta 1 and I found the
following in the logs:

ERROR 13:59:44,555 Internal error processing insert
java.lang.AssertionError: invalid response count 1
at
org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:87)
at
org.apache.cassandra.service.WriteResponseHandler.init(WriteResponseHandler.java:47)
at
org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:113)
at
org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:198)
at
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:474)
at
org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:390)
at
org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2896)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2499)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)


Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Thu, Sep 23, 2010 at 12:48 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

 You should not have anything special to do.
 Could you check the cassandra logs and give us the stack trace of the error
 ?

 --
 Sylvain

 On Thu, Sep 23, 2010 at 8:36 AM, Michal Augustýn
 augustyn.mic...@gmail.com wrote:
  Hello,
  I tried to use Column.Ttl property but I was not successful. My simple
 test:
  1) insert column with ttl = 3
  2) get column - all is ok
  3) wait for 2 seconds
  4) get column - all is ok
  5) wait again for 2 seconds (so column should disappear)
  6) get column - I got Thrift.TApplicationException of type 6 with
 message
  Internal error processing get
  Do I have to change some Cassandra configuration in order to get ttl
  working? Or am I doing anything in bad way?
  Thank you!
  Augi
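(The test steps above, as a rough Java sketch against a hypothetical client -
insert/get and the 'Standard1' CF name are illustrative assumptions, not a real
driver API - mainly to show the expected behaviour: the get after the TTL has
passed should simply return nothing, not an internal error.)

// Hypothetical client interface; the real calls depend on the driver in use.
interface Client {
    void insert(String cf, String key, String column, byte[] value, int ttlSeconds);
    byte[] get(String cf, String key, String column); // null once the column has expired
}

class TtlTest {
    static void run(Client c) throws InterruptedException {
        c.insert("Standard1", "k1", "col", "v".getBytes(), 3); // 1) insert with ttl = 3
        System.out.println(c.get("Standard1", "k1", "col"));   // 2) still readable
        Thread.sleep(2000);                                    // 3) wait 2 seconds
        System.out.println(c.get("Standard1", "k1", "col"));   // 4) still readable
        Thread.sleep(2000);                                    // 5) wait past the ttl
        System.out.println(c.get("Standard1", "k1", "col"));   // 6) expect null, not an error
    }
}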



Re: Internal error processing get in get after insert ttl -

2010-09-23 Thread vineet daniel
Hi

I was using 'access_logs' as the column family name; I changed it to 'Accesslogs' and
it worked. Maybe Cassandra doesn't like underscores and lowercase letters.

Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Thu, Sep 23, 2010 at 2:06 PM, vineet daniel vineetdan...@gmail.com wrote:

 I got similar error but while inserting I am using 07 Beta 1 and l found
 the following in the logs :

 ERROR 13:59:44,555 Internal error processing insert
 java.lang.AssertionError: invalid response count 1
 at
 org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:87)
 at
 org.apache.cassandra.service.WriteResponseHandler.init(WriteResponseHandler.java:47)
 at
 org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:113)
 at
 org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:198)
 at
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:474)
 at
 org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:390)
 at
 org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2896)
 at
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2499)
 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)


 Regards
 Vineet Daniel
 Cell  : +918106217121
 Websites :
 Blog http://vinetedaniel.blogspot.com   |   
 Linkedin http://in.linkedin.com/in/vineetdaniel
 |  Twitter https://twitter.com/vineetdaniel





 On Thu, Sep 23, 2010 at 12:48 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

 You should not have anything special to do.
 Could you check the cassandra logs and give us the stack trace of the
 error ?

 --
 Sylvain

 On Thu, Sep 23, 2010 at 8:36 AM, Michal Augustýn
 augustyn.mic...@gmail.com wrote:
  Hello,
  I tried to use Column.Ttl property but I was not successful. My simple
 test:
  1) insert column with ttl = 3
  2) get column - all is ok
  3) wait for 2 seconds
  4) get column - all is ok
  5) wait again for 2 seconds (so column should disappear)
  6) get column - I got Thrift.TApplicationException of type 6 with
 message
  Internal error processing get
  Do I have to change some Cassandra configuration in order to get ttl
  working? Or am I doing anything in bad way?
  Thank you!
  Augi





Re: Schema question

2010-09-20 Thread vineet daniel
Hi Morten

The simplest approach that comes to my mind (without considering any other use
cases, just read and unread messages) is to use two CFs, 'read' and
'unread': put all new messages in 'unread', and once the user reads one of
them, copy it to 'read' and mark the original for deletion (see the sketch below).
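A rough sketch of that flow in Java, against a hypothetical client interface
(insert/remove below are illustrative names, not a real driver API; only the
two-CF data flow is the point):

// Hypothetical client wrapper standing in for whatever Cassandra driver is used.
interface MessageStore {
    void insert(String columnFamily, String rowKey, String column, byte[] value);
    void remove(String columnFamily, String rowKey, String column);
}

class MessageFlow {
    private final MessageStore store;
    MessageFlow(MessageStore store) { this.store = store; }

    // New messages always land in the 'unread' column family.
    void deliver(String userId, String msgId, byte[] body) {
        store.insert("unread", userId, msgId, body);
    }

    // Once a message is read, copy it into 'read' and delete it from 'unread'.
    void markRead(String userId, String msgId, byte[] body) {
        store.insert("read", userId, msgId, body);
        store.remove("unread", userId, msgId);
    }
}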


Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Mon, Sep 20, 2010 at 3:35 PM, aaron morton aa...@thelastpickle.com wrote:

 Here is a discussion about implementing twitter with Cassandra
 http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/

 An example of the same on github
 http://github.com/ericflo/twissandra

 If you have not done already checkout the articles page on the wiki
 http://wiki.apache.org/cassandra/ArticlesAndPresentations

 Aaron


 On 20 Sep 2010, at 21:57, Morten Wegelbye Nissen wrote:

  Hello List,
 
  No matter where you read, you almost every-where read the the noSQL
 datascema is completely different from the relational way - and after a
 little insight in cassandra everyone can 2nd that.
 
  But I miss to see some real-life examples on how a real system can be
 modelled. Lets take the example for a system where users can send messages
 to each other. ( Completely imaginary, noone would use cassandra for a
 mailsystem :) )
 
  If one should create such a system, what CF's would be used? And how
 would you per example find all not read messages?
 
  ./Morten




Re: 0.7 memory usage problem

2010-09-18 Thread vineet daniel
Hi Peter

I actually checked after 15-20 minutes of watching the monitor and the logs; even when
everything had calmed down it was still showing this many processes. Wouldn't it
be good to reduce the number of threads once the server is idle or almost idle? As
I am not a Java guy, the only thing I can think of is that creating
processes/threads again may consume more memory than keeping idle
threads around.

Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Sat, Sep 18, 2010 at 10:50 PM, Peter Schuller 
peter.schul...@infidyne.com wrote:

  Even I would like to add here something and correct me if I am wrong, I
  downloaded 0.7 beta and ran it, just by chance I checked 'top' to see how
  the new version is doing and there were 64 processes running though
  Cassandra was on single node with default configuration options ( ran it
 as
  is, as soon as I downloaded). No inserts done, no selects done nothing. I
  don't think this is normal.

 I presume those are threads. It adds up; various stages have multiple
 threads in cassandra, and the JVM itself has a number of threads (e.g.
 GC threads, compiler threads). A 'jstack' on a freshly started trunk
 cassandra for me, grepping for 'prio', yields 92 threads the
 following.

 Attach Listener daemon prio=9 tid=0x000805476800 nid=0x80554e3c0
 waiting on condition [0x]
 Timer-1 prio=5 tid=0x0009103ed800 nid=0x910777280 in
 Object.wait() [0x7a1a4000]
 LB-TARGET:1 prio=5 tid=0x0009103ee800 nid=0x910777b40 waiting on
 condition [0x7a2a5000]
 LB-OPERATIONS:1 prio=5 tid=0x0009103f nid=0x910778400
 waiting on condition [0x7a3a6000]
 ACCEPT-localhost/127.0.0.1 prio=5 tid=0x0009103f0800
 nid=0x910778cc0 runnable [0x7a4a7000]
 Timer-0 prio=5 tid=0x0009103f1800 nid=0x9103d4ac0 in
 Object.wait() [0x7a5a8000]
 GC inspection prio=5 tid=0x0009103f2000 nid=0x91051c540 in
 Object.wait() [0x7a6a9000]
 CompactionExecutor:1 prio=1 tid=0x000911805800 nid=0x9111fb3c0
 waiting on condition [0x7a7aa000]
 PERIODIC-COMMIT-LOG-SYNCER prio=5 tid=0x0009103f3000
 nid=0x91051ce00 waiting on condition [0x7a8ab000]
 COMMIT-LOG-WRITER prio=5 tid=0x0009103f3800 nid=0x91051d6c0
 waiting on condition [0x7a9ac000]
 MISC_STAGE:1 prio=5 tid=0x0009103f4800 nid=0x91051df80 waiting
 on condition [0x7aaad000]
 MIGRATION_STAGE:1 prio=5 tid=0x0009103f5000 nid=0x91051e840
 waiting on condition [0x7abae000]
 AE_SERVICE_STAGE:1 prio=5 tid=0x0009103f6000 nid=0x91051f100
 waiting on condition [0x7acaf000]
 GOSSIP_STAGE:1 prio=5 tid=0x0009103f6800 nid=0x91051f9c0 waiting
 on condition [0x7adb]
 STREAM_STAGE:1 prio=5 tid=0x0009103f7800 nid=0x910520280 waiting
 on condition [0x7aeb1000]
 RESPONSE_STAGE:4 prio=5 tid=0x0009103f8000 nid=0x910520b40
 waiting on condition [0x7afb2000]
 RESPONSE_STAGE:3 prio=5 tid=0x0009103f9000 nid=0x910521400
 waiting on condition [0x7b0b3000]
 RESPONSE_STAGE:2 prio=5 tid=0x0009103f9800 nid=0x910521cc0
 waiting on condition [0x7b1b4000]
 RESPONSE_STAGE:1 prio=5 tid=0x0009103fa800 nid=0x9103c8900
 waiting on condition [0x7b2b5000]
 READ_STAGE:8 prio=5 tid=0x000910505000 nid=0x9103ce380 waiting
 on condition [0x7b3b6000]
 READ_STAGE:7 prio=5 tid=0x000910505800 nid=0x9103cec40 waiting
 on condition [0x7b4b7000]
 READ_STAGE:6 prio=5 tid=0x000910506800 nid=0x9103cf500 waiting
 on condition [0x7b5b8000]
 READ_STAGE:5 prio=5 tid=0x000910507000 nid=0x9103cfdc0 waiting
 on condition [0x7b6b9000]
 READ_STAGE:4 prio=5 tid=0x000801cbf000 nid=0x9103d0680 waiting
 on condition [0x7b7ba000]
 READ_STAGE:3 prio=5 tid=0x000801cbf800 nid=0x9103d0f40 waiting
 on condition [0x7b8bb000]
 READ_STAGE:2 prio=5 tid=0x000801cc0800 nid=0x9103d1800 waiting
 on condition [0x7b9bc000]
 READ_STAGE:1 prio=5 tid=0x000801cc1000 nid=0x9103d20c0 waiting
 on condition [0x7babd000]
 MUTATION_STAGE:32 prio=5 tid=0x000801cc2000 nid=0x9103d2980
 waiting on condition [0x7bbbe000]
 MUTATION_STAGE:31 prio=5 tid=0x000801cc2800 nid=0x9103d3240
 waiting on condition [0x7bcbf000]
 MUTATION_STAGE:30 prio=5 tid=0x000801cc3800 nid=0x9103d3b00
 waiting on condition [0x7bdc]
 MUTATION_STAGE:29 prio=5 tid=0x000801cc4000 nid=0x9103d43c0
 waiting on condition [0x7bec1000]
 MUTATION_STAGE:28 prio=5 tid=0x000801cc5000 nid=0x9103c21c0
 waiting on condition [0x7bfc2000]
 MUTATION_STAGE:27 prio=5 tid=0x000801cc5800 nid=0x9103c2a80
 waiting on condition [0x7c0c3000]
 MUTATION_STAGE:26 prio=5 tid=0x000801cc6800 nid=0x9103c3340
 waiting on condition [0x7c1c4000]
 MUTATION_STAGE:25 prio=5 tid

Re: Bootstrapping stays stuck

2010-09-14 Thread vineet daniel
Hi Gurpreet

What is the output of 'nodetool -h hostname/IP streams' (to see what
is going on between the nodes)? If you don't see anything happening, try
switching off the firewall / iptables.


Regards
Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh
gurpreet.si...@gmail.com wrote:

 I tried this again, it happenned yet again.
 This time while the transfer messages seemed tobe in order, i also noticed
 that the source logs talk about having 9 dropped messages in the last 1000
 ms. The only activity on the whole cluster is this bootstrapping, there is
 no read/write traffic going on.

 /G

 On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh gurpreet.si...@gmail.com
  wrote:

 I am using cassandra 0.6.5.


 On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh gurpreet.si...@gmail.com
  wrote:

 Hi,
 I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
 more machines, one at a time.
 For both these machines, the bootstrapping stays stuck after the
 streaming is done.

 When the nodes come up for bootstrapping, I see all the relevant messages
 about getting a new token, assuming load from a particular host. I see a
 couple of nodes anticompacting data to send, and at a later point the node
 that is bootstrapping prints the right streaming mesgs. However, once the
 streaming is over, the node just doesnt do anything. Previously while
 bootstrapping, I have seen that after the streaming is done, the node
 restarts and becomes part of the ring by itself. I dont see this behaviour
 with both the nodes I tried today.
 I even restarted all the nodes in the cluster, and tried bootstrapping
 one of the nodes again, but it again was stuck after streaming. It seems to
 have copied the relevant load as well.
 Any ideas as to what could be going on here?

 /G






Re: Capping the memory limit in cassandra

2010-09-07 Thread vineet daniel
Hi

When is this happening? I mean, is Cassandra idle, or is the application
inserting/reading values from it? Are you running any map/reduce job at
that time?

Regards

Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Tue, Sep 7, 2010 at 11:08 PM, Dathan Pattishall datha...@gmail.com wrote:

 For this java process

 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:+HeapDumpOnOutOfMemoryError
 -Dcom.sun.management.jmxremote.port=8181
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dstorage-config=/opt/cassandra/bin/../conf -cp
 /opt/cassandra/bin/../conf:/opt/cassandra/bin/../build/classes:/opt/cassandra/bin/../lib/antlr-3.1.3.jar:/opt/cassandra/bin/../lib/apache-cassandra-0.6.4.jar:/opt/cassandra/bin/../lib/clhm-production.jar:/opt/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/cassandra/bin/../lib/commons-collections-3.2.1.jar:/opt/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/cassandra/bin/../lib/google-collections-1.0.jar:/opt/cassandra/bin/../lib/hadoop-core-0.20.1.jar:/opt/cassandra/bin/../lib/high-scale-lib.jar:/opt/cassandra/bin/../lib/ivy-2.1.0.jar:/opt/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/cassandra/bin/../lib/jline-0.9.94.jar:/opt/cassandra/bin/../lib/json-simple-1.1.jar:/opt/cassandra/bin/../lib/libthrift-r917130.jar:/opt/cassandra/bin/../lib/log4j-1.2.14.jar:/opt/cassandra/bin/../lib/slf4j-api-1.5.8.jar:/opt/cassandra/bin/../lib/slf4j-log4j12-1.5.8.jar
 org.apache.cassandra.thrift.CassandraDaemon

 I set the max memory size to 7G yet Cassandra is taking


 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND



 6992 root  18   0 32.8g  29g  21g S   19 93.9   5685:14
 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:M

 why is that, and how do I cap the memory used for cassandra? Is this a bug
 or a mistake on my part?




servers for cassandra

2010-09-04 Thread vineet daniel
Hi

I am just curious to know if there is any hosting company that provides
servers at a very low cost, on which I can install Cassandra across a WAN. I have
a Cassandra setup on my LAN and want to test it in real conditions; taking
dedicated servers just for testing purposes is not at all feasible for me,
not even the pay-as-you-go types. I'd really appreciate it if anybody can share
information on such hosting providers.

Vineet Daniel
Cell  : +918106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel


Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-03 Thread vineet daniel
If I am correct, you need to restart Cassandra whenever you add a new
keyspace. That's another concern.

Vineet Daniel
Cell  : +91-8106217121
Websites :
Blog http://vinetedaniel.blogspot.com   |
Linkedin http://in.linkedin.com/in/vineetdaniel
|  Twitter https://twitter.com/vineetdaniel





On Fri, Sep 3, 2010 at 2:58 PM, Mike Peters
cassan...@softwareprojects.com wrote:

  Very interesting. Thank you

 So it sounds like other than being able to quickly truncate
 customer-keyspaces, with Cassandra there's no real benefit in keeping each
 customer data in a separate keyspace.

 We'll suffer on the memory side with all the switching between keyspaces
 and we're better off storing all customer data under the same keyspace?



 On 9/2/2010 11:29 PM, Aaron Morton wrote:

 Create one big happy love in keyspace. Use the key structure to identify
 the different clients data.
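 As a small illustration (a sketch of one possible convention, nothing Cassandra
 itself prescribes), the client id can simply be folded into every row key:

 // Sketch: one shared keyspace, with the tenant/client id prefixed onto the row key.
 // The "clientId:naturalKey" format is just a convention chosen here for illustration.
 class TenantKeys {
     static String rowKey(String clientId, String naturalKey) {
         return clientId + ":" + naturalKey;
     }
     // e.g. rowKey("client42", "orders-2010-09-03") returns "client42:orders-2010-09-03"
 }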

  There is more support for multi-tenancy systems, but a lot of the memory
 configuration is per keyspace/column family, so you cannot run that many
 keyspaces.

  This page has some more information
 http://wiki.apache.org/cassandra/MultiTenant

   Aaron


 On 03 Sep, 2010, at 01:25 PM, Mike Peters cassan...@softwareprojects.com wrote:

Hi,

 We're in the process of migrating 4,000 MySQL client databases to
 Cassandra. All database schemas are identical.

 With MySQL, we used to provision a separate 'database' per each client,
 to make it easier to shard and move things around.

 Does it make sense to migrate the 4,000 MySQL databases to 4,000
 keyspaces in Cassandra? Or should we stick with a single keyspace?

 My concerns are -
 #1. Will every single node end up with 4k folders under /cassandra/data/?

 #2. Performance: Will Cassandra work better with a single keyspace +
 lots of keys, or thousands of keyspaces?

 -

 Granted it's 'cleaner' to have a separate keyspace per each client, but
 maybe that's not the best approach with Cassandra.

 Thoughts?





Re: Looking for something like like of mysql.

2010-09-02 Thread vineet daniel
You can try using a different CF for each result set, or an inverted index,
but looking at the number of inserts that you have, it will become
complicated. The first thing you need to do is stop thinking in terms
of an RDBMS, as Cassandra is not at all like one.
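For what it's worth, a rough sketch of the inverted-index idea in Java, against a
hypothetical store interface (the method names and the 'SubjectIndex' CF are
illustrative assumptions, not a real client API): each (user, subject term) pair
becomes a row whose column names are the matching mail ids, so a term lookup is a
single row read instead of a scan over thousands of rows.

import java.util.List;

// Hypothetical client interface; only the index layout is the point here.
interface Store {
    void insert(String columnFamily, String rowKey, String columnName, byte[] value);
    List<String> columnNames(String columnFamily, String rowKey);
}

class SubjectIndex {
    private final Store store;
    SubjectIndex(Store store) { this.store = store; }

    // When a mail arrives, index it: one column per subject term,
    // under a row keyed by "userId:term".
    void index(String userId, String mailId, String subject) {
        for (String term : subject.toLowerCase().split("\\s+")) {
            store.insert("SubjectIndex", userId + ":" + term, mailId, new byte[0]);
        }
    }

    // All mail ids for this user whose subject contained the term.
    List<String> mailsWith(String userId, String term) {
        return store.columnNames("SubjectIndex", userId + ":" + term.toLowerCase());
    }
}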
___
Regards
Vineet Daniel
+918106217121
___

Let your email find you


On Thu, Sep 2, 2010 at 10:00 PM, Mike Peters cassan...@softwareprojects.com
 wrote:

  Cassandra doesn't support adhoc queries, like what you're describing

 I recommend looking at Lucandra http://github.com/tjake/Lucandra


 On 9/2/2010 12:27 PM, Anuj Kabra wrote:

 I am working with cassandra-0.6.4. I am working on mail retreival problem.
 We have the metadata of mail like sender, recipient, timestamp, subject and
 the location of mail file stored in a cassandra DB.Everyday about 25,000
 records will

 be entered to this DB. We have not finalised on the data model yet but
 starting with a simple one having only one column family.
 ColumnFamily name=MailMetadata CompareWith=UTF8Type
 which have user_id of recipient as key.and columns for sender_id, timestamp
 of mail, subject and location of mail file.
 Now our Use case is to get the locations of all mail files which are being
 sent by a user matching a given subject(can be a part of the original
 subject of mail). Well according to my knowledge till now, we can get all
 the rows of a user

 by using user_id as key. After that i need to iterate over all the rows i
 get and see which mail seems to fit the given condition.(matching a subject
 in this case), which is very heavy computationally as we would get thousands
 of rows.
 So we are looking for something like like of mysql provided by thrift. I
 also need to know if am going the right way.
 Help is much appreciated.





Re: Data Modeling Conundrum

2010-05-08 Thread vineet daniel
Query: why are you sorting? AFAIK Cassandra sorts the keys by itself if you
are using ordered partitioning. And how do you store data pertaining to a
single user that has several GUIDs attached to it?


___
Vineet Daniel
___

Let your email find you


On Sat, May 8, 2010 at 9:01 AM, William Ashley wash...@gmail.com wrote:

 List,
 I have a case where visitors to a site are tracked via a persistent cookie
 containing a guid. This cookie is created and set when missing. Some of
 these visitors are logged in, meaning a userId may also be available. What
 I’m looking to do is have a way to associate each userId with all of the
 guids that it has been seen with. Conceptually, this would identify the
 unique (device, browser) pairs for each userId. The catch is that I want to
 be able to retrieve the most-recently-seen N guids for a userId.


 One possible solution to this problem in SQL looks like this (made up on
 the fly):
 # Table schema
 CREATE TABLE UserGuid ( userId INT, guid VARCHAR, when TIMESTAMP, PRIMARY
 KEY( userId, guid ), INDEX( userId, when ) );

 # For each request with guid G and userId U at time T
 INSERT INTO UserGuid ( userId, guid, when ) VALUES ( U, G, T ) ON DUPLICATE
 KEY UPDATE SET when = T;

 # To get most recent N guids for userId U
 SELECT guid FROM UserGuid WHERE userId = U ORDER BY when DESC LIMIT N;


 Hopefully I’ve sufficiently explained what I’m trying to do. Now on to
 solving this problem in Cassandra. I’ve been trying to find a way that
 allows both of the above operations to be performed efficiently. Updates are
 a breeze with a structure like this:

 // Row key is userId
 12345 : {
  // Column name is guid
  ‘256fb890-5a4b-11df-a08a-0800200c9a66’ : {
// Column timestamp is last time guid was seen
timestamp : 387587235233
  }
 }

 but getting the last N recently seen guids requires pulling all columns and
 sorting by timestamp. Retrievals can be done efficiently with a structure
 taking advantage of column sorting:

 // Row key is userId
 12345 : {
  // Column name is last time guid was seen
  387587235233 : {
// Column value is guid
value: ‘256fb890-5a4b-11df-a08a-0800200c9a66’
  }
 }

 where we use a slice get on the row with limit N (and reverse order).
 However, updates involve pulling all columns to de-duplicate guid values.
 Neither solution is ideal, and so I present this to you fine gentlemen who
 have more experience modeling data in Cassandra than I.

 I would much prefer to avoid any solutions that require pulling an
 indeterminate amount of data for either operation. For the time being I am
 using the first method and only pulling the first M columns, sorting, and
 taking the top N (M = N).

 One thing I was thinking would be nice (if possible), is to have a column
 family where columns are either sorted by their timestamp, or by the time
 the column was created/updated (which may be equivalent to not sorting at
 all, but I have not looked at the implementation).

 I appreciate any feedback or suggestions you might have.
 - William




release date for 0.7 ?

2010-05-08 Thread vineet daniel
Hi

What is the expected release date for 0.7 and what will be the feature
specifications for it ?

___
Vineet Daniel
___

Let your email find you


Re: bloom filter

2010-05-07 Thread vineet daniel
Thanks David and Peter.

Is there any way to view the contents of this file?
___
Vineet Daniel
___

Let your email find you


On Fri, May 7, 2010 at 4:24 PM, David Strauss da...@fourkitchens.com wrote:

 On 2010-05-07 10:51, vineet daniel wrote:
  what is the benefit of creating bloom filter when cassandra writes data,
  how does it helps ?

 http://wiki.apache.org/cassandra/ArchitectureOverview

 --
 David Strauss
   | da...@fourkitchens.com
 Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]




Re: bloom filter

2010-05-07 Thread vineet daniel
1. Peter said 'without going to disk', so does that mean bloom filters reside in
memory always, or only when a request to that particular CF is made? (A small,
purely illustrative bloom filter sketch follows below.)
2. 'It is also important for identifying which SSTable files to look inside
even when a key is present.' - David, can you please throw some more light on
your point, e.g. what the implications are and why we need to identify them?
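For reference, a minimal, purely illustrative bloom filter in Java (this is not
Cassandra's implementation, just the idea Peter describes): on every write, k hash
functions set k bits for the key; on a read, if any of those bits is unset the key
was definitely never written to that SSTable, so the file can be skipped without a
disk seek. False positives are possible, false negatives are not.

import java.util.BitSet;

// Minimal illustrative bloom filter (not Cassandra's actual implementation).
class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position for a key (toy hash, for illustration only).
    private int bit(byte[] key, int i) {
        int h = i * 0x9E3779B1;
        for (byte b : key) h = h * 31 + b;
        return (h & 0x7fffffff) % size;
    }

    void add(byte[] key) {
        for (int i = 0; i < hashes; i++) bits.set(bit(key, i));
    }

    boolean mightContain(byte[] key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(bit(key, i))) return false; // definitely not present
        return true; // possibly present (false positives happen)
    }
}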


___
Vineet Daniel
___

Let your email find you


On Fri, May 7, 2010 at 4:28 PM, David Strauss da...@fourkitchens.com wrote:

 On 2010-05-07 10:55, Peter Schüller wrote:
  what is the benefit of creating bloom filter when cassandra writes data,
 how
  does it helps ?
 
  It allows Cassandra to answer requests for non-existent keys without
  going to disk, except in cases where the bloom filter gives a false
  positive.
 
  See:
 
 
 http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

 It is also important for identifying which SSTable files to look inside
 even when a key is present.

 --
 David Strauss
   | da...@fourkitchens.com
 Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]




Re: Cassandra Streaming Service

2010-05-06 Thread vineet daniel
For more details have a look here :

http://wiki.apache.org/cassandra/Streaming

___
Vineet Daniel
___

Let your email find you


On Wed, May 5, 2010 at 9:34 PM, Weijun Li weiju...@gmail.com wrote:

 Thank you Jonathan! Good to know.


 On Tue, May 4, 2010 at 9:13 PM, Jonathan Ellis jbel...@gmail.com wrote:

 The Streaming service is what moves data around for load balancing,
 bootstrap, and decommission operations.

 On Tue, May 4, 2010 at 8:08 PM, Weijun Li weiju...@gmail.com wrote:
  A dumb question: what is the use of Cassandra streaming service? Any use
  case or example?
 
  Thanks,
  -Weijun
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com





why is streaming done in 32 MB chunks ?

2010-05-06 Thread vineet daniel
Hi

Just out of curiosity, I want to know why streaming is done in 32 MB chunks
and not in 16 or 64 MB chunks. Are there any specific reasons behind 32 MB, or is
it just arbitrary?


___
Vineet Daniel
___

Let your email find you


Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

2010-05-04 Thread vineet daniel
Only major compactions can clean out obsolete tombstones.

On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami kazuki.aran...@gmail.com
 wrote:
  Let me rephrase my question.
 
  How does Cassandra deal with bloom filter's false positives on deleted
 records?

 The same way it deals with tombstones that it encounters otherwise
 (part of a row slice, or in a memtable).

 All the bloom filter does is keep you from having to check rows that
 don't have any data at all for a given key.  Tombstones are not the
 same as no data at all, we do need to propagate tombstones during
 replication.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Design Query

2010-05-04 Thread vineet daniel
As you haven't specified all the details pertaining to the filters and your data
layout (structure), at a very high level what I can suggest is that you need
to create a separate CF for each filter.


On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan rakes...@gmail.com wrote:

 I am evaluating cassandra to implement activity streams. We currently have
 over 100 feeds with total entries exceeding 32000 implemented using
 redis ( ~320 entries / feed). Would like hear from the community on how to
 use cassandra to solve the following cases:

1. Ability to fetch entries by applying a few filters ( like show me
only likes from a given user). This would include range query to support
pagination. So this would mean indices on a few columns like the feed id,
feed type etc.
2. We have around 3 machines with 4GB RAM for this purpose and thinking
of having replication factor 2. Would 4GB * 3 be enough for cassandra for
this kind of data? I read that cassandra does not keep all the data in
memory but want to be sure that we have the right server config to handle
this data using cassandra.

 Thanks,
 Rakesh



how to fetch latest data

2010-05-04 Thread vineet daniel
Hi

In a Cassandra cluster, if we update any key/value and then perform a fetch
query on that same key, we get old/stale data. This can be because of
read repair.

Is there any way to fetch the latest updated data from the cluster, as the old
data has no significance and showing it to the client is even more irrelevant?

Regards
Vineet Daniel


Re: how to fetch latest data

2010-05-04 Thread vineet daniel
If R + W > N, where R, W, and N are respectively the read replica count, the
write replica count, and the replication factor, then all client reads will see
the most recent write. For example, with a replication factor of N = 3, reading
and writing at QUORUM gives R = W = 2, so R + W = 4 > 3 and every read overlaps
the latest write on at least one replica.

On Tue, May 4, 2010 at 4:39 PM, vineet daniel vineetdan...@gmail.com wrote:

 Hi

 In a cluster of cassandra if we are updating any key/value and perform the
 fetch query on that same key, we get old/stale data. This can be because of
 Read Repair.

 Is there any way to fetch the latest updated data from the cluster, as old
 data stands no significance and showing it to client is more irrelevant.

 Regards
 Vineet Daniel



can we have duplicate keys ?

2010-04-29 Thread vineet daniel
Hi

Can anyone please tell me if we can have duplicate keys in a Super Column
Family? If not, how can we represent this:

Article and Category Mapping

 clientOne.insert(:ArticleCategory, 12, {ArticleID = 123})
 clientOne.insert(:ArticleCategory, 12, {ArticleID = 124})
 clientOne.insert(:ArticleCategory, 12, {ArticleID = 125})
 clientOne.insert(:ArticleCategory, 12, {ArticleID = 126})

Here 12 is the key for a category named 'sample', and all four articles
are part of this key / category. Is this right, or do I need to do something
else?


compare cassandra read n write results

2010-04-12 Thread vineet daniel
Hi

A little while ago I tried Cassandra's read and write operations and timed them.

I am using Pandra to communicate with Cassandra. The system is CentOS 5 with
2 GB RAM and a dual core CPU.

I inserted 10 rows in around 30 secs and read the same in 25 seconds.

If any of you have run similar tests, can you please share them or tell me whether
this can be improved or not? I am using the default configuration of Cassandra
and it's a single-node setup.

Thanks
Vineet Daniel


Re: compare cassandra read n write results

2010-04-12 Thread vineet daniel
I don't think it would be a good idea not to use Pandra for the benchmarks, as we
are going to use Pandra for our application. Secondly, it will give the Pandra
guys some motivation to enhance the performance of their library.

On Mon, Apr 12, 2010 at 6:05 PM, Jordan Pittier jordan.pitt...@gmail.com wrote:

 Hi,
 If you really want to benchmark your box, you should concidere not using
 Pandra nor any library built upon Thrift. They all come with a (small)
 overhead.

 I also realized when I made my first benchmark that most of my box's
 ressources was used by the benchmarking tool it self and not by Canssandra.
 I recommend using 2 boxes if possible, one for running the benchmark tool
 against the other which will run Cassandra (both boxes have to be in the
 same LAN).

 On Mon, Apr 12, 2010 at 1:55 PM, vineet daniel vineetdan...@gmail.com wrote:

 Hi

 A little while ago I tried cassandra's read n write operations and timed
 it.
 I am using Pandra for communication with cassandra. System is CentOS 5
 with 2 GB RAM and dual core.

 I inserted 10 rows in around 30 secs and read the same in 25 seconds.

 If anyone of you have run similar tests can you please share or tell
 whether this can be improved or not. I am using default configuration of
 cassandra and its a single node setup.

 Thanks
 Vineet Daniel





Re: compare cassandra read n write results

2010-04-12 Thread vineet daniel
Actually, to be honest I don't know how to insert 100 rows without PHP or
Pandra. If you could help me out, I will surely try it and share the
results with you guys.

On Mon, Apr 12, 2010 at 7:25 PM, Paul Prescod pres...@gmail.com wrote:

 How will they know whether the performance problem is caused by
 Cassandra or Pandra if you do not have raw Cassandra performance
 numbers for your setup?

 On Mon, Apr 12, 2010 at 5:51 AM, vineet daniel vineetdan...@gmail.com
 wrote:
  I dont think it would be a good idea not to use pandra for benchmarks as
 we
  are going to use pandra for our application. Secondly, it will give
 Pandra
  guys some boost to enhance the performance of thier library.



Re: How to perform queries on Cassandra?

2010-04-11 Thread vineet daniel
How would you handle duplicate usernames? Otherwise it seems fine to me.

On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote:

  Hi,



 As far as I can see it, the Cassandra API currently supports criterias on:

 Token – Key – Super Column Name (if applicable) - Column Names



 I guess Token is not usually used for the day to day queries, so, Key and
 Column Names are normally used for querying. For the user name and password
 case, I guess it can be done like this:



 Define a CF as UserAuth with type as Super, and Key is user name, while
 password can be the SuperKeyName. So, while you receive the user name and
 password from the UI (or any other methods), it can be queried via:
 multiget_slice or get_range_slices, if there are anything returned, means
 that the user name and password matches.



 If not using the super column name, and put the password as the column
 name, the column name usually not used for these kind of discretionary
 values (actually, I don’t see any definitive documents on how to use the
 column Names and Super Columns, flexibility is the good of Cassandra, or is
 it bad if abused? :P)



 Not sure whether this is the best way, but I guess it will work.



 Regards,

 Dop



 *From:* Lucifer Dignified [mailto:vineetdan...@gmail.com]
 *Sent:* Sunday, April 11, 2010 5:33 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: How to perform queries on Cassandra?



 Hi Benjamin

 I'll try to make it more clear to you.
 We have a user table with fields 'id', 'username', and 'password'. Now if
 use the ideal way to store key/value, like :
 username : vineetdaniel
 timestamp
 password : password
 timestamp

 second user :

 username: seconduser
 timestamp
 password:password

 and so on, here what i assume is that as we cannot make search on values
 (as confirmed by guys on cassandra forums) we are not able to perform robust
 'where' queries. Now what i propose is this.

 Rather than using a static values for column names use values itself and
 unique key as identifier. So, the above example when put in as per me would
 be.

 vineetdaniel : vineetdaniel
 timestamp

 password:password
 timestamp

 second user
 seconduser:seconduser
 timestamp

 password:password
 timestamp

 By using above methodology we can simply make search on keys itself rather
 than going into using different CF's. But to add further, this cannot be
 used for every situation. I am still exploring this, and soon will be
 updating the group and my blog with information pertaining to this. As
 cassandra is new, I think every idea or experience should be shared with the
 community.

 I hope I example is clear this time. Should you have any queries feel free
 to revert.

 On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote:

 Sorry, I don't understand your example.


 On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
 vineetdan...@gmail.com wrote:
  Benjamin I quite agree to you, but what in case of duplicate usernames,
  suppose if I am not using unique names as in email id's . If we have
  duplicacy in usernames we cannot use it for key, so what should be the
  solution. I think keeping incremental numeric id as key and keeping the
 name
  and value same in the column family.
 
  Example :
  User1 has password as 123456
 
  Cassandra structure :
 
  1 as key
 user1 - column name
 value - user1
 123456 - column name
  value - 123456
 
  I m thinking of doing it this way for my applicaton, this way i can run
  different sorts of queries too. Any feedback on this is welcome.
 
  On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black b...@b3k.us wrote:
 
  You would have a Column Family, not a column for that; let's call it
  the Users CF.  You'd use username as the row key and have a column
  called 'password'.  For your example query, you'd retrieve row key
  'usr2', column 'password'.  The general pattern is that you create CFs
  to act as indices for each query you want to perform.  There is no
  equivalent to a relational store to perform arbitrary queries.  You
  must structure things to permit the queries of interest.
 
 
  b
 
  On Sat, Apr 10, 2010 at 8:34 PM, dir dir sikerasa...@gmail.com wrote:
   I have already read the API spesification. Honestly I do not
 understand
   how to use it. Because there are not an examples.
  
   For example I have a column like this:
  
   UserNamePassword
   usr1abc
   usr2xyz
   usr3opm
  
   suppose I want query the user's password using SQL in RDBMS
  
 Select Password From Users Where UserName = usr2;
  
   Now I want to get the password using OODBMS DB4o Object Query  and
 Java
  
ObjectSet QueryResult = db.query(new Predicate()
{
   public boolean match(Users Myusers)
   {
return Myuser.getUserName() == usr2;
   }
});
  
   After we get the Users instance in the QueryResult, hence we 

Re: How to perform queries on Cassandra?

2010-04-11 Thread vineet daniel
It's not a problem, it's a scenario which we need to handle. And all I am
trying to do is achieve what is not there in the API, i.e. a workaround.

On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black b...@b3k.us wrote:

 A system that permits multiple people to have the same username has a
 serious problem.

 On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel vineetdan...@gmail.com
 wrote:
  How to handle same usernames. Otherwise seems fine to me.
 
  On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote:
 
  Hi,
 
 
 
  As far as I can see it, the Cassandra API currently supports criterias
 on:
 
  Token – Key – Super Column Name (if applicable) - Column Names
 
 
 
  I guess Token is not usually used for the day to day queries, so, Key
 and
  Column Names are normally used for querying. For the user name and
 password
  case, I guess it can be done like this:
 
 
 
  Define a CF as UserAuth with type as Super, and Key is user name, while
  password can be the SuperKeyName. So, while you receive the user name
 and
  password from the UI (or any other methods), it can be queried via:
  multiget_slice or get_range_slices, if there are anything returned,
 means
  that the user name and password matches.
 
 
 
  If not using the super column name, and put the password as the column
  name, the column name usually not used for these kind of discretionary
  values (actually, I don’t see any definitive documents on how to use the
  column Names and Super Columns, flexibility is the good of Cassandra, or
 is
  it bad if abused? :P)
 
 
 
  Not sure whether this is the best way, but I guess it will work.
 
 
 
  Regards,
 
  Dop
 
 
 
  From: Lucifer Dignified [mailto:vineetdan...@gmail.com]
  Sent: Sunday, April 11, 2010 5:33 PM
  To: user@cassandra.apache.org
  Subject: Re: How to perform queries on Cassandra?
 
 
 
  Hi Benjamin
 
  I'll try to make it more clear to you.
  We have a user table with fields 'id', 'username', and 'password'. Now
 if
  use the ideal way to store key/value, like :
  username : vineetdaniel
  timestamp
  password : password
  timestamp
 
  second user :
 
  username: seconduser
  timestamp
  password:password
 
  and so on, here what i assume is that as we cannot make search on values
  (as confirmed by guys on cassandra forums) we are not able to perform
 robust
  'where' queries. Now what i propose is this.
 
  Rather than using a static values for column names use values itself and
  unique key as identifier. So, the above example when put in as per me
 would
  be.
 
  vineetdaniel : vineetdaniel
  timestamp
 
  password:password
  timestamp
 
  second user
  seconduser:seconduser
  timestamp
 
  password:password
  timestamp
 
  By using above methodology we can simply make search on keys itself
 rather
  than going into using different CF's. But to add further, this cannot be
  used for every situation. I am still exploring this, and soon will be
  updating the group and my blog with information pertaining to this. As
  cassandra is new, I think every idea or experience should be shared with
 the
  community.
 
  I hope I example is clear this time. Should you have any queries feel
 free
  to revert.
 
  On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote:
 
  Sorry, I don't understand your example.
 
  On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
  vineetdan...@gmail.com wrote:
   Benjamin I quite agree to you, but what in case of duplicate
 usernames,
   suppose if I am not using unique names as in email id's . If we have
   duplicacy in usernames we cannot use it for key, so what should be the
   solution. I think keeping incremental numeric id as key and keeping
 the
   name
   and value same in the column family.
  
   Example :
   User1 has password as 123456
  
   Cassandra structure :
  
   1 as key
  user1 - column name
  value - user1
  123456 - column name
   value - 123456
  
   I m thinking of doing it this way for my applicaton, this way i can
 run
   different sorts of queries too. Any feedback on this is welcome.
  
   On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black b...@b3k.us wrote:
  
   You would have a Column Family, not a column for that; let's call it
   the Users CF.  You'd use username as the row key and have a column
   called 'password'.  For your example query, you'd retrieve row key
   'usr2', column 'password'.  The general pattern is that you create
 CFs
   to act as indices for each query you want to perform.  There is no
   equivalent to a relational store to perform arbitrary queries.  You
   must structure things to permit the queries of interest.
  
  
   b
  
   On Sat, Apr 10, 2010 at 8:34 PM, dir dir sikerasa...@gmail.com
 wrote:
I have already read the API spesification. Honestly I do not
understand
how to use it. Because there are not an examples.
   
For example I have a column like this:
   
UserNamePassword
usr1abc
usr2

Re: How to perform queries on Cassandra?

2010-04-11 Thread vineet daniel
Well, my initial idea is to use the value as the column name, keeping the key as an
incremental integer. The discussion after each mail has drifted from this
point which I had made, so I will put it again.

We want to store user information. We keep 1, 2, 3, 4 and so on as keys, AND
values as column names, i.e. rather than using the column name 'first name' I'd
be using 'vineet' as the column name, and rather than using 'last name' as the column
name I'd be using 'daniel'. This way I can directly read the column names as the
values. This is just a thought that has come to my mind while trying to
design my db for Cassandra.



On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black b...@b3k.us wrote:

 Row keys must be unique.  If your usernames are not unique and you
 want to be able to query on them, you either need to figure out a way
 to make them unique or treat the username rows themselves as indices,
 which refer to a set of actually unique identifiers for users.

 On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel vineetdan...@gmail.com
 wrote:
  its not a problem its a scenario, which we need to handle. And all I am
  trying to do is to achieve what is not there with API i.e a workaroud.
 
  On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black b...@b3k.us wrote:
 
  A system that permits multiple people to have the same username has a
  serious problem.
 
  On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel vineetdan...@gmail.com
  wrote:
   How to handle same usernames. Otherwise seems fine to me.
  
   On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote:
  
   Hi,
  
  
  
   As far as I can see it, the Cassandra API currently supports
 criterias
   on:
  
   Token – Key – Super Column Name (if applicable) - Column Names
  
  
  
   I guess Token is not usually used for the day to day queries, so, Key
   and
   Column Names are normally used for querying. For the user name and
   password
   case, I guess it can be done like this:
  
  
  
   Define a CF as UserAuth with type as Super, and Key is user name,
 while
   password can be the SuperKeyName. So, while you receive the user name
   and
   password from the UI (or any other methods), it can be queried via:
   multiget_slice or get_range_slices, if there are anything returned,
   means
   that the user name and password matches.
  
  
  
   If not using the super column name, and put the password as the
 column
   name, the column name usually not used for these kind of
 discretionary
   values (actually, I don’t see any definitive documents on how to use
   the
   column Names and Super Columns, flexibility is the good of Cassandra,
   or is
   it bad if abused? :P)
  
  
  
   Not sure whether this is the best way, but I guess it will work.
  
  
  
   Regards,
  
   Dop
  
  
  
   From: Lucifer Dignified [mailto:vineetdan...@gmail.com]
   Sent: Sunday, April 11, 2010 5:33 PM
   To: user@cassandra.apache.org
   Subject: Re: How to perform queries on Cassandra?
  
  
  
   Hi Benjamin
  
   I'll try to make it more clear to you.
   We have a user table with fields 'id', 'username', and 'password'.
 Now
   if
   use the ideal way to store key/value, like :
   username : vineetdaniel
   timestamp
   password : password
   timestamp
  
   second user :
  
   username: seconduser
   timestamp
   password:password
  
   and so on, here what i assume is that as we cannot make search on
   values
   (as confirmed by guys on cassandra forums) we are not able to perform
   robust
   'where' queries. Now what i propose is this.
  
   Rather than using a static values for column names use values itself
   and
   unique key as identifier. So, the above example when put in as per me
   would
   be.
  
   vineetdaniel : vineetdaniel
   timestamp
  
   password:password
   timestamp
  
   second user
   seconduser:seconduser
   timestamp
  
   password:password
   timestamp
  
   By using above methodology we can simply make search on keys itself
   rather
   than going into using different CF's. But to add further, this cannot
   be
   used for every situation. I am still exploring this, and soon will be
   updating the group and my blog with information pertaining to this.
 As
   cassandra is new, I think every idea or experience should be shared
   with the
   community.
  
   I hope I example is clear this time. Should you have any queries feel
   free
   to revert.
  
   On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black b...@b3k.us wrote:
  
   Sorry, I don't understand your example.
  
   On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
   vineetdan...@gmail.com wrote:
Benjamin I quite agree to you, but what in case of duplicate
usernames,
suppose if I am not using unique names as in email id's . If we
 have
duplicacy in usernames we cannot use it for key, so what should be
the
solution. I think keeping incremental numeric id as key and keeping
the
name
and value same in the column family.
   
Example :
User1 has password as 123456
   
Cassandra

Re: How to perform queries on Cassandra?

2010-04-11 Thread vineet daniel
I assume that using the key I can get all the columns back like an array. Then
I'd be using PHP to extract each key => value pair from that array; I just want to
avoid that, i.e. I want to print the column names directly. If you guys think it's
not a good idea I can drop it; anyway, I'm new to this and a lot of ideas are
coming to mind. As far as Cassandra and column families / super columns are
concerned, I am pretty clear.

On Mon, Apr 12, 2010 at 12:23 AM, Benjamin Black b...@b3k.us wrote:

 I have no idea what problem you are trying to solve.  You are
 misunderstanding a number of things about the Cassandra data model and
 about how we are explaining it is best used.

 On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel vineetdan...@gmail.com
 wrote:
  Well my initial idea is to use value  as column name, keeping key as an
  incremental integer. The discussion after each mail has drifted from this
  point which I had made. Will put it again.
 
  we want to store user information. We keep 1,2,3,4.so on as keys. AND
  values as column names i.e rather than using column name 'first name',
 i'd
  be using 'vineet' as column name, rather than using 'last name' as column
  name i'd be using 'daniel'. This way I can directly read column names as
  values. This is just a thought that has come to my mind while trying to
  design my db for cassandra.
 
 
 
  On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black b...@b3k.us wrote:
 
  Row keys must be unique.  If your usernames are not unique and you
  want to be able to query on them, you either need to figure out a way
  to make them unique or treat the username rows themselves as indices,
  which refer to a set of actually unique identifiers for users.
 
  On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel vineetdan...@gmail.com
 
  wrote:
   its not a problem its a scenario, which we need to handle. And all I
 am
   trying to do is to achieve what is not there with API i.e a workaroud.
  
   On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black b...@b3k.us wrote:
  
   A system that permits multiple people to have the same username has a
   serious problem.
  
   On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel 
 vineetdan...@gmail.com
   wrote:
How to handle same usernames. Otherwise seems fine to me.
   
On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote:
   
Hi,
   
   
   
As far as I can see it, the Cassandra API currently supports
criterias
on:
   
Token – Key – Super Column Name (if applicable) - Column Names
   
   
   
I guess Token is not usually used for the day to day queries, so,
Key
and
Column Names are normally used for querying. For the user name and
password
case, I guess it can be done like this:
   
   
   
Define a CF as UserAuth with type as Super, and Key is user name,
while
password can be the SuperKeyName. So, while you receive the user
name
and
password from the UI (or any other methods), it can be queried
 via:
multiget_slice or get_range_slices, if there are anything
 returned,
means
that the user name and password matches.
   
   
   
If not using the super column name, and put the password as the
column
name, the column name usually not used for these kind of
discretionary
values (actually, I don’t see any definitive documents on how to
 use
the
column Names and Super Columns, flexibility is the good of
Cassandra,
or is
it bad if abused? :P)
   
   
   
Not sure whether this is the best way, but I guess it will work.
   
   
   
Regards,
   
Dop
   
   
   
From: Lucifer Dignified [mailto:vineetdan...@gmail.com]
Sent: Sunday, April 11, 2010 5:33 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?
   
   
   
Hi Benjamin
   
I'll try to make it more clear to you.
We have a user table with fields 'id', 'username', and 'password'.
Now
if
use the ideal way to store key/value, like :
username : vineetdaniel
timestamp
password : password
timestamp
   
second user :
   
username: seconduser
timestamp
password:password
   
and so on, here what i assume is that as we cannot make search on
values
(as confirmed by guys on cassandra forums) we are not able to
perform
robust
'where' queries. Now what i propose is this.
   
Rather than using a static values for column names use values
 itself
and
unique key as identifier. So, the above example when put in as per
me
would
be.
   
vineetdaniel : vineetdaniel
timestamp
   
password:password
timestamp
   
second user
seconduser:seconduser
timestamp
   
password:password
timestamp
   
By using above methodology we can simply make search on keys
 itself
rather
than going into using different CF's. But to add further, this
cannot
be
used for every situation. I am still exploring

Re: How to perform queries on Cassandra?

2010-04-11 Thread vineet daniel
I am dropping the idea; I don't want to irritate you guys any more. I've got your
points.

On Mon, Apr 12, 2010 at 12:41 AM, Benjamin Black b...@b3k.us wrote:

 Just to be clear: do you understand we are saying you need to use
 multiple CFs to achieve the goal, not a single one?

 The Users CF would be indexed on a unique integer as you are saying
 you intend.  There is no point in having values as column names here,
 other than making things incredibly confusing.  Assume instead that
 you have a column called 'username' and a column called 'password'.
 In your model where usernames may be the same for different users, you
 would have data that looked like this:

 0: {'username':'usr1', 'password':'woop'}
 1: {'username':'usr2', 'password':'foo'}
 2: {'username':'usr2', 'password':'bar'}

 The UsernameIndex CF would be indexed on usernames, giving a map from
 a username to the unique identifiers in the Users CF with that
 username:

 'usr1': {0:0}
 'usr2': {1:0, 2:0}

 Note that since we don't care about the values in the UsernameIndex,
 they are just set to 0.  You can stash data here, if you like, but it
 can mean more overhead in maintaining data synchronization between the
 raw data and the index data.  To perform your query on username
 'usr2', you get 'usr2' from UsernameIndex CF, which gives you a set of
 ids, and you then get those ids (1 and 2) from the Users CF.
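 A small Java sketch of that two-step read, against a hypothetical store
 interface (the method names are illustrative, not a real client API): look the
 username up in UsernameIndex to get the ids, then fetch those rows from Users.

 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;

 // Hypothetical client interface; only the index-then-fetch pattern matters here.
 interface UserStore {
     List<String> columnNames(String columnFamily, String rowKey); // column-name slice
     Map<String, String> row(String columnFamily, String rowKey);  // whole row as a map
 }

 class UserLookup {
     private final UserStore store;
     UserLookup(UserStore store) { this.store = store; }

     // 1) read the username's row from UsernameIndex to get the user ids,
     // 2) fetch each of those ids from the Users CF.
     List<Map<String, String>> usersByName(String username) {
         List<Map<String, String>> users = new ArrayList<Map<String, String>>();
         for (String id : store.columnNames("UsernameIndex", username)) {
             users.add(store.row("Users", id));
         }
         return users;
     }
 }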


 b


 On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel vineetdan...@gmail.com
 wrote:
  Well my initial idea is to use value  as column name, keeping key as an
  incremental integer. The discussion after each mail has drifted from this
  point which I had made. Will put it again.
 
  we want to store user information. We keep 1,2,3,4.so on as keys. AND
  values as column names i.e rather than using column name 'first name',
 i'd
  be using 'vineet' as column name, rather than using 'last name' as column
  name i'd be using 'daniel'. This way I can directly read column names as
  values. This is just a thought that has come to my mind while trying to
  design my db for cassandra.
 
 
 
  On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black b...@b3k.us wrote:
 
  Row keys must be unique.  If your usernames are not unique and you
  want to be able to query on them, you either need to figure out a way
  to make them unique or treat the username rows themselves as indices,
  which refer to a set of actually unique identifiers for users.
 
  On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel vineetdan...@gmail.com
 
  wrote:
   its not a problem its a scenario, which we need to handle. And all I
 am
   trying to do is to achieve what is not there with API i.e a workaroud.
  
   On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black b...@b3k.us wrote:
  
   A system that permits multiple people to have the same username has a
   serious problem.
  
   On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel 
 vineetdan...@gmail.com
   wrote:
How to handle same usernames. Otherwise seems fine to me.
   
On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun su...@dopsun.com wrote:
   
Hi,
   
   
   
As far as I can see it, the Cassandra API currently supports
criterias
on:
   
Token – Key – Super Column Name (if applicable) - Column Names
   
   
   
I guess Token is not usually used for the day to day queries, so,
Key
and
Column Names are normally used for querying. For the user name and
password
case, I guess it can be done like this:
   
   
   
Define a CF as UserAuth with type as Super, and Key is user name,
while
password can be the SuperKeyName. So, while you receive the user
name
and
password from the UI (or any other methods), it can be queried
 via:
multiget_slice or get_range_slices, if there are anything
 returned,
means
that the user name and password matches.
   
   
   
If not using the super column name, and put the password as the
column
name, the column name usually not used for these kind of
discretionary
values (actually, I don’t see any definitive documents on how to
 use
the
column Names and Super Columns, flexibility is the good of
Cassandra,
or is
it bad if abused? :P)
   
   
   
Not sure whether this is the best way, but I guess it will work.
   
   
   
Regards,
   
Dop
   
   
   
From: Lucifer Dignified [mailto:vineetdan...@gmail.com]
Sent: Sunday, April 11, 2010 5:33 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?
   
   
   
Hi Benjamin
   
I'll try to make it more clear to you.
We have a user table with fields 'id', 'username', and 'password'.
Now
if
use the ideal way to store key/value, like :
username : vineetdaniel
timestamp
password : password
timestamp
   
second user :
   
username: seconduser
timestamp
password:password
   
and so on, here what i assume is that as we cannot make search