RE: General questions about Cassandra
Are there plans to build some sort of map-reduce framework into Cassandra and CQL? It seems that users should be able to apply a Java method to selected rows in parallel on the distributed Cassandra JVMs. I believe Solandra uses such an integration.

Don

From: Alessio Cecchi [ales...@skye.it]
Sent: Friday, February 17, 2012 4:42 AM
To: user@cassandra.apache.org
Subject: General questions about Cassandra

Hi,

we have developed software that stores logs from mail servers in MySQL, but for huge environments we are developing a version that stores this data in HBase.

Once a day the raw logs are first normalized, so the output looks like this:

username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
[...]

and then inserted into the database.

As I was saying, for huge installations (from 1 to 10 million logins per day, kept for 12 months) we are working with HBase, but I would also consider Cassandra.

The advantage of HBase is MapReduce, which makes searching the logs very fast by splitting the "query" concurrently across multiple hosts.

Queries will be launched from a web interface (only a few requests per day), and the search keys are user and time range.

But Cassandra seems less complex to manage and simpler to run, so I want to evaluate it instead of HBase.

My question is: can Cassandra also split a "query" over the cluster like MapReduce? From what I read online, Cassandra seems fast at inserting data but slower than HBase at querying. Is that really so?

We do not want to install Hadoop on top of Cassandra.

Any suggestion is welcome :-)

--
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux Systems Support -> http://www.cecchi.biz/
@ PLUG -> former President, now senator for life, http://www.prato.linux.it
@ LOLUG -> Member http://www.lolug.net
RE: Suggestion about syntax of CREATE COLUMN FAMILY
I believe you're right! The change to cli would be an easy fix, I imagine; for backwards compatibility, they'd probably want to allow either the old or the new syntax for cli. But I understand their decision not to devote time to a deprecated tool. For cqlsh, I hope it's not too late to deprecate the old, unclear syntax.

Don

From: ehers...@gmail.com [ehers...@gmail.com]
Sent: Thursday, December 22, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: Re: Suggestion about syntax of CREATE COLUMN FAMILY

Doesn't CQL have the same issue?

http://crlog.info/2011/09/17/cassandra-query-language-cql-v2-0-reference/#Column+Family+Options+(optional)
http://www.datastax.com/docs/1.0/references/cql/CREATE_COLUMNFAMILY

    CREATE COLUMNFAMILY user_events (user text PRIMARY KEY)
        WITH comparator=timestamp AND default_validation=int;

Do CQL enhancements also belong in the same jira project, or somewhere else?

Ernie

On Thu, Dec 22, 2011 at 11:51 AM, Don Smith <dsm...@likewise.com> wrote:

FYI, I submitted an enhancement ticket <https://issues.apache.org/jira/browse/CASSANDRA-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel> to JIRA about this. The ticket was resolved with the comment: "cli is kept around for backwards compatibility at this point; cqlsh is 'the future.'"

Don

From: Stephen Pope [stephen.p...@quest.com]
Sent: Monday, December 12, 2011 6:34 AM
To: user@cassandra.apache.org
Subject: RE: Suggestion about syntax of CREATE COLUMN FAMILY

I’d like to second this. I’ve been working with Cassandra for a good while now, but when I first started, little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com]
Sent: Friday, December 09, 2011 3:41 PM
To: user@cassandra.apache.org
Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:

    create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much clearer would be:

    create column family Users with column_name_comparator=UTF8Type and column_value_validation_class=UTF8Type and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer "column_key_comparator", since it seems more accurate to call column names "column keys."

Don
RE: Suggestion about syntax of CREATE COLUMN FAMILY
FYI, I submitted an enhancement ticket <https://issues.apache.org/jira/browse/CASSANDRA-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel> to JIRA about this. The ticket was resolved with the comment: "cli is kept around for backwards compatibility at this point; cqlsh is 'the future.'"

Don

From: Stephen Pope [stephen.p...@quest.com]
Sent: Monday, December 12, 2011 6:34 AM
To: user@cassandra.apache.org
Subject: RE: Suggestion about syntax of CREATE COLUMN FAMILY

I’d like to second this. I’ve been working with Cassandra for a good while now, but when I first started, little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com]
Sent: Friday, December 09, 2011 3:41 PM
To: user@cassandra.apache.org
Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:

    create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much clearer would be:

    create column family Users with column_name_comparator=UTF8Type and column_value_validation_class=UTF8Type and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer "column_key_comparator", since it seems more accurate to call column names "column keys."

Don
RE: Cassandra C client implementation
Virgil apparently lets you access Cassandra via a RESTful interface: http://code.google.com/a/apache-extras.org/p/virgil/

Depending on your performance needs and the maturity of Virgil's code (I think it's alpha), that may work. You could always fork a Java process and pipe to it.

Don

From: Vlad Paiu [vladp...@opensips.org]
Sent: Wednesday, December 14, 2011 8:33 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra C client implementation

Hello,

Thanks for your answer. Unfortunately libcassandra is C++; I'm looking for something written in ANSI C. I've searched a lot, and my guess is that glibc thrift is my only option, but I could not find even one example of how to make a connection and run some queries against Cassandra using glibc thrift.

Does anyone have experience with this, or some examples?

Regards,
Vlad

i...@iyyang.com wrote:

>BTW please use
>https://github.com/eyealike/libcassandra
>
>Best Regards,
>Yi "Steve" Yang
>~~~
>+1-401-441-5086
>+86-13910771510
>
>Sent via BlackBerry® from China Mobile
>
>-Original Message-
>From: i...@iyyang.com
>Date: Wed, 14 Dec 2011 15:52:56
>To:
>Reply-To: i...@iyyang.com
>Subject: Re: Cassandra C client implementation
>
>Try libcassandra, but it doesn't support connection pooling
>
>--Original Message--
>From: Vlad Paiu
>To: user@cassandra.apache.org
>ReplyTo: user@cassandra.apache.org
>Subject: Cassandra C client implementation
>Sent: Dec 14, 2011 11:11 PM
>
>Hello,
>
>I am trying to integrate some Cassandra-related operations (insert, get, etc.) into
>an application written entirely in C, so C++ is not an option.
>
>Is there any C client library for Cassandra?
>
>I have also tried to generate thrift glibc code for Cassandra, but on
>wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C.
>
>Can anybody suggest a C client library for Cassandra or provide some working
>examples for Thrift in C?
>
>Thanks and Regards,
>Vlad
Suggestion about syntax of CREATE COLUMN FAMILY
Currently, the syntax for creating column families is like this:

    create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much clearer would be:

    create column family Users with column_name_comparator=UTF8Type and column_value_validation_class=UTF8Type and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer "column_key_comparator", since it seems more accurate to call column names "column keys."

Don
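To make the naming question concrete, here is a minimal Java sketch of what the three options govern: key_validation_class checks row keys, comparator orders (and types) column names within a row, and default_validation_class checks column values. This is a conceptual model only. ColumnFamilySketch, insert and columnNames are hypothetical names made up for illustration, and the predicates stand in for the real validator classes; it is not how Cassandra is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Conceptual model only: every name in this class is hypothetical.
class ColumnFamilySketch {
    private final Predicate<String> keyValidator;       // plays the role of key_validation_class
    private final Comparator<String> columnComparator;  // plays the role of comparator
    private final Predicate<String> valueValidator;     // plays the role of default_validation_class
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    ColumnFamilySketch(Predicate<String> keyValidator,
                       Comparator<String> columnComparator,
                       Predicate<String> valueValidator) {
        this.keyValidator = keyValidator;
        this.columnComparator = columnComparator;
        this.valueValidator = valueValidator;
    }

    void insert(String rowKey, String columnName, String value) {
        if (!keyValidator.test(rowKey)) {
            throw new IllegalArgumentException("invalid row key: " + rowKey);
        }
        if (!valueValidator.test(value)) {
            throw new IllegalArgumentException("invalid column value: " + value);
        }
        // Columns inside one row are kept sorted by column NAME, using the comparator.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>(columnComparator)).put(columnName, value);
    }

    // Column names of one row, in comparator order.
    List<String> columnNames(String rowKey) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        ColumnFamilySketch events = new ColumnFamilySketch(
                key -> !key.isEmpty(),                       // any non-empty row key
                Comparator.comparingLong(Long::parseLong),   // numeric ordering of column names
                value -> value.matches("-?\\d+"));           // integer-looking values only
        events.insert("don", "100", "3");
        events.insert("don", "9", "1");
        events.insert("don", "10", "2");
        System.out.println(events.columnNames("don"));
    }
}
```

With the numeric comparator the column names come back as 9, 10, 100; with plain string ordering they would sort as 10, 100, 9. That difference is why a name such as "column_name_comparator" describes the option more clearly than "comparator" alone.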
re: User Survey
cli's "show keyspaces" command shows way too much information by default. I think by default it should show just one line per keyspace. A "-v" option could show more info. What GUI alternatives are there to cli for browsing a cassandra ring? Lots of people WILL use cli, so it should be spiffy. Thanks, Don
Re: Efficiency of hector's setRowCount (and setStartKey!)
It's actually setStartKey that's the important method call (in combination with setRowCount), so I should have been clearer. The following code performs as expected, as far as returning the expected data in the expected order. I believe that using IndexedSlicesQuery's setStartKey will support efficient queries, avoiding re-pulling the entire data set from Cassandra. Correct?

    void demoPaging() {
        String lastKey = processPage("don", "");  // get first batch, starting with "" (smallest key)
        lastKey = processPage("don", lastKey);    // get second batch starting with previous last key
        lastKey = processPage("don", lastKey);    // get third batch starting with previous last key
        //
    }

    // Returns the last key processed, or null when no records are left.
    String processPage(String username, String startKey) {
        String lastKey = null;
        IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
        indexedSlicesQuery.addEqualsExpression("user", username);
        indexedSlicesQuery.setColumnNames("source", "ip");
        indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
        indexedSlicesQuery.setStartKey(startKey);   // resume from the last key of the previous page
        indexedSlicesQuery.setRowCount(batchSize);
        QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
        OrderedRows<String, String, String> rows = result.get();
        for (Row<String, String, String> row : rows) {
            if (row == null) { continue; }
            String key = row.getKey();
            // The start key is inclusive, so skip the row already counted on the previous page.
            if (startKey.equals(key)) { continue; }
            totalCount++;
            lastKey = key;
        }
        return lastKey;
    }

On 10/13/2011 09:15 AM, Patricio Echagüe wrote:

Hi Don. No, it will not. IndexedSlicesQuery will read just the number of rows specified by RowCount and will go to the DB to get the new page when needed.

SetRowCount is doing indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsm...@likewise.com> wrote:

Hector's IndexedSlicesQuery has a setRowCount method that you can use to page through the results, as described in https://github.com/rantav/hector/wiki/User-Guide .

    rangeSlicesQuery.setRowCount(1001);
    ...
    rangeSlicesQuery.setKeys(lastRow.getKey(), "");

Is it efficient? Specifically, suppose my query returns 100,000 results and I page through batches of 1000 at a time (making 100 executes of the query). Will it internally retrieve all the results each time (but pass only the desired set of 1000 or so to me)? Or will it optimize queries to avoid the duplication? I presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same effect?

Thanks,
Don
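The paging pattern discussed in this thread (an inclusive start key plus a row count, with each later page repeating the previous page's last row) can be sketched without a cluster. The following is a minimal Java simulation, not Hector code: a sorted in-memory map stands in for the column family, and fetchPage is a hypothetical substitute for setStartKey + setRowCount + execute. It assumes rows come back in a stable order, which the sorted map guarantees here; on a real cluster the order depends on the partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulation only: nothing here calls Hector or Cassandra.
class KeyPagingSketch {

    // Hypothetical stand-in for one query: up to rowCount keys, starting at startKey INCLUSIVE.
    static List<String> fetchPage(NavigableMap<String, String> store, String startKey, int rowCount) {
        List<String> page = new ArrayList<>();
        for (String key : store.tailMap(startKey, true).keySet()) {
            if (page.size() == rowCount) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Walks the whole store page by page and returns every key exactly once.
    static List<String> readAll(NavigableMap<String, String> store, int pageSize) {
        List<String> seen = new ArrayList<>();
        String startKey = "";        // "" sorts before every other key
        boolean firstPage = true;
        while (true) {
            // Later pages start with a row we already have, so ask for one extra row.
            int rowCount = firstPage ? pageSize : pageSize + 1;
            List<String> page = fetchPage(store, startKey, rowCount);
            int skip = firstPage ? 0 : 1;
            if (page.size() <= skip) {
                break;               // only the repeated row (or nothing) came back
            }
            seen.addAll(page.subList(skip, page.size()));
            startKey = page.get(page.size() - 1);
            firstPage = false;
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 7; i++) {
            store.put("row" + i, "don");
        }
        System.out.println(readAll(store, 3));
    }
}
```

Each call touches at most pageSize + 1 rows, which matches Patricio's point that only the requested number of rows is read per execute. Requesting one extra row is the same trick as the setRowCount(1001) in the Hector user guide snippet.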
Efficiency of hector's setRowCount
Hector's IndexedSlicesQuery has a setRowCount method that you can use to page through the results, as described in https://github.com/rantav/hector/wiki/User-Guide .

    rangeSlicesQuery.setRowCount(1001);
    ...
    rangeSlicesQuery.setKeys(lastRow.getKey(), "");

Is it efficient? Specifically, suppose my query returns 100,000 results and I page through batches of 1000 at a time (making 100 executes of the query). Will it internally retrieve all the results each time (but pass only the desired set of 1000 or so to me)? Or will it optimize queries to avoid the duplication? I presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same effect?

Thanks,
Don
Question about sharding of rows and atomicity
Does Cassandra shard the columns of a single row across multiple nodes, so that reading the row's columns may require access to multiple nodes? I'd say "no."

Will a read from a given node ever return partial results, or is the write of a row to a node atomic?

Thanks,
Don
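On the first question, a small Java sketch may help show why "no" is the expected answer: as far as I understand it, the node that holds a row is derived from the row key alone, so column names never take part in placement. The sketch assumes RandomPartitioner-style MD5 hashing of the row key and a ring of evenly spaced tokens. RowPlacementSketch, evenRing and ownerOf are hypothetical names, and replication is ignored; it says nothing about the atomicity question.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration: every name here is hypothetical, and replicas are not modeled.
class RowPlacementSketch {

    // Token derived from the row key only (MD5, read as an unsigned 128-bit number).
    static BigInteger token(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A ring of evenly spaced tokens, one per node.
    static TreeMap<BigInteger, String> evenRing(int nodes) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        BigInteger span = BigInteger.ONE.shiftLeft(128);
        for (int i = 0; i < nodes; i++) {
            BigInteger nodeToken = span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodes));
            ring.put(nodeToken, "node" + i);
        }
        return ring;
    }

    // The owner is the first node whose token is >= the row token, wrapping around the ring.
    static String ownerOf(TreeMap<BigInteger, String> ring, String rowKey) {
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token(rowKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, String> ring = evenRing(4);
        for (String rowKey : new String[] {"don", "alessio", "vlad"}) {
            // No column name appears anywhere in the lookup: the whole row maps to one owner.
            System.out.println(rowKey + " -> " + ownerOf(ring, rowKey));
        }
    }
}
```

Because ownerOf takes only the row key, every column of a given row resolves to the same owner in this model, and different rows spread across the ring.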