RE: General questions about Cassandra

2012-02-17 Thread Don Smith
Are there plans to build some sort of map-reduce framework into Cassandra 
and CQL? It seems that users should be able to apply a Java method to 
selected rows in parallel on the distributed Cassandra JVMs. I believe 
Solandra uses such an integration.

 Don

From: Alessio Cecchi [ales...@skye.it]
Sent: Friday, February 17, 2012 4:42 AM
To: user@cassandra.apache.org
Subject: General questions about Cassandra

Hi,

we have developed software that stores logs from mail servers in MySQL,
but for huge environments we are developing a version that stores this
data in HBase. Once a day the raw logs are first normalized, so the output
looks like this:

username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
[...]

and is then inserted into the database.

As I was saying, for huge installations (from 1 to 10 million logins
per day, kept for 12 months) we are working with HBase, but I would also
consider Cassandra.

The advantage of HBase is MapReduce which makes searching the logs very
fast by splitting the "query" concurrently on multiple hosts.

Queries will be launched from a web interface (only a few requests per
day), and the search keys are user and time range.

But Cassandra seems less complex to manage and simpler to run, so I want
to evaluate it instead of HBase.

My question is: can Cassandra also split a "query" over the cluster like
MapReduce? From what I read online, Cassandra seems fast at inserting data
but slower than HBase at queries. Is that really so?

We do not want to install Hadoop on top of Cassandra.

Any suggestion is welcome :-)

--
Alessio Cecchi is:
@ ILS ->  http://www.linux.it/~alessice/
on LinkedIn ->  http://www.linkedin.com/in/alessice
Assistenza Sistemi GNU/Linux ->  http://www.cecchi.biz/
@ PLUG ->  ex-Presidente, adesso senatore a vita, http://www.prato.linux.it
@ LOLUG ->  Socio http://www.lolug.net
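
For the lookup described above (one user, one time range), a minimal sketch of one possible layout may help frame the question. It assumes one row per username with columns sorted by login time, so a search reads a slice of a single row rather than fanning out over the cluster. The class `LoginLog` and its methods are hypothetical, and the in-memory maps only stand in for a column family; this is not Cassandra client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a wide-row layout: one row per username,
// columns sorted by login timestamp. Not a Cassandra client.
class LoginLog {
    // row key (username) -> sorted columns (epoch millis -> "ip,protocol")
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    // Two logins of the same user in the same millisecond would overwrite
    // each other here; a real schema would need a unique column name.
    void insert(String username, long loginMillis, String ip, String protocol) {
        rows.computeIfAbsent(username, k -> new TreeMap<>())
            .put(loginMillis, ip + "," + protocol);
    }

    // Return the logins of one user with fromMillis <= time <= toMillis.
    List<String> search(String username, long fromMillis, long toMillis) {
        List<String> hits = new ArrayList<>();
        NavigableMap<Long, String> columns = rows.get(username);
        if (columns == null) {
            return hits;
        }
        for (Map.Entry<Long, String> e
                : columns.subMap(fromMillis, true, toMillis, true).entrySet()) {
            hits.add(e.getKey() + "," + e.getValue());
        }
        return hits;
    }
}
```

With this shape, the cost of a search depends on how many logins the user has in the range, not on the total size of the log.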



RE: Suggestion about syntax of CREATE COLUMN FAMILY

2011-12-22 Thread Don Smith
I believe you're right! The change to cli would be an easy fix, I imagine; 
for backwards compatibility, they'd probably want to allow either the old or 
new syntax for cli. But I understand their decision not to devote time to a 
deprecated tool. For cqlsh I hope it's not too late to deprecate the old, 
unclear syntax.

 Don

From: ehers...@gmail.com [ehers...@gmail.com]
Sent: Thursday, December 22, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: Re: Suggestion about syntax of CREATE COLUMN FAMILY

Doesn't CQL have the same issue?

http://crlog.info/2011/09/17/cassandra-query-language-cql-v2-0-reference/#Column+Family+Options+(optional)
http://www.datastax.com/docs/1.0/references/cql/CREATE_COLUMNFAMILY


CREATE COLUMNFAMILY user_events (user text PRIMARY KEY)
   WITH comparator=timestamp AND default_validation=int;

Do CQL enhancements also belong in the same jira project, or somewhere else?

Ernie

On Thu, Dec 22, 2011 at 11:51 AM, Don Smith <dsm...@likewise.com> wrote:
FYI, I submitted an enhancement ticket
<https://issues.apache.org/jira/browse/CASSANDRA-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel>
to JIRA about this. The ticket was resolved with the comment: "cli is kept 
around for backwards compatiblity at this point; cqlsh is 'the future.'"

 Don

From: Stephen Pope [stephen.p...@quest.com]
Sent: Monday, December 12, 2011 6:34 AM
To: user@cassandra.apache.org
Subject: RE: Suggestion about syntax of CREATE COLUMN FAMILY

I’d like to second this. I’ve been working with Cassandra for a good while now, 
but when I first started little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com]
Sent: Friday, December 09, 2011 3:41 PM
To: user@cassandra.apache.org
Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:
create column family Users
with comparator=UTF8Type
and default_validation_class=UTF8Type
and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much 
clearer would be:
create column family Users
with column_name_comparator=UTF8Type
and column_value_validation_class=UTF8Type
and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer 
"column_key_comparator" since it seems more accurate to call column names 
"column keys."

  Don



RE: Suggestion about syntax of CREATE COLUMN FAMILY

2011-12-22 Thread Don Smith
FYI, I submitted an enhancement ticket
<https://issues.apache.org/jira/browse/CASSANDRA-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel>
to JIRA about this. The ticket was resolved with the comment: "cli is kept 
around for backwards compatiblity at this point; cqlsh is 'the future.'"

 Don

From: Stephen Pope [stephen.p...@quest.com]
Sent: Monday, December 12, 2011 6:34 AM
To: user@cassandra.apache.org
Subject: RE: Suggestion about syntax of CREATE COLUMN FAMILY

I’d like to second this. I’ve been working with Cassandra for a good while now, 
but when I first started little things like this were confusing.

From: Don Smith [mailto:dsm...@likewise.com]
Sent: Friday, December 09, 2011 3:41 PM
To: user@cassandra.apache.org
Subject: Suggestion about syntax of CREATE COLUMN FAMILY

Currently, the syntax for creating column families is like this:
create column family Users
with comparator=UTF8Type
and default_validation_class=UTF8Type
and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much 
clearer would be:
create column family Users
with column_name_comparator=UTF8Type
and column_value_validation_class=UTF8Type
and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer 
"column_key_comparator" since it seems more accurate to call column names 
"column keys."

  Don


RE: Cassandra C client implementation

2011-12-14 Thread Don Smith
Virgil apparently lets you access Cassandra via a RESTful interface:

   http://code.google.com/a/apache-extras.org/p/virgil/ 

Depending on your performance needs and the maturity of Virgil's code (I think 
it's alpha), that may work.

You could always fork a java process and pipe to it.

 Don
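
A minimal sketch of the "fork a Java process and pipe to it" idea, under stated assumptions: the C program starts this helper, writes one request per line to its stdin and reads one reply per line from its stdout. The class `PipeBridge` and the tiny text protocol ("insert <key> <value>", "get <key>") are hypothetical, and the in-memory map is only a placeholder where calls into a real Java Cassandra client would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;

// Line-oriented helper that a C program could fork and talk to over pipes.
// The map below is a placeholder for calls into a real Java client.
class PipeBridge {
    private final Map<String, String> store = new HashMap<>();

    // Handle one request line and return one response line.
    // Hypothetical protocol: "insert <key> <value>" or "get <key>".
    String handle(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length == 3 && parts[0].equals("insert")) {
            store.put(parts[1], parts[2]);
            return "OK";
        }
        if (parts.length == 2 && parts[0].equals("get")) {
            String value = store.get(parts[1]);
            return value == null ? "NOT_FOUND" : "VALUE " + value;
        }
        return "ERROR unknown command";
    }

    // Read requests until end of input, answering each with one line.
    void serve(BufferedReader in, PrintStream out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(handle(line));
            out.flush(); // the C side blocks until it sees the reply
        }
    }
}
```

A `main` method would call `serve` with a `BufferedReader` over `System.in` and with `System.out`. The per-request cost of the pipe round trip is the price of this approach, so it fits low request rates better than hot paths.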

From: Vlad Paiu [vladp...@opensips.org]
Sent: Wednesday, December 14, 2011 8:33 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra C client implementation

Hello,

Thanks for your answer.
Unfortunately libcassandra is C++; I'm looking for something written in ANSI C.

I've searched a lot and my guess is that glibc thrift is my only option, but I could 
not find even one example of how to make a connection and run some queries against 
Cassandra using glibc thrift.
Does anyone have experience with this, or some examples?

Regards,
Vlad


i...@iyyang.com wrote:

>BTW please use
>https://github.com/eyealike/libcassandra
>
>
>Best Regards,
>Yi "Steve" Yang
>~~~
>+1-401-441-5086
>+86-13910771510
>
>Sent via BlackBerry® from China Mobile
>
>-Original Message-
>From: i...@iyyang.com
>Date: Wed, 14 Dec 2011 15:52:56
>To: 
>Reply-To: i...@iyyang.com
>Subject: Re: Cassandra C client implementation
>
>Try libcassandra, but it doesn't support connection pooling
>
>--Original Message--
>From: Vlad Paiu
>To: user@cassandra.apache.org
>ReplyTo: user@cassandra.apache.org
>Subject: Cassandra C client implementation
>Sent: Dec 14, 2011 11:11 PM
>
>Hello,
>
>I am trying to integrate some Cassandra-related ops (insert, get, etc.) into 
>an application written entirely in C, so C++ is not an option.
>
>Is there any C client library for cassandra ?
>
> I have also tried to generate thrift glibc code for Cassandra, but on 
> wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C.
>
>Can anybody suggest a C client library for Cassandra or provide some working 
>examples for Thrift in C ?
>
>Thanks and Regards,
>Vlad


Suggestion about syntax of CREATE COLUMN FAMILY

2011-12-09 Thread Don Smith
Currently, the syntax for creating column families is like this:

create column family Users
with comparator=UTF8Type
and default_validation_class=UTF8Type
and key_validation_class=UTF8Type;

It's not clear what "comparator" and "default_validation_class" refer to. Much 
clearer would be:

create column family Users
with column_name_comparator=UTF8Type
and column_value_validation_class=UTF8Type
and key_validation_class=UTF8Type;

BTW, instead of "column_name_comparator", I'd actually prefer 
"column_key_comparator" since it seems more accurate to call column names 
"column keys."

  Don
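
To make the three options concrete, here is a toy model, assuming the usual reading of the cli settings: "comparator" orders (and types) column names within a row, "default_validation_class" checks column values, and "key_validation_class" checks row keys. The class `ToyColumnFamily` and its method names are illustrative only and are not part of Cassandra.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of the three options; names are illustrative only.
class ToyColumnFamily {
    private final Comparator<String> columnNameComparator; // "comparator"
    private final Predicate<String> columnValueValidator;  // "default_validation_class"
    private final Predicate<String> keyValidator;          // "key_validation_class"
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    ToyColumnFamily(Comparator<String> columnNameComparator,
                    Predicate<String> columnValueValidator,
                    Predicate<String> keyValidator) {
        this.columnNameComparator = columnNameComparator;
        this.columnValueValidator = columnValueValidator;
        this.keyValidator = keyValidator;
    }

    // Reject bad keys and values, then store the column in comparator order.
    void insert(String rowKey, String columnName, String value) {
        if (!keyValidator.test(rowKey)) {
            throw new IllegalArgumentException("bad row key: " + rowKey);
        }
        if (!columnValueValidator.test(value)) {
            throw new IllegalArgumentException("bad column value: " + value);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>(columnNameComparator))
            .put(columnName, value);
    }

    // Column names of one row, in the order the comparator defines.
    List<String> columnNames(String rowKey) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }
}
```

The field names show why the longer option names read more clearly: each one says which part of the row it applies to.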


re: User Survey

2011-11-29 Thread Don Smith

cli's "show keyspaces" command shows way too much information by default.

I think by default it should show just one line per keyspace.   A "-v" 
option could show more info.


What GUI alternatives are there to cli for browsing a cassandra ring?

Lots of people WILL use cli, so it should be spiffy.

 Thanks, Don


Re: Efficiency of hector's setRowCount (and setStartKey!)

2011-10-13 Thread Don Smith
It's actually setStartKey that's the important method call (in 
combination with setRowCount). So I should have been clearer.


The following code performs as expected, as far as returning the 
expected data in the expected order.  I believe that the use of 
IndexedSlicesQuery's setStartKey will support efficient queries -- 
avoiding re-pulling the entire data set from Cassandra. Correct?



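import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;

// keyspace, stringSerializer, ourColumnFamilyName, batchSize and totalCount
// are fields of the enclosing class (not shown).

void demoPaging() {
    String lastKey = processPage("don", "");  // get first batch, starting with "" (smallest key)
    lastKey = processPage("don", lastKey);    // get second batch, starting with previous last key
    lastKey = processPage("don", lastKey);    // get third batch, starting with previous last key
    // ...
}

// Return the last key processed, or null when no records are left.
String processPage(String username, String startKey) {
    String lastKey = null;
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
            stringSerializer, stringSerializer);
    indexedSlicesQuery.addEqualsExpression("user", username);
    indexedSlicesQuery.setColumnNames("source", "ip");
    indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
    indexedSlicesQuery.setStartKey(startKey);   // <-- resume from the previous page's last key
    indexedSlicesQuery.setRowCount(batchSize);
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();

    for (Row<String, String, String> row : rows) {
        if (row == null) { continue; }
        String key = row.getKey();
        // The start key comes back again as the first row of each later page;
        // skip it so it is neither counted nor returned twice.
        if (!startKey.equals(key)) {
            totalCount++;
            lastKey = key;
        }
    }
    return lastKey;
}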






On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
Hi Don. No it will not. IndexedSlicesQuery will read just the amount 
of rows specified by RowCount and will go to the DB to get the new 
page when needed.


SetRowCount is doing indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsm...@likewise.com> wrote:


Hector's IndexedSlicesQuery has a setRowCount method that you can
use to page through the results, as described in
https://github.com/rantav/hector/wiki/User-Guide .

rangeSlicesQuery.setRowCount(1001);
 .
rangeSlicesQuery.setKeys(lastRow.getKey(),  "");

Is it efficient?  Specifically, suppose my query returns 100,000
results and I page through batches of 1000 at a time (making 100
executes of the query). Will it internally retrieve all the
results each time (but pass only the desired set of 1000 or so to
me)? Or will it optimize queries to avoid the duplication?  I
presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same
effect?

  Thanks,  Don






Efficiency of hector's setRowCount

2011-10-10 Thread Don Smith
Hector's IndexedSlicesQuery has a setRowCount method that you can use to 
page through the results, as described in 
https://github.com/rantav/hector/wiki/User-Guide .


 rangeSlicesQuery.setRowCount(1001);
  .
 rangeSlicesQuery.setKeys(lastRow.getKey(),  "");

Is it efficient?  Specifically, suppose my query returns 100,000 results 
and I page through batches of 1000 at a time (making 100 executes of the 
query). Will it internally retrieve all the results each time (but pass 
only the desired set of 1000 or so to me)? Or will it optimize queries 
to avoid the duplication?  I presume the latter. :)


Can IndexedSlicesQuery's setStartKey method be used for the same effect?

   Thanks,  Don
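
The paging pattern in question can be simulated without a cluster. The sketch below assumes rows come back in a stable order and that a query with a start key returns that key again as its first row; `PagingDemo`, `fetch` and `readAll` are hypothetical names, and `fetch` merely stands in for one execution of the query. It is not Hector code and says nothing about what Hector does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Simulates start-key paging over an ordered set of row keys.
// fetch() stands in for one query execution; it is not Hector code.
class PagingDemo {
    private final NavigableSet<String> keys = new TreeSet<>();
    int rowsFetched = 0; // total rows returned by the simulated store

    PagingDemo(List<String> allKeys) {
        keys.addAll(allKeys);
    }

    // One "query": at most rowCount keys, starting at startKey (inclusive).
    List<String> fetch(String startKey, int rowCount) {
        List<String> page = new ArrayList<>();
        for (String key : keys.tailSet(startKey, true)) {
            if (page.size() == rowCount) {
                break;
            }
            page.add(key);
        }
        rowsFetched += page.size();
        return page;
    }

    // Walk the whole set in pages of pageSize new keys each.
    List<String> readAll(int pageSize) {
        List<String> out = new ArrayList<>();
        String startKey = ""; // "" sorts before every other key
        boolean first = true;
        while (true) {
            // After the first page, ask for one extra row because the
            // start key itself comes back again and is skipped.
            List<String> page = fetch(startKey, first ? pageSize : pageSize + 1);
            int skip = first ? 0 : 1;
            if (page.size() <= skip) {
                break;
            }
            out.addAll(page.subList(skip, page.size()));
            startKey = page.get(page.size() - 1);
            first = false;
        }
        return out;
    }
}
```

With ten keys and a page size of three, `readAll` returns all ten keys and `rowsFetched` ends at 14: each row once, plus one overlapping row for every fetch after the first. That overhead grows with the number of pages, not with the size of the result set, which is the behavior the question hopes for.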


Question about sharding of rows and atomicity

2011-10-05 Thread Don Smith
Does Cassandra shard the columns of a single row across multiple nodes 
so that to read the columns of the row it may need access to multiple 
nodes?   I'd say "no."   Will a read from a given node ever return 
partial results or is the write to a node of a row atomic?


 Thanks, Don
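
On the sharding half of the question, a toy token ring illustrates why the answer would be "no" if placement is decided by the row key alone, which is how Cassandra's partitioners work. The class `ToyRing` is hypothetical; hashing node names to get their tokens is a simplification made for this sketch (real clusters assign node tokens), and replication is left out.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: the node is chosen from the row key alone,
// so every column of a row lands on the same node.
class ToyRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // Simplification: derive the node's token by hashing its name.
    void addNode(String name) {
        ring.put(token(name), name);
    }

    // MD5 of the key as a non-negative integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 ships with every JVM
        }
    }

    // First node at or after the key's token, wrapping around the ring.
    String nodeFor(String rowKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes");
        }
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token(rowKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // The column name is deliberately ignored: placement depends
    // on the row key only.
    String nodeFor(String rowKey, String columnName) {
        return nodeFor(rowKey);
    }
}
```

Because the column name never enters the token calculation, `nodeFor("don", "source")` and `nodeFor("don", "ip")` always agree. The sketch does not address the atomicity half of the question.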