Re: are there any free Cassandra -> ElasticSearch connector / plugin ?

2016-10-13 Thread Brian O'Neill
I haven't used it yet, but
https://github.com/vroyer/elassandra

-- 
Brian O'Neill
Principal Architect @ Monetate
m: 215.588.6024
bone...@monetate.com


> On Oct 13, 2016, at 6:02 PM, Eric Ho  wrote:
> 
> I don't want to change my code to write into C* and then to ES.
> So, I'm looking for some sort of a sync tool that will sync my C* table into 
> ES and it should be smart enough to avoid duplicates or gaps.
> Is there such a tool / plugin ?
> I'm using stock apache Cassandra 3.7.
> I know that some premium Cassandra distributions have ES built in or 
> integrated, but I can't afford premium right now...
> Thanks.
> 
> -eric ho
> 



Re: Support for ad-hoc query

2015-06-09 Thread Brian O'Neill
Cassandra isn't great at ad hoc queries.  Many of us have paired it with an
indexing engine like SOLR or Elastic Search.
(built-into the DSE solution)

As of late, I think there are a few of us exploring Spark SQL.  (which you
can then use via JDBC or REST)
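For example, a typical ad-hoc aggregate that plain CQL cannot express, but that Spark SQL can run over a Cassandra table. This is only an illustrative sketch; the 'events' table and its columns are invented, and the query assumes the table has been registered with Spark (e.g. via the spark-cassandra-connector):

```sql
-- Spark SQL (not CQL) over a Cassandra table exposed to Spark;
-- 'events' and its columns are hypothetical.
SELECT user_id, count(*) AS clicks
FROM events
WHERE event_type = 'button_click'
  AND event_time > '2015-05-19'
GROUP BY user_id
ORDER BY clicks DESC
LIMIT 10;
```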

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or the
person responsible to deliver it to the intended recipient, please contact
the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination,
copying or other use of, or taking any action in reliance upon, this
information by persons or entities other than the intended recipient is
strictly prohibited.
 


From:  Srinivasa T N 
Reply-To:  
Date:  Tuesday, June 9, 2015 at 2:38 AM
To:  "user@cassandra.apache.org" 
Subject:  Support for ad-hoc query

Hi All,
   I have a web application running with my backend data stored in
cassandra.  Now I want to do some analysis on the stored data, which requires
some ad-hoc queries fired on cassandra.  How can I do the same?

Regards,
Seenu.




Re: Spark SQL JDBC Server + DSE

2015-06-03 Thread Brian O'Neill

Kudos Ben.  We've been tracking Zeppelin, and considered doing the same
thing.
You beat us to it.  Well done.

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Ben Bromhead 
Reply-To:  
Date:  Tuesday, June 2, 2015 at 5:05 PM
To:  
Subject:  Re: Spark SQL JDBC Server + DSE

If you want a web based notebook style approach (similar to ipython) check
out https://github.com/apache/incubator-zeppelin

And https://github.com/apache/incubator-zeppelin/pull/86

Bonus free pretty graphs!

On 1 June 2015 at 11:41, Sebastian Estevez 
wrote:
> Have you looked at job server?
> 
> https://github.com/spark-jobserver/spark-jobserver
> https://www.youtube.com/watch?v=8k9ToZ4m6os
> http://planetcassandra.org/blog/post/fast-spark-queries-on-in-memory-datasets/
> 
> All the best,
> 
> 
> Sebastián Estévez
> Solutions Architect | 954 905 8615   |
> sebastian.este...@datastax.com
> 
> 
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world's most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the database
> technology and transactional backbone of choice for the world's most innovative
> companies such as Netflix, Adobe, Intuit, and eBay.
> 
> On Mon, Jun 1, 2015 at 8:13 AM, Mohammed Guller 
> wrote:
>> Brian,
>> We haven't open sourced the REST server, but not opposed to doing it. Just
>> need to carve out some time to clean up the code and carve it out from all
>> the other stuff that we do in that REST server.  Will try to do it in the
>> next few weeks. If you need it sooner, let me know.
>>  
>> I did consider the option of writing our own Spark SQL JDBC driver for C*,
>> but it is lower on the priority list right now.
>>  
>> 
>> Mohammed
>>  
>> 
>> From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
>> Sent: Saturday, May 30, 2015 3:12 AM
>> 
>> 
>> To: user@cassandra.apache.org
>> Subject: Re: Spark SQL JDBC Server + DSE
>>  
>> 
>>  
>> 
>> Any chance you open-sourced, or could open-source the REST server? ;)
>> 
>>  
>> 
>> In thinking about it…
>> 
>> It doesn't feel like it would be that hard to write a Spark SQL JDBC driver
>> against Cassandra, akin to what they have for hive:
>> 
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
>> 
>>  
>> 
>> I wouldn't mind collaborating on that, if you are headed in that direction.
>> 
>> (and then I could write the REST server on top of that)
>> 
>>  
>> 
>> LMK,
>> 
>>  
>> 
>> -brian
>> 
>>  
>> 
>> ---
>> Brian O'Neill 
>> Chief Technology Officer
>> Health Market Science, a LexisNexis Company
>> 215.588.6024 Mobile • @boneill42
>> <http://www.twitter.com/boneill42>
>>  

Re: Spark SQL JDBC Server + DSE

2015-05-30 Thread Brian O'Neill

Any chance you open-sourced, or could open-source the REST server? ;)

In thinking about it…
It doesn't feel like it would be that hard to write a Spark SQL JDBC driver
against Cassandra, akin to what they have for hive:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server

I wouldn't mind collaborating on that, if you are headed in that direction.
(and then I could write the REST server on top of that)

LMK,

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Mohammed Guller 
Reply-To:  
Date:  Friday, May 29, 2015 at 2:15 PM
To:  "user@cassandra.apache.org" 
Subject:  RE: Spark SQL JDBC Server + DSE

Brian,
I implemented a similar REST server last year and it works great. Now we
have a requirement to support JDBC connectivity in addition to the REST API.
We want to allow users to use tools like Tableau to connect to C* through
the Spark SQL JDBC/Thrift server.
 

Mohammed
 

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Thursday, May 28, 2015 6:16 PM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE
 

Mohammed,

 

This doesn't really answer your question, but I'm working on a new REST
server that allows people to submit SQL queries over REST, which get
executed via Spark SQL.   Based on what I started here:

http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

 

I assume you need JDBC connectivity specifically?

 

-brian

 

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
 
 

 

From: Mohammed Guller 
Reply-To: 
Date: Thursday, May 28, 2015 at 8:26 PM
To: "user@cassandra.apache.org" 
Subject: RE: Spark SQL JDBC Server + DSE

 

Anybody out there using DSE + Spark SQL JDBC server?
 

Mohammed
 

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE
 
Hi –
As I understand, the Spark SQL Thrift/JDBC server cannot be used with the
open source C*. Only DSE supports  the Spark SQL JDBC server.
 
We would like to find out how many organizations are using this
combination. If you do use DSE + Spark SQL JDBC server, it would be great if
you could share your experience. For example, what kinds of issues have you
run into? How is the performance? What reporting tools are you using?
 
Thank  you!
 
Mohammed 
 




Re: Spark SQL JDBC Server + DSE

2015-05-28 Thread Brian O'Neill
Mohammed,

This doesn't really answer your question, but I'm working on a new REST
server that allows people to submit SQL queries over REST, which get
executed via Spark SQL.   Based on what I started here:
http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

I assume you need JDBC connectivity specifically?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Mohammed Guller 
Reply-To:  
Date:  Thursday, May 28, 2015 at 8:26 PM
To:  "user@cassandra.apache.org" 
Subject:  RE: Spark SQL JDBC Server + DSE

Anybody out there using DSE + Spark SQL JDBC server?
 

Mohammed
 

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE
 
Hi –
As I understand, the Spark SQL Thrift/JDBC server cannot be used with the
open source C*. Only DSE supports  the Spark SQL JDBC server.
 
We would like to find out how many organizations are using this
combination. If you do use DSE + Spark SQL JDBC server, it would be great if
you could share your experience. For example, what kinds of issues have you
run into? How is the performance? What reporting tools are you using?
 
Thank  you!
 
Mohammed 
 




Re: cassandra and spark from cloudera distirbution

2015-04-22 Thread Brian O'Neill
Depends on which version of Spark you are running on Cloudera.

Once you know that — have a look at the compatibility chart here:
https://github.com/datastax/spark-cassandra-connector

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Serega Sheypak 
Reply-To:  
Date:  Wednesday, April 22, 2015 at 1:48 PM
To:  user 
Subject:  Re: cassandra and spark from cloudera distirbution

We already use it. Would like to use Spark from cloudera distribution.
Should it work?

2015-04-22 19:43 GMT+02:00 Jay Ken :
> There is an Enterprise Edition from DataStax, where they have Spark and
> Cassandra integration.
> 
> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise
> 
> Thanks,
> Jay
> 
> On Wed, Apr 22, 2015 at 6:41 AM, Serega Sheypak 
> wrote:
>> Hi, are Cassandra and Spark from Cloudera compatible?
>> Where can I find the compatibility notes?
> 





Re: Adhoc querying in Cassandra?

2015-04-22 Thread Brian O'Neill
Again — agreed.

They have different usage patterns (C* heavy writes, ES heavy reads), so I would
separate them.
SOLR should be sufficient.  I believe DSE is a tight integration between
SOLR and C*.

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Ali Akhtar 
Reply-To:  
Date:  Wednesday, April 22, 2015 at 8:10 AM
To:  
Subject:  Re: Adhoc querying in Cassandra?

I believe ElasticSearch has better support for scaling horizontally (by
adding nodes) than Solr does. Some benchmarks that I've looked at also show
it performing better under high load.

I probably wouldn't run them both on the same node, or you might see low
performance as they compete for resources.

What type of usage do you expect - mostly read, or mostly write?

On Wed, Apr 22, 2015 at 5:06 PM, Matthew Johnson 
wrote:
> Hi Ali, Brian,
>  
> Thanks for the suggestion – we have previously used Solr (SolrCloud for
> distribution) for a lot of other products, presumably this will do the same
> job as ElasticSearch? Or does ElasticSearch have specifically better
> integration with Cassandra or better support for aggregate queries?
>  
> Would it be an ok architecture to have a Cassandra node and a Solr/ES instance
> on each box, so they scale together? Or is it better to have separate servers
> for storage and search?
>  
> Cheers,
> Matt
>  
> 
> From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
> Sent: 22 April 2015 12:56
> To: user@cassandra.apache.org
> Subject: Re: Adhoc querying in Cassandra?
>  
> 
>  
> 
> +1, I think many organizations (including ours) pair Elastic Search with
> Cassandra.
> 
> Use Cassandra as your system of record, then index the data with ES.
> 
>  
> 
> -brian
> 
>  
> 
> ---
> Brian O'Neill 
> Chief Technology Officer
> Health Market Science, a LexisNexis Company
> 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
>  
>  
> 
>  
> 
> From: Ali Akhtar 
> Reply-To: 
> Date: Wednesday, April 22, 2015 at 7:52 AM
> To: 
> Subject: Re: Adhoc querying in Cassandra?
> 
>  
> You might find it better to use elasticsearch for your aggregate queries and
> analytics. Cassandra is more of just a data store.
> 
> On Apr 22, 2015 4:42 PM, "Matthew Johnson"  wrote:
> 
> Hi all,
>  
> Currently we are setting up a "big" data cluster, but we are only going to
> have a couple of servers to start with but we need to be able to scale out
> quickly when usage ramps up. Previously we have used Hadoop/HBase for our big
> data cluster, but since we are starting this one on only two nodes I think
> Cassandra will be a much better fit, as Hadoop and HBase really need at least
> 3 to achieve any sort of resilience (zookeeper quorum etc).
>  
> My question is this:
>  
> I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me to
> issue ad-hoc SQL-style queries. (eg count the number of times users have
> clicked on a certain button after clicking a different button in the last 3
> weeks etc). My understanding is that CQL does not support this style of adhoc
> aggregate querying out of the box. Is there a recommended way to do count,
> sum, average etc without writing client code (in my case Java) every time I
> want to run one? I have been looking at projects like Drill, Spark etc that
> could potentially sit on top of Cassandra but without actually setting
> everything up and testing them it is difficult to figure out what they would
> give us.
>  
> Does anyone else interactively issue adhoc aggregate queries to Cassandra, and
> if so, what stack do you use?
>  
> Thanks!
> Matt
>  





Re: Adhoc querying in Cassandra?

2015-04-22 Thread Brian O'Neill

+1, I think many organizations (including ours) pair Elastic Search with
Cassandra.
Use Cassandra as your system of record, then index the data with ES.

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Ali Akhtar 
Reply-To:  
Date:  Wednesday, April 22, 2015 at 7:52 AM
To:  
Subject:  Re: Adhoc querying in Cassandra?


You might find it better to use elasticsearch for your aggregate queries and
analytics. Cassandra is more of just a data store.

On Apr 22, 2015 4:42 PM, "Matthew Johnson"  wrote:
> Hi all,
>  
> Currently we are setting up a "big" data cluster, but we are only going to
> have a couple of servers to start with but we need to be able to scale out
> quickly when usage ramps up. Previously we have used Hadoop/HBase for our big
> data cluster, but since we are starting this one on only two nodes I think
> Cassandra will be a much better fit, as Hadoop and HBase really need at least
> 3 to achieve any sort of resilience (zookeeper quorum etc).
>  
> My question is this:
>  
> I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me to
> issue ad-hoc SQL-style queries. (eg count the number of times users have
> clicked on a certain button after clicking a different button in the last 3
> weeks etc). My understanding is that CQL does not support this style of adhoc
> aggregate querying out of the box. Is there a recommended way to do count,
> sum, average etc without writing client code (in my case Java) every time I
> want to run one? I have been looking at projects like Drill, Spark etc that
> could potentially sit on top of Cassandra but without actually setting
> everything up and testing them it is difficult to figure out what they would
> give us.
>  
> Does anyone else interactively issue adhoc aggregate queries to Cassandra, and
> if so, what stack do you use?
>  
> Thanks!
> Matt
>  




Re: Cassandra - Storm

2015-04-03 Thread Brian O'Neill

I'd recommend using Storm's State abstraction.

Check out:
https://github.com/hmsonline/storm-cassandra-cql

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Vanessa Gligor 
Reply-To:  
Date:  Friday, April 3, 2015 at 1:13 AM
To:  
Subject:  Cassandra - Storm

Hi all,

Did anybody use Cassandra for the tuple storage in Storm? I have this
scenario: I have a spout (getting messages from RabbitMQ) and I want to save
all these messages in Cassandra using a bolt. What is the best choice
regarding the connection to the DB? I have read about Hector API. I used it,
but for now I wasn't able to add a new row in a column family.

Any help would be appreciated.

Regards,
Vanessa.




Re: Frequent timeout issues

2015-04-01 Thread Brian O'Neill

Are you using the storm-cassandra-cql driver?
(https://github.com/hmsonline/storm-cassandra-cql)

If so, what version?
Batching or no batching?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Amlan Roy 
Reply-To:  
Date:  Wednesday, April 1, 2015 at 11:37 AM
To:  
Subject:  Re: Frequent timeout issues

Replication factor is 2.
CREATE KEYSPACE ct_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '2'
};

Inserts are happening from Storm using java driver. Using prepared statement
without batch.


On 01-Apr-2015, at 8:42 pm, Brice Dutheil  wrote:

> And the keyspace? What is the replication factor.
> 
> Also how are the inserts done?
> 
> On Wednesday, April 1, 2015, Amlan Roy  wrote:
>> Write consistency level is ONE.
>> 
>> This is the describe output for one of the tables.
>> 
>> CREATE TABLE event_data (
>>   event text,
>>   week text,
>>   bucket int,
>>   date timestamp,
>>   unique text,
>>   adt int,
>>   age list,
>>   arrival list,
>>   bank text,
>>   bf double,
>>   cabin text,
>>   card text,
>>   carrier list,
>>   cb double,
>>   channel text,
>>   chd int,
>>   company text,
>>   cookie text,
>>   coupon list,
>>   depart list,
>>   dest list,
>>   device text,
>>   dis double,
>>   domain text,
>>   duration bigint,
>>   emi int,
>>   expressway boolean,
>>   flight list,
>>   freq_flyer list,
>>   host text,
>>   host_ip text,
>>   inf int,
>>   instance text,
>>   insurance text,
>>   intl boolean,
>>   itinerary text,
>>   journey text,
>>   meal_pref list,
>>   mkp double,
>>   name list,
>>   origin list,
>>   pax_type list,
>>   payment text,
>>   pref_carrier list,
>>   referrer text,
>>   result_cnt int,
>>   search text,
>>   src text,
>>   src_ip text,
>>   stops int,
>>   supplier list,
>>   tags list,
>>   total double,
>>   trip text,
>>   user text,
>>   user_agent text,
>>   PRIMARY KEY ((event, week, bucket), date, unique)
>> ) WITH CLUSTERING ORDER BY (date DESC, unique ASC) AND
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.10 AND
>>   gc_grace_seconds=864000 AND
>>   index_interval=128 AND
>>   read_repair_chance=0.00 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   default_time_to_live=0 AND
>>   speculative_retry='99.0PERCENTILE' AND
>>   memtable_flush_period_in_ms=0 AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'LZ4Compressor'};
>> 
>> 
>> On 01-Apr-2015, at 8:00 pm, Eric R Medley wrote:
>>> Also, can you provide the table details and the consistency level you are
>>> using?
>>> 
>>> Regards,
>>> 
>>> Eric R Medley
>>> 
>>>> On Apr 1, 2015, at 9:13 AM, Eric R Medley wrote:
>>>> 
>>>> Amlan,
>>>> 
>>>> Can you provide information on how much data is being written? Are any of
>>>> the columns really large? Are any writes succeeding or are all timing out?
>>>> 
>>>> Regards,
>>>> 
>>>> Eric R Medley
>>>> 
>>>>> On Apr 1, 2015, at 9:03 AM, Amlan Roy wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am new to Cassandra. I have setup a cluster with Cassandra 2.0.13. I am
>>>>> writing the same data in HBase and Cassandra and find that the writes are
>>>>> extremely slow in Cassandra and frequently seeing exception "Cassandra
>>>>> timeout during write query at consistency ONE". The cluster size for both
>>>>> HBase and Cassandra are same.
>>>>> 
>>>>> Looks like something is wrong with my cluster setup. What can be the
>>>>> possible issue? Data and commit logs are written into two separate disks.
>>>>> 
>>>>> Regards,
>>>>> Amlan
>>>> 
>>> 
>> 
> 
> 
> -- 
> Brice





Re: cassandra source code

2015-03-24 Thread Brian O'Neill
FWIW — I just went through this, and posted the process I used to get up and
running:
http://brianoneill.blogspot.com/2015/03/getting-started-with-cassandra.html

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 


From:  Divya Divs 
Reply-To:  
Date:  Tuesday, March 24, 2015 at 1:29 AM
To:  , Jason Wee , Eric
Stevens 
Subject:  cassandra source code

Hi
I'm Divya, and I'm trying to run the Cassandra source code in Eclipse. I'm
taking the source code from GitHub. I'm using Windows 64-bit and following
the instructions from this website:
http://runningcassandraineclipse.blogspot.in/. In the GitHub
cassandra-trunk, the conf/log4j-server.properties file and the
org.apache.cassandra.thrift.CassandraDaemon main class are not there. Please
point me to a document describing how to run the Cassandra source code, and
kindly help me to proceed. Please reply as soon as possible.
   Thanking you







Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Brian O'Neill
Exactly.  Perfect.  Will do.
Thanks Robert.

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com


 


From:  Robert Stupp 
Reply-To:  
Date:  Tuesday, November 18, 2014 at 2:26 PM
To:  
Subject:  Re: IF NOT EXISTS on UPDATE statements?

> 
> For (2), we would love to see:
> UPSERT value=new_value where (not exists || value=read_value)
> 

That would be something like "UPDATE … IF column=value OR NOT EXISTS".

I took a look at the C* source and it feels like a LHF (for 3.0), so I opened
https://issues.apache.org/jira/browse/CASSANDRA-8335 for that.
Feel free to comment on that :)





Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Brian O'Neill

FWIW — we have the exact same need.
And we have been struggling with the differences in CQL between UPDATE and
INSERT.

Our use case:

We do in-memory dimensional aggregations that we want to write to C* using
LWT.  
(so, it's a low volume of writes, because we are doing aggregations across
time windows)

On "commit", we:
1) Read current value for time window
(which returns null if not exists for time window, or current_value if
exists)
2) Then we need to UPSERT new_value for window
where new_value = current_value + agg_value
but only if no other node has updated the value

For (2), we would love to see:
UPSERT value=new_value where (not exists || value=read_value)

(ignoring some intricacies)
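For illustration, the closest we can get today is two separate LWT statements. This is only a sketch of the assumed workaround; the 'aggregates' table, its columns, and the values are hypothetical:

```sql
-- Hypothetical 'aggregates' table: one row per time window.
-- Step (1) was a plain read of the current value (say it returned 40).

-- Try to create the row; applies only if the window did not exist yet.
INSERT INTO aggregates (window_start, value)
VALUES ('2014-11-18 14:00:00', 42) IF NOT EXISTS;

-- If [applied] = false, compare-and-set against the value we read, so a
-- concurrent writer's update makes ours fail (and we re-read and retry).
UPDATE aggregates SET value = 42
WHERE window_start = '2014-11-18 14:00:00'
IF value = 40;
```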

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Robert Stupp 
Reply-To:  
Date:  Tuesday, November 18, 2014 at 12:35 PM
To:  
Subject:  Re: IF NOT EXISTS on UPDATE statements?


>> > There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug. INSERT
>> and UPDATE are not totally orthogonal
> in CQL and you should use INSERT for actual insertion and UPDATE for updates
> (granted, the database will not reject
> your query if you break this rule but it's nonetheless the way it's intended to
> be used).
> 
> OK.. (and not trying to be difficult here).  We can't have it both ways. One
> of these use cases is a bug…
> 
> You're essentially saying "don't do that, but yeah, you can do it…"
> 
> Either UPDATE should support IF NOT EXISTS or UPDATE should not perform
> INSERTs.
> 

UPDATE performs like INSERT in the meaning of an UPSERT - means: INSERT
allows to write the same primary key again and UPDATE allows to write data
to a non-existing primary key (effectively inserting data).
(That's what NoSQL databases do.)
Take that as an advantage / feature not present on other DBs.

"UPDATE … IF EXISTS" and "INSERT … IF NOT EXISTS" are *expensive* operations
(require serial-consistency/LWT which requires some more network
roundtrips).
"IF [NOT] EXISTS" is basically some kind of "convenience".
And please take into account that UPDATE also has an "IF column = value"
condition (using LWT).
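A small cqlsh-style illustration of the semantics above (the schema is
made up):

```sql
CREATE TABLE t (k int PRIMARY KEY, v int);

-- UPDATE on a key that was never inserted still writes the row (upsert):
UPDATE t SET v = 1 WHERE k = 10;

-- The LWT variants pay the extra round-trips for a compare-and-set:
INSERT INTO t (k, v) VALUES (10, 9) IF NOT EXISTS;  -- [applied] = False: row exists
UPDATE t SET v = 2 WHERE k = 10 IF v = 1;           -- [applied] = True: condition holds
```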





Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread Brian O'Neill
Well done Rohit. (and crew)

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Rohit Rai 
Reply-To:  
Date:  Friday, October 3, 2014 at 2:16 PM
To:  
Subject:  [ANN] SparkSQL support for Cassandra with Calliope

Hi All,

An year ago we started this journey and laid the path for Spark + Cassandra
stack. We established the ground work and direction for Spark Cassandra
connectors and we have been happy seeing the results.

With the Spark 1.1.0 and SparkSQL release, it's time to take Calliope
<http://tuplejump.github.io/calliope/> to the logical next level, also
paving the way for much more advanced functionality to come.

Yesterday we released Calliope 1.1.0 Community Tech Preview
<https://twitter.com/tuplejump/status/517739186124627968> , which brings
Native SparkSQL support for Cassandra. The further details are available
here <http://tuplejump.github.io/calliope/tech-preview.html> .

This release showcases in core spark-sql
<http://tuplejump.github.io/calliope/start-with-sql.html> , hiveql
<http://tuplejump.github.io/calliope/start-with-hive.html>  and
HiveThriftServer <http://tuplejump.github.io/calliope/calliope-server.html>
support. 

I differentiate it as "native" spark-sql integration as it doesn't rely on
Cassandra's hive connectors (like Cash or DSE) and saves a level of
indirection through Hive.

It also allows us to harness Spark's analyzer and optimizer in future to
work out the best execution plan targeting a balance between Cassandra's
querying restrictions and Sparks in memory processing.

As far as we know this is the first and only third-party data store
connector for SparkSQL. This is a CTP release as it relies on Spark
internals that don't yet have a stabilized developer API; we will work
with the Spark community on documenting the requirements and working towards
a standard and stable API for third-party data store integration.

On another note, we no longer require you to signup to access the early
access code repository.

Inviting all of you to try it and give us your valuable feedback.

Regards,

Rohit
Founder & CEO, Tuplejump, Inc.

www.tuplejump.com <http://www.tuplejump.com>
The Data Engineering Platform




Re: Cassandra blob storage

2014-03-18 Thread Brian O'Neill
You may want to look at:
https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store
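The chunked-store idea — splitting a large object across many rows keyed
by a chunk index — can be sketched in CQL (a hypothetical layout, not
necessarily what Astyanax uses internally):

```sql
CREATE TABLE chunked_objects (
  object_name text,
  chunk_index int,
  chunk_data  blob,                    -- keep chunks well under a few MB
  PRIMARY KEY (object_name, chunk_index)
);

-- Write chunks in order, then stream them back clustered by index:
SELECT chunk_data FROM chunked_objects
WHERE object_name = 'reports/2014-03.pdf';
```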

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  prem yadav 
Reply-To:  
Date:  Tuesday, March 18, 2014 at 1:41 PM
To:  
Subject:  Cassandra blob storage

Hi,
I have been spending some time looking into whether large files(>100mb) can
be stores in Cassandra. As per Cassandra faq:

"Currently Cassandra isn't optimized specifically for large file or BLOB
storage. However, files of around 64Mb and smaller can be easily stored in
the database without splitting them into smaller chunks. This is primarily
due to the fact that Cassandra's public API is based on Thrift, which offers
no streaming abilities; any value written or fetched has to fit in to
memory."

Does the above statement still hold? Thrift supports framed data transport;
does that change the above statement? If not, why does Cassandra not adopt
the Thrift framed data transfer support?

Thanks





Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Brian O'Neill

just when you thought the thread died…


First, let me say we are *WAY* off topic.  But that is a good thing.
I love this community because there are a ton of passionate, smart people.
(often with differing perspectives ;)

RE: Reporting against C* (@Peter Lin)
We've had the same experience.  Pig + Hadoop is painful.  We are
experimenting with Spark/Shark, operating directly against the data.
http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html

The Shark layer gives you SQL and caching capabilities that make it easy to
use and fast (for smaller data sets).  In front of this, we are going to add
dimensional aggregations so we can operate at larger scales.  (then the Hive
reports will run against the aggregations)

RE: REST Server (@Russel Bradbury)
We had moderate success with Virgil, which was a REST server built directly
on Thrift.  We built it directly on top of Thrift, so one day it could be
easily embedded in the C* server itself.   It could be deployed separately,
or run an embedded C*.  More often than not, we ended up running it
separately to separate the layers.  (just like Titan and Rexster)  I've
started on a rewrite of Virgil called Memnon that rides on top of CQL. (I'd
love some help)
https://github.com/boneill42/memnon

RE: CQL vs. Thrift
We've hitched our wagons to CQL.  CQL != Relational.
We've had success translating our "native" schemas into CQL, including all
the NoSQL goodness of wide-rows, etc.  You just need a good understanding of
how things translate into storage and underlying CFs.  If anything, I think
we could add some DESCRIBE information, which would help users with this,
along the lines of:
(https://issues.apache.org/jira/browse/CASSANDRA-6676)
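For example, a CQL table with a clustering column translates directly into
a classic wide row (illustrative schema):

```sql
CREATE TABLE events (
  sensor_id  text,
  event_time timestamp,
  reading    double,
  PRIMARY KEY (sensor_id, event_time)  -- partition key, clustering column
);

-- Storage view: one wide row per sensor_id, with one cell per
-- (event_time, reading) pair, ordered by event_time within the row.
```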

CQL does open up the *opportunity* for users to articulate more complex
queries using more familiar syntax.  (including future things such as joins,
grouping, etc.)   To me, that is exciting, and again — one of the reasons we
are leaning on it.

my two cents,
brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Peter Lin 
Reply-To:  
Date:  Wednesday, March 12, 2014 at 8:44 AM
To:  "user@cassandra.apache.org" 
Subject:  Re: Proposal: freeze Thrift starting with 2.1.0


yes, I was looking at intravert last nite.

For the kinds of reports my customers ask us to do, joins and subqueries are
important. Having tried to do a simple join in PIG, the level of pain is
high. I'm a masochist, so I don't mind breaking a simple join into multiple
MR tasks, though I do find myself asking "why the hell does it need to be so
painful in PIG?" Many of my friends say "what is this crap!" or "this is
better than writing sql queries to run reports?"

Plus, using ETL techniques to extract summaries only works for cases where
the data is small enough. Once it gets beyond a certain size, it's not
practical, which means we're back to crappy reporting languages that make
life painful. Lots of big healthcare companies have thousands of MOLAP cubes
on dozens of mainframes. The old OLTP -> DW/OLAP flow creates its own set of
management headaches.

being able to report directly on the raw data avoids many of the issues, but
that's my bias perspective.




On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan  wrote:
> "I would love to see Cassandra get to the point where users can define complex
> queries with subqueries, like, group by and joins" --> Did you have a look at
> Intravert ? I think it does union & intersection on server side for you. Not
> sure about join though..
> 
> 
> On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin  wrote:
>> 
>> Hi Ed,
>> 
>> I agree Solr is deeply integrated into DSE. I've looked at Solandra in the
>> past and studied the code.
>> 
>> My understanding is DSE uses Cassandra for storage and the user has both API
>> available. I do think it can be integrated further to make moderate to
>> complex queries easier and probably faster. That's why we built our own
>> JPA-like object query API. I would love to see Cassandra get to the point

[Blog] : Storm and Cassandra : A Three Year Retrospective

2014-02-13 Thread Brian O'Neill

A community member asked for a blog post on Storm + Cassandra.

FWIW, here was our journey.
http://brianoneill.blogspot.com/2014/02/storm-and-cassandra-three-year.html

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com






Re: CQL list command

2014-02-07 Thread Brian O'Neill

+1, agreed.  I do the same thing.

If the CLI is going away, we'll need this ability in cqlsh.

I created a JIRA issue for it:
https://issues.apache.org/jira/browse/CASSANDRA-6676


We'll see what the crew comes back with.

-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com







On 2/7/14, 2:33 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>On Thu, Feb 6, 2014 at 9:01 PM, Andrew Cobley 
>wrote:
>> I often use the CLI command LIST for debugging or when teaching
>>students, showing them what's going on under the hood of CQL.  I see that
>>the CLI will be removed in Cassandra 3 and we will lose this ability.  It
>>would be nice if CQL retained it, or something like it, for debugging and
>>teaching purposes.
>
>I agree. I use LIST every now and then to verify the storage layout of
>partitioning and cluster columns. What would be cool is to do
>something like:
>
>cqlsh:y> CREATE TABLE x (
>  ... a int,
>  ... b int,
>  ... c int,
>  ... PRIMARY KEY (a,b)
>  ... );
>cqlsh:y> insert into x (a,b,c) values (1,1,1);
>cqlsh:y> insert into x (a,b,c) values (2,1,1);
>cqlsh:y> insert into x (a,b,c) values (2,2,1);
>cqlsh:y> select * from x;
> a | b | c
>---+---+---
> 1 | 1 | 1
> 2 | 1 | 1
> 2 | 2 | 1
>
>(3 rows)
>
>cqlsh:y> select * from x show storage; // requires monospace font
>
>   +---+
>+---+  |b:1|
>|a:1| +--> |---|
>+---+  |c:1|
>   +---+
>
>   +---+---+
>+---+  |b:1|b:2|
>|a:2| +--> |---|---|
>+---+  |c:1|c:2|
>   +---+---+
>
>(2 rows)




Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

2013-12-18 Thread Brian O'Neill

Thanks for the pointer Alain.

At a quick glance, it looks like people are looking for query time
filtering/aggregation, which will suffice for small data sets.  Hopefully we
might be able to extend that to perform pre-computations as well. (which
would support much larger data sets / volumes)

I'll continue the discussion on the issue.

thanks again,
brian


---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Alain RODRIGUEZ 
Reply-To:  
Date:  Wednesday, December 18, 2013 at 5:13 AM
To:  
Cc:  "d...@cassandra.apache.org" 
Subject:  Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Hi, this would indeed be much appreciated by a lot of people.

There is this issue, existing about this subject

 https://issues.apache.org/jira/browse/CASSANDRA-4914

Maybe you could help committers there.

Hope this will be useful to you.

Please let us know when you find a way to do these operations.

Cheers.


2013/12/18 Brian O'Neill 
> We are seeking to replace Acunu in our technology stack / platform.  It is the
> only component in our stack that is not open source.
> 
> In preparation, over the last few weeks I've migrated Virgil to CQL.   The
> vision is that Virgil could receive a REST request to upsert/delete data
> (hierarchical JSON to support collections).  Virgil would lookup the
> dimensions/aggregations for that table, add the key to the pertinent
> dimensional tables (e.g. DISTINCT), incorporate values into aggregations (e.g.
> SUMs) and increment/decrement relevant counters (COUNT).  (using additional
> CF's)
> 
> This seems straight-forward, but appears to require a read-before-write.
> (e.g. read the current value of a SUM, incorporate the new value, then use the
> lightweight transactions of C* 2.0 to conditionally update the value.)
> 
> Before I go down this path, I was wondering if anyone is designing/working on
> the same, perhaps at a lower level?  (CQL?)
> 
> Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
> etc) at the CQL level?  If so, is there a preliminary design?
> 
> I can see a lower-level approach, which would leverage the commit logs (and
> mem/sstables) and perform the aggregation during read-operations (and
> flush/compaction).
> 
> thoughts?  i'm open to all ideas.
> 
> -brian
> -- 
> Brian ONeill
> Chief Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024 
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42





Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

2013-12-17 Thread Brian O'Neill
We are seeking to replace Acunu in our technology stack / platform.  It is
the only component in our stack that is not open source.

In preparation, over the last few weeks I’ve migrated Virgil to CQL.   The
vision is that Virgil could receive a REST request to upsert/delete data
(hierarchical JSON to support collections).  Virgil would lookup the
dimensions/aggregations for that table, add the key to the pertinent
dimensional tables (e.g. DISTINCT), incorporate values into aggregations
(e.g. SUMs) and increment/decrement relevant counters (COUNT).  (using
additional CF’s)

This seems straight-forward, but appears to require a read-before-write.
 (e.g. read the current value of a SUM, incorporate the new value, then use
the lightweight transactions of C* 2.0 to conditionally update the value.)
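In CQL, that read-before-write step might look roughly like this (hedged
sketch; the schema, dimension keys, and values are made up for
illustration):

```sql
CREATE TABLE dim_sums (
  dimension text,
  bucket    text,
  total     bigint,
  PRIMARY KEY (dimension, bucket)
);

-- 1) Read the current aggregate:
SELECT total FROM dim_sums WHERE dimension = 'state' AND bucket = 'PA';

-- 2) Incorporate the new value and conditionally write it back; if the
--    update returns [applied] = False, another writer raced us:
--    re-read and retry.
UPDATE dim_sums SET total = 130
WHERE dimension = 'state' AND bucket = 'PA'
IF total = 100;
```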

Before I go down this path, I was wondering if anyone is designing/working
on the same, perhaps at a lower level?  (CQL?)

Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
etc) at the CQL level?  If so, is there a preliminary design?

I can see a lower-level approach, which would leverage the commit logs (and
mem/sstables) and perform the aggregation during read-operations (and
flush/compaction).

thoughts?  i'm open to all ideas.

-brian
-- 
Brian ONeill
Chief Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-10 Thread Brian O'Neill

Great.  Thanks Aaron.

FWIW, I am/was porting Virgil over CQL. 

I should be able to release a new REST API for C* (using CQL) shortly.

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42  •  healthmarketscience.com


On Dec 10, 2013, at 1:51 PM, Aaron Morton  wrote:

> Looks like a bug, will try to fix today 
> https://issues.apache.org/jira/browse/CASSANDRA-6472
> 
> Cheers
> 
> -
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 6/12/2013, at 10:25 am, Brian O'Neill  wrote:
> 
>> 
>> I removed the data directory just to make sure I had a clean environment. 
>> (eliminating the possibility of corrupt keyspaces/files causing problems)
>> 
>> -brian
>> 
>> ---
>> Brian O'Neill
>> Chief Architect
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42  •  
>> healthmarketscience.com
>> 
>> 
>> 
>> From: Jason Wee 
>> Reply-To: 
>> Date: Thursday, December 5, 2013 at 4:03 PM
>> To: 
>> Subject: Re: Drop keyspace via CQL hanging on master/trunk.
>> 
>> Hey Brian, just out of curiosity, why would you remove cassandra data 
>> directory entirely?
>> 
>> /Jason
>> 
>> 
>> On Fri, Dec 6, 2013 at 2:38 AM, Brian O'Neill  wrote:
>>> When running Cassandra from trunk/master, I see a drop keyspace command 
>>> hang at the CQL prompt.
>>> 
>>> To reproduce:
>>> 1) Removed my cassandra data directory entirely
>>> 2) Fired up cqlsh, and executed the following CQL commands in succession:
>>> 
>>> bone@zen:~/git/boneill42/cassandra-> bin/cqlsh
>>> Connected to Test Cluster at localhost:9160.
>>> [cqlsh 4.1.0 | Cassandra 2.1-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 
>>> 19.38.0]
>>> Use HELP for help.
>>> cqlsh> describe keyspaces;
>>> 
>>> system  system_traces
>>> 
>>> cqlsh> create keyspace test_keyspace with replication =3D {'class':'SimpleS=
>>> trategy', 'replication_factor':'1'};
>>> cqlsh> describe keyspaces;
>>> 
>>> system  test_keyspace  system_traces
>>> 
>>> cqlsh> drop keyspace test_keyspace;
>>> 
>>> 
>>> 
>>> thoughts?  user error? worth filing an issue?
>>> One other note — this happens using the CQL java driver as well.
>>> 
>>> -brian
>>> 
>>> ---
>>> Brian O'Neill
>>> Chief Architect
>>> Health Market Science
>>> The Science of Better Results
>>> 2700 Horizon Drive • King of Prussia, PA • 19406
>>> M: 215.588.6024 • @boneill42  •  
>>> healthmarketscience.com
>>> 
>>> 
>> 
> 



Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-05 Thread Brian O'Neill

I removed the data directory just to make sure I had a clean environment.
(eliminating the possibility of corrupt keyspaces/files causing problems)

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Jason Wee 
Reply-To:  
Date:  Thursday, December 5, 2013 at 4:03 PM
To:  
Subject:  Re: Drop keyspace via CQL hanging on master/trunk.

Hey Brian, just out of curiosity, why would you remove cassandra data
directory entirely?

/Jason


On Fri, Dec 6, 2013 at 2:38 AM, Brian O'Neill  wrote:
> When running Cassandra from trunk/master, I see a drop keyspace command hang
> at the CQL prompt.
> 
> To reproduce:
> 1) Removed my cassandra data directory entirely
> 2) Fired up cqlsh, and executed the following CQL commands in succession:
> 
> bone@zen:~/git/boneill42/cassandra-> bin/cqlsh
> Connected to Test Cluster at localhost:9160.
> [cqlsh 4.1.0 | Cassandra 2.1-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol
> 19.38.0]
> Use HELP for help.
> cqlsh> describe keyspaces;
> 
> system  system_traces
> 
> cqlsh> create keyspace test_keyspace with replication =3D {'class':'SimpleS=
> trategy', 'replication_factor':'1'};
> cqlsh> describe keyspaces;
> 
> system  test_keyspace  system_traces
> 
> cqlsh> drop keyspace test_keyspace;
> 
> 
> 
> thoughts?  user error? worth filing an issue?
> One other note — this happens using the CQL java driver as well.
> 
> -brian
> 
> ---
> Brian O'Neill
> Chief Architect
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
> healthmarketscience.com
> 
> 





Drop keyspace via CQL hanging on master/trunk.

2013-12-05 Thread Brian O'Neill
When running Cassandra from trunk/master, I see a drop keyspace command hang
at the CQL prompt.

To reproduce:
1) Removed my cassandra data directory entirely
2) Fired up cqlsh, and executed the following CQL commands in succession:

bone@zen:~/git/boneill42/cassandra-> bin/cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.1-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol
19.38.0]
Use HELP for help.
cqlsh> describe keyspaces;

system  system_traces

cqlsh> create keyspace test_keyspace with replication = {'class':'SimpleStrategy', 'replication_factor':'1'};
cqlsh> describe keyspaces;

system  test_keyspace  system_traces

cqlsh> drop keyspace test_keyspace;



thoughts?  user error? worth filing an issue?
One other note — this happens using the CQL java driver as well.

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com






Re: Main method not found in class org.apache.cassandra.service.CassandraDaemon

2013-07-17 Thread Brian O'Neill
Vivek,

You could try echoing the CLASSPATH to double check.  Drop an echo into the
launch_service function in the cassandra shell script.  (~line 121)

Let us know the output.

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Vivek Mishra 
Reply-To:  
Date:  Wednesday, July 17, 2013 10:24 AM
To:  
Subject:  Re: Main method not found in class
org.apache.cassandra.service.CassandraDaemon

Hi Brian,
Thanks for your response.
I think i did change CASSANDRA_HOME to point to new directory.

-Vivek


On Wed, Jul 17, 2013 at 7:03 PM, Brian O'Neill 
wrote:
> Vivek,
> 
> The location of CassandraDaemon changed between versions.  (from
> org.apache.cassandra.thrift to org.apache.cassandra.service)
> 
> It is likely that the start scripts are picking up the old version on the
> classpath, which results in the main method not being found.
> 
> Do you have CASSANDRA_HOME set?  I believe the start scripts will use that.
> Perhaps you have that set and pointed to the older 1.1.X version?
> 
> -brian
> 
> 
> On Wed, Jul 17, 2013 at 8:31 AM, Vivek Mishra  wrote:
>> Finally,
>> i have to delete all rpm installed files to get this working, folders are:
>> /usr/share/cassandra
>> /etc/alternatives/cassandra
>> /usr/bin/cassandra
>> /usr/bin/cassandra.in.sh
>> /usr/bin/cassandra-cli
>> 
>> Still don't understand why it's giving me such weird error:
>> 
>> Error: Main method not found in class
>> org.apache.cassandra.service.CassandraDaemon, please define the main method
>> as:
>>public static void main(String[] args)
>> ***
>> 
>> This is not informative at all and does not even Help!
>> 
>> -Vivek
>> 
>> 
>> On Wed, Jul 17, 2013 at 3:49 PM, Vivek Mishra  wrote:
>>> @aaron
>>> Thanks for your reply. I did have a look rpm installed files
>>> 1.  /etc/alternatives/cassandra, it contains configuration files only.
>>> and .sh files are installed within /usr/bin folder.
>>> 
>>> Even if i try to run from extracted tar ball folder as
>>> 
>>> /home/impadmin/apache-cassandra-1.2.4/bin/cassandra -f
>>> 
>>> same error.  
>>> 
>>> /home/impadmin/apache-cassandra-1.2.4/bin/cassandra -v
>>> 
>>> gives me 1.1.12 though it should give me 1.2.4
>>> 
>>> 
>>> -Vivek
>>> it gives me same error.
>>> 
>>> 
>>> On Wed, Jul 17, 2013 at 3:37 PM, aaron morton 
>>> wrote:
>>>> Something is messed up in your install.  Can you try scrubbing the install
>>>> and restarting ?
>>>> 
>>>> Cheers
>>>> 
>>>> -
>>>> Aaron Morton
>>>> Cassandra Consultant
>>>> New Zealand
>>>> 
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 17/07/2013, at 6:47 PM, Vivek Mishra  wrote:
>>>> 
>>>>> Error: Main method not found in class
>>>>> org.apache.cassandra.service.CassandraDaemon, please define the main
>>>>> method as:
>>>>>public static void main(String[] args)
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> I am getting this error. Earlier it was working fine for me, when i simply
>>>>> downloaded the tarball installation and ran cassandra server. Recently i
>>>>> did rpm package installation of Cassandra and which is working fine. But
>>>>> somehow when i try to run it via originally extracted tar package. i am
>>>>> getting:
>>>>> 
>>>>> 

Re: Main method not found in class org.apache.cassandra.service.CassandraDaemon

2013-07-17 Thread Brian O'Neill
Vivek,

The location of CassandraDaemon changed between versions.  (from
org.apache.cassandra.thrift to org.apache.cassandra.service)

It is likely that the start scripts are picking up the old version on the
classpath, which results in the main method not being found.

Do you have CASSANDRA_HOME set?  I believe the start scripts will use that.
 Perhaps you have that set and pointed to the older 1.1.X version?

-brian


On Wed, Jul 17, 2013 at 8:31 AM, Vivek Mishra  wrote:

> Finally,
> i have to delete all rpm installed files to get this working, folders are:
> /usr/share/cassandra
> /etc/alternatives/cassandra
> /usr/bin/cassandra
> /usr/bin/cassandra.in.sh
> /usr/bin/cassandra-cli
>
> Still don't understand why it's giving me such weird error:
> 
> Error: Main method not found in class
> org.apache.cassandra.service.CassandraDaemon, please define the main method
> as:
>public static void main(String[] args)
> ***
>
> This is not informative at all and does not even Help!
>
> -Vivek
>
>
> On Wed, Jul 17, 2013 at 3:49 PM, Vivek Mishra wrote:
>
>> @aaron
>> Thanks for your reply. I did have a look rpm installed files
>> 1.  /etc/alternatives/cassandra, it contains configuration files only.
>> and .sh files are installed within /usr/bin folder.
>>
>> Even if i try to run from extracted tar ball folder as
>>
>> /home/impadmin/apache-cassandra-1.2.4/bin/cassandra -f
>>
>> same error.
>>
>> /home/impadmin/apache-cassandra-1.2.4/bin/cassandra -v
>>
>> gives me 1.1.12 though it should give me 1.2.4
>>
>>
>> -Vivek
>> it gives me same error.
>>
>>
>> On Wed, Jul 17, 2013 at 3:37 PM, aaron morton wrote:
>>
>>> Something is messed up in your install.  Can you try scrubbing the
>>> install and restarting ?
>>>
>>> Cheers
>>>
>>>-
>>> Aaron Morton
>>> Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 17/07/2013, at 6:47 PM, Vivek Mishra  wrote:
>>>
>>> Error: Main method not found in class
>>> org.apache.cassandra.service.CassandraDaemon, please define the main method
>>> as:
>>>public static void main(String[] args)
>>> 
>>>
>>> Hi,
>>> I am getting this error. Earlier it was working fine for me, when i
>>> simply downloaded the tarball installation and ran cassandra server.
>>> Recently i did rpm package installation of Cassandra and which is working
>>> fine. But somehow when i try to run it via originally extracted tar
>>> package. i am getting:
>>>
>>> *
>>> xss =  -ea
>>> -javaagent:/home/impadmin/software/apache-cassandra-1.2.4//lib/jamm-0.2.5.jar
>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M
>>> -Xmn256M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>> Error: Main method not found in class
>>> org.apache.cassandra.service.CassandraDaemon, please define the main method
>>> as:
>>>public static void main(String[] args)
>>> *
>>>
>>> I tried setting CASSANDRA_HOME directory, but no luck.
>>>
>>> Error is bit confusing, Any suggestions???
>>>
>>> -Vivek
>>>
>>>
>>>
>>
>


-- 
Brian ONeill
Chief Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: "SQL" Injection C* (via CQL & Thrift)

2013-06-18 Thread Brian O'Neill

Perfect.  Thanks Sylvain.  That is exactly the input I was looking for, and
I agree completely.
(It's easy enough to protect against.)

As for the thrift side (i.e. using Hector or Astyanax), anyone have a crafty
way to inject something?

At first glance, it doesn't appear possible, but I'm not 100% confident
making that assertion.

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Sylvain Lebresne 
Reply-To:  
Date:  Tuesday, June 18, 2013 8:51 AM
To:  "user@cassandra.apache.org" 
Subject:  Re: "SQL" Injection C* (via CQL & Thrift)

If you're not careful, then "CQL injection" is possible.

Say you naively build you query with
  "UPDATE foo SET col='" + user_input + "' WHERE key = 'k'"
then if user_input is "foo' AND col2='bar", your user will have overwritten
a column it shouldn't have been able to. And something equivalent in a BATCH
statement could allow to overwrite/delete some random row in some random
table.

Now CQL being much more restricted than SQL (no subqueries, no generic
transaction, ...), the extent of what you can do with a CQL injection is way
smaller than in SQL. But you do have to be careful.

As far as the Datastax java driver is concerned, you can fairly easily
protect yourself by using either:
1) prepared statements: if the user input is a prepared variable, there is
nothing the user can do (it's "equivalent" to the thrift situation).
2) using the query builder: it will escape quotes in the strings you
provide, thus avoiding injection.

So I would say that injections are definitively possible if you concatenate
strings too naively, but I don't think preventing them is very hard.

--
Sylvain


On Tue, Jun 18, 2013 at 2:02 PM, Brian O'Neill 
wrote:
> 
> Mostly for fun, I wanted to throw this out there...
> 
> We are undergoing a security audit for our platform (C* + Elastic Search +
> Storm).  One component of that audit is susceptibility to SQL injection.  I
> was wondering if anyone has attempted to construct a SQL injection attack
> against Cassandra?  Is it even possible?
> 
> I know the code paths fairly well, but...
> Does there exist a path in the code whereby user data gets interpreted, which
> could be exploited to perform user operations?
> 
> From the Thrift side of things, I've always felt safe.  Data is opaque.
> Serializers are used to convert it to Bytes, and C* doesn't ever really do
> anything with the data.
> 
> In examining the CQL java-driver, it looks like there might be a bit more
> exposure to injection.  (or even CQL over Thrift)  I haven't dug into the code
> yet, but dependent on which flavor of the API you are using, you may be
> including user data in your statements.
> 
> Does anyone know if the CQL java-driver does anything to protect against
> injection?  Or is it possible to say that the syntax is strict enough that any
> embedded operations in data would not parse?
> 
> just some food for thought...
> I'll be digging into this over the next couple weeks.  If people are
> interested, I can throw a blog post out there with the findings.
> 
> -brian
> 
> -- 
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42
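Sylvain's warning about naive string concatenation can be sketched as a plain-Java demo; the class and method names (`InjectionDemo`, `naive`) are hypothetical, not driver code:

```java
// Hypothetical demo of the CQL-injection risk from naive string building.
public class InjectionDemo {

    // Unsafe: user input is spliced straight into the statement text.
    static String naive(String userInput) {
        return "UPDATE foo SET col='" + userInput + "' WHERE key = 'k'";
    }

    public static void main(String[] args) {
        String malicious = "foo' AND col2='bar";
        // The quote inside the input lets the attacker set a second column:
        System.out.println(naive(malicious));
        // UPDATE foo SET col='foo' AND col2='bar' WHERE key = 'k'
        // Binding the input as a prepared-statement variable (or using the
        // query builder's quote escaping) avoids this.
    }
}
```

Either of the two defenses Sylvain lists keeps the input out of the parsed statement text, which is why the injection never reaches the CQL grammar.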





"SQL" Injection C* (via CQL & Thrift)

2013-06-18 Thread Brian O'Neill
Mostly for fun, I wanted to throw this out there...

We are undergoing a security audit for our platform (C* + Elastic Search +
Storm).  One component of that audit is susceptibility to SQL injection.  I
was wondering if anyone has attempted to construct a SQL injection attack
against Cassandra?  Is it even possible?

I know the code paths fairly well, but...
Does there exist a path in the code whereby user data gets interpreted,
which could be exploited to perform user operations?

From the Thrift side of things, I've always felt safe.  Data is opaque.
Serializers are used to convert it to Bytes, and C* doesn't ever really do
anything with the data.

In examining the CQL java-driver, it looks like there might be a bit more
exposure to injection.  (or even CQL over Thrift)  I haven't dug into the
code yet, but dependent on which flavor of the API you are using, you may
be including user data in your statements.

Does anyone know if the CQL java-driver does anything to protect against
injection?  Or is it possible to say that the syntax is strict enough that
any embedded operations in data would not parse?

just some food for thought...
I'll be digging into this over the next couple weeks.  If people are
interested, I can throw a blog post out there with the findings.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


[BLOG] : Cassandra as a Deep Storage Mechanism for Druid Real-Time Analytics Engine

2013-05-17 Thread Brian O'Neill
FWIW, we were able to integrate Druid and Cassandra.

Its only in PoC right now, but it seems like a powerful combination:
http://brianoneill.blogspot.com/2013/05/cassandra-as-deep-storage-mechanism-for.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: multitenant support with key spaces

2013-05-06 Thread Brian O'Neill

You may want to look at using virtual keyspaces:
http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html

And follow these tickets:
http://wiki.apache.org/cassandra/MultiTenant

-brian


On May 6, 2013, at 2:37 AM, Darren Smythe wrote:

> How many keyspaces can you reasonably have? We have around 500 customers and 
> expect that to double end of year. We're looking into C* and wondering if it 
> makes sense for a separate KS per customer?
> 
> If we have 1000 customers, so one KS per customer is 1000 keyspaces. Is that 
> something C* can handle efficiently? Each customer has about 10 GB of data 
> (not taking replication into account).
> 
> Or is this symptomatic of a bad design?
> 
> I guess the same question applies to our notion of breaking up the column 
> families into time ranges. We're naively trying to avoid having few large 
> CFs/KSs. Is/should that be a concern?
> 
> What are the tradeoffs of a smaller number of heavyweight KS/CFs vs. manually 
> sharding the data into more granular KSs/CFs?
> 
> Thanks for any info.

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Exporting all data within a keyspace

2013-04-30 Thread Brian O'Neill

You could always do something like this as well:
http://brianoneill.blogspot.com/2012/05/dumping-data-from-cassandra-like.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Kumar Ranjan 
Reply-To:  
Date:  Tuesday, April 30, 2013 9:11 AM
To:  "user@cassandra.apache.org" 
Subject:  Re: Exporting all data within a keyspace

Try sstable2json and json2sstable. But it works on a column family, so you can
fetch all column families, iterate over the list of CFs, and use the sstable2json
tool to extract the data. Remember this will only fetch on-disk data; anything in a
memtable/cache that has yet to be flushed will be missed. So run a flush/compaction
and then run the written script.

On Tuesday, April 30, 2013, Chidambaran Subramanian  wrote:
> Is there any easy way of exporting all data for a keyspace (and conversely)
> importing it.
> 
> Regards
> Chiddu




Re: DB Change management tools for Cassandra?

2013-04-25 Thread Brian O'Neill

I haven't seen any, which has one of our developers (CC'd) looking at
extending myBatis migrations and/or Flyway with CQL to do it.

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 4/25/13 7:35 AM, "Marko Asplund"  wrote:

>hi,
>
>Do database change management tools similar to Liquibase and dbdeploy
>exist for Cassandra?
>I need to handle change management for CQL3 schema.
>
>
>thanks,
>
>marko




Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Bingo! Thanks to both of you.  (the C* community rocks)

A few hours worth of work, and I've got a working REST-based photo
repository backed by  C* using the CQL java driver. =)

rock on, thanks again,
-brian


On Thu, Apr 11, 2013 at 9:33 AM, Sylvain Lebresne wrote:

>
> I assume I'm doing something wrong in the select.  Am I incorrectly using
>> the ResultSet?
>>
>
> You're incorrectly using the returned ByteBuffer. But you should not feel
> bad, that API kinda
> sucks.
>
> The short version is that .array() returns the backing array of the
> ByteBuffer. But there is no
> guarantee that you'll have a one-to-one correspondence between the valid
> content of the
> ByteBuffer and the backing array, the backing array can be bigger in
> particular (long story short,
> this allows multiple ByteBuffer to share the same backing array, which can
> avoid doing copies).
>
> I also note that there is no guarantee that .array() will work unless
> you've called .hasArray().
>
> Anyway, what you could do is:
> ByteBuffer bb = resultSet.one().getBytes("data");
> byte[] data = new byte[bb.remaining()];
> bb.get(data);
>
> Alternatively, you can use the result of .array(), but you should only
> consider the bb.remaining()
> bytes starting at bb.arrayOffset() + bb.position() (where bb is the
> returned ByteBuffer).
>
> --
> Sylvain
>
>
>
>>
>> -brian
>>
>> On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill wrote:
>>
>>> Yep, it worked like a charm.  (PreparedStatement avoided the hex
>>> conversion)
>>>
>>> But now, I'm seeing a few extra bytes come back in the select….
>>> (I'll keep digging, but maybe you have some insight?)
>>>
>>> I see this:
>>>
>>> ERROR [2013-04-11 13:05:03,461] com.skookle.dao.RepositoryDao:
>>> repository.add() byte.length()=[259804]
>>>
>>> ERROR [2013-04-11 13:08:08,487] com.skookle.dao.RepositoryDao:
>>> repository.get() [foo.jpeg] byte.length()=[259861]
>>>
>>> (Notice the length's don't match up)
>>>
>>> Using this code:
>>>
>>> public void addContent(String key, byte[] data)
>>>
>>> throws NoHostAvailableException {
>>>
>>> LOG.error("repository.add() byte.length()=[" + data.length + "]"
>>> );
>>>
>>> String statement = "INSERT INTO " + KEYSPACE + "." + TABLE + "(key,
>>> data) VALUES (?, ?)";
>>>
>>> PreparedStatement ps = session.prepare(statement);
>>>
>>> BoundStatement bs = ps.bind(key, ByteBuffer.wrap(data));
>>>
>>> session.execute(bs);
>>>
>>> }
>>>
>>>
>>> public byte[] getContent(String key) throwsNoHostAvailableException {
>>>
>>> Query select = select("data").from(KEYSPACE, TABLE).where(eq(
>>> "key", key));
>>>
>>> ResultSet resultSet = session.execute(select);
>>>
>>> byte[] data = resultSet.one().getBytes("data").array();
>>>
>>> LOG.error("repository.get() [" + key + "] byte.length()=[" +
>>> data.length + "]");
>>>
>>> return data;
>>>
>>> }
>>>
>>> ---
>>>
>>> Brian O'Neill
>>>
>>> Lead Architect, Software Development
>>>
>>> *Health Market Science*
>>>
>>> *The Science of Better Results*
>>>
>>> 2700 Horizon Drive • King of Prussia, PA • 19406
>>>
>>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>>
>>> healthmarketscience.com
>>>
>>>
>>>
>>>
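The length mismatch in the log lines above is the `ByteBuffer` pitfall Sylvain describes: `.array()` returns the whole backing array, which can be larger than the valid content. A minimal sketch (class name `BufferDemo` is hypothetical, not driver code):

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of why bytes should be copied out with
// remaining()/get() rather than read via array().
public class BufferDemo {

    // Safe extraction: copy exactly the valid bytes, as Sylvain suggests.
    static byte[] safeBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        byte[] backing = new byte[10];
        for (int i = 0; i < backing.length; i++) backing[i] = (byte) i;

        // A slice viewing only bytes 3..6 of the backing array, much like a
        // driver handing back a view over a larger protocol frame.
        ByteBuffer bb = ByteBuffer.wrap(backing, 3, 4).slice();

        System.out.println(bb.array().length);    // 10: the whole backing array
        System.out.println(safeBytes(bb).length); // 4: only the valid content
    }
}
```

The `getContent()` code in the thread reads `resultSet.one().getBytes("data").array()`, which is the unsafe path: it picks up the extra framing bytes sharing the backing array.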

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Sylvain,

Interesting, when I look at the actual bytes returned, I see the byte array
is prefixed with the keyspace and table name.

I assume I'm doing something wrong in the select.  Am I incorrectly using
the ResultSet?

-brian

On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill wrote:

> Yep, it worked like a charm.  (PreparedStatement avoided the hex
> conversion)
>
> But now, I'm seeing a few extra bytes come back in the select….
> (I'll keep digging, but maybe you have some insight?)
>
> I see this:
>
> ERROR [2013-04-11 13:05:03,461] com.skookle.dao.RepositoryDao:
> repository.add() byte.length()=[259804]
>
> ERROR [2013-04-11 13:08:08,487] com.skookle.dao.RepositoryDao:
> repository.get() [foo.jpeg] byte.length()=[259861]
>
> (Notice the length's don't match up)
>
> Using this code:
>
> public void addContent(String key, byte[] data)
>
> throws NoHostAvailableException {
>
> LOG.error("repository.add() byte.length()=[" + data.length + "]");
>
> String statement = "INSERT INTO " + KEYSPACE + "." + TABLE + "(key,
> data) VALUES (?, ?)";
>
> PreparedStatement ps = session.prepare(statement);
>
> BoundStatement bs = ps.bind(key, ByteBuffer.wrap(data));
>
> session.execute(bs);
>
> }
>
>
> public byte[] getContent(String key) throws NoHostAvailableException {
>
> Query select = select("data").from(KEYSPACE, TABLE).where(eq("key",
> key));
>
> ResultSet resultSet = session.execute(select);
>
> byte[] data = resultSet.one().getBytes("data").array();
>
> LOG.error("repository.get() [" + key + "] byte.length()=[" + data.length + "]");
>
> return data;
>
> }
>
> ---
>
> Brian O'Neill
>
> Lead Architect, Software Development
>
> *Health Market Science*
>
> *The Science of Better Results*
>
> 2700 Horizon Drive • King of Prussia, PA • 19406
>
> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>
> healthmarketscience.com
>
>
>
>
> From: Sylvain Lebresne 
> Reply-To: 
> Date: Thursday, April 11, 2013 8:48 AM
> To: "user@cassandra.apache.org" 
> Cc: Gabriel Ciuloaica 
> Subject: Re: Blobs in CQL?
>
>
> Hopefully, the prepared statement doesn't do the conversion.
>>
>
> It does not.
>
>
>> (I'm not sure if it is a limitation of the CQL protocol itself)
>>
>> thanks again,
>> -brian
>>
>>
>>
>> ---
>> Brian O'Neill
>> Lead Architect, Software Development
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>> healthmarketscience.com
>>
>>
>>
>>
>>
>>
>>
>>
>> On 4/11/13 8:34 AM, "Gabriel Ciuloaica"  wrote:
>>
>> >I'm not using the query builder but the PreparedStatement.
>> >
>> >Here is the sample code: https://gist.github.com/devsprint/5363023
>> >
>> >Gabi
>> >On 4/11/13 3:27 PM, Brian O'Neill wrote:
>> >> Great!
>> >>
>> >> Thanks Gabrie

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Yep, it worked like a charm.  (PreparedStatement avoided the hex conversion)

But now, I'm seeing a few extra bytes come back in the select….
(I'll keep digging, but maybe you have some insight?)

I see this:
ERROR [2013-04-11 13:05:03,461] com.skookle.dao.RepositoryDao:
repository.add() byte.length()=[259804]

ERROR [2013-04-11 13:08:08,487] com.skookle.dao.RepositoryDao:
repository.get() [foo.jpeg] byte.length()=[259861]


(Notice the length's don't match up)

Using this code:
public void addContent(String key, byte[] data)

throws NoHostAvailableException {

LOG.error("repository.add() byte.length()=[" + data.length + "]");

String statement = "INSERT INTO " + KEYSPACE + "." + TABLE + "(key,
data) VALUES (?, ?)";

PreparedStatement ps = session.prepare(statement);

BoundStatement bs = ps.bind(key, ByteBuffer.wrap(data));

session.execute(bs);

}



public byte[] getContent(String key) throws NoHostAvailableException {

Query select = select("data").from(KEYSPACE, TABLE).where(eq("key",
key));

ResultSet resultSet = session.execute(select);

byte[] data = resultSet.one().getBytes("data").array();

LOG.error("repository.get() [" + key + "] byte.length()=[" +
data.length + "]");

return data;

}


---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Sylvain Lebresne 
Reply-To:  
Date:  Thursday, April 11, 2013 8:48 AM
To:  "user@cassandra.apache.org" 
Cc:  Gabriel Ciuloaica 
Subject:  Re: Blobs in CQL?


> Hopefully, the prepared statement doesn't do the conversion.

It does not.
 
> (I'm not sure if it is a limitation of the CQL protocol itself)
> 
> thanks again,
> -brian
> 
> 
> 
> ---
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024   • @boneill42
> <http://www.twitter.com/boneill42>  •
> healthmarketscience.com <http://healthmarketscience.com>
> 
> 
> 
> 
> 
> 
> 
> 
> On 4/11/13 8:34 AM, "Gabriel Ciuloaica"  wrote:
> 
>> >I'm not using the query builder but the PreparedStatement.
>> >
>> >Here is the sample code: https://gist.github.com/devsprint/5363023
>> >
>> >Gabi
>> >On 4/11/13 3:27 PM, Brian O'Neill wrote:
>>> >> Great!
>>> >>
>>> >> Thanks Gabriel.  Do you have an example? (are using QueryBuilder?)
>>> >> I couldn't find the part of  the API that allowed you to pass in the
>>> >>byte
>>> >> array.
>>> >>
>>> >> -brian
>>> >>
>>> >> ---
>>> >> Brian O'Neill
>>> >> Lead Architect, Software Development
>>> >> Health Market Science
>>> >> The Science of Better Results
>>> >> 2700 Horizon Drive • King of Prussia, PA • 19406
>>> >> M: 215.588.6024   • @boneill42
>>> <http://www.twitter.com/boneill42>  •
>>> >> healthmarketscience.com <http://healthmarketscience.com>
>>> >>
>>>

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Cool.  That might be it.  I'll take a look at PreparedStatement.

For query building, I took a look under the covers, and even when I was
passing in a ByteBuffer, it runs through the following code in the
java-driver:

Utils.java:
    if (value instanceof ByteBuffer) {
        sb.append("0x");
        sb.append(ByteBufferUtil.bytesToHex((ByteBuffer)value));
    }

Hopefully, the prepared statement doesn't do the conversion.
(I'm not sure if it is a limitation of the CQL protocol itself)

thanks again,
-brian



---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 4/11/13 8:34 AM, "Gabriel Ciuloaica"  wrote:

>I'm not using the query builder but the PreparedStatement.
>
>Here is the sample code: https://gist.github.com/devsprint/5363023
>
>Gabi
>On 4/11/13 3:27 PM, Brian O'Neill wrote:
>> Great!
>>
>> Thanks Gabriel.  Do you have an example? (are using QueryBuilder?)
>> I couldn't find the part of  the API that allowed you to pass in the
>>byte
>> array.
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Lead Architect, Software Development
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>> healthmarketscience.com
>>
>>
>>
>>
>>
>>
>>
>> On 4/11/13 8:25 AM, "Gabriel Ciuloaica"  wrote:
>>
>>> Hi Brian,
>>>
>>> I'm using the blobs to store images in cassandra(1.2.3) using the
>>> java-driver version 1.0.0-beta1.
>>> There is no need to convert a byte array into hex.
>>>
>>> Br,
>>> Gabi
>>>
>>> On 4/11/13 3:21 PM, Brian O'Neill wrote:
>>>> I started playing around with the CQL driver.
>>>> Has anyone used blobs with it yet?
>>>>
>>>> Are you forced to convert a byte[] to hex?
>>>> (e.g. I have a photo that I want to store in C* using the java-driver
>>>> API)
>>>>
>>>> -brian
>>>>
>>>> -- 
>>>> Brian ONeill
>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>> mobile:215.588.6024
>>>> blog: http://brianoneill.blogspot.com/
>>>> twitter: @boneill42
>>
>
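The Utils.java snippet quoted above amounts to building a CQL blob literal out of the bytes. A rough sketch of that conversion (class `HexDemo` is hypothetical, not driver code):

```java
// Hypothetical sketch of turning a byte[] into a CQL blob literal ("0x..."),
// the conversion the query builder performs for ByteBuffer values.
public class HexDemo {

    static String toHexLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : data) {
            // Mask to treat the byte as unsigned before formatting.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexLiteral(new byte[]{(byte) 0xCA, (byte) 0xFE}));
        // 0xcafe
    }
}
```

For large blobs this doubles the statement size, which is one reason binding the bytes through a prepared statement (as the thread concludes) is preferable to inlining them as a literal.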




Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Great!

Thanks Gabriel.  Do you have an example? (are using QueryBuilder?)
I couldn't find the part of  the API that allowed you to pass in the byte
array.

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 4/11/13 8:25 AM, "Gabriel Ciuloaica"  wrote:

>Hi Brian,
>
>I'm using the blobs to store images in cassandra(1.2.3) using the
>java-driver version 1.0.0-beta1.
>There is no need to convert a byte array into hex.
>
>Br,
>Gabi
>
>On 4/11/13 3:21 PM, Brian O'Neill wrote:
>>
>> I started playing around with the CQL driver.
>> Has anyone used blobs with it yet?
>>
>> Are you forced to convert a byte[] to hex?
>> (e.g. I have a photo that I want to store in C* using the java-driver
>>API)
>>
>> -brian
>>
>> -- 
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>




Blobs in CQL?

2013-04-11 Thread Brian O'Neill
I started playing around with the CQL driver.
Has anyone used blobs with it yet?

Are you forced to convert a byte[] to hex?
(e.g. I have a photo that I want to store in C* using the java-driver API)

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
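To make the answers in this thread concrete: with CQL executed as plain text (the Thrift-era path), a byte[] has to be spliced into the statement as a hex literal, while a binary-protocol client such as the java-driver lets you bind a ByteBuffer directly. A minimal sketch; the `photos` table and the commented-out `session`/statement names are hypothetical, and the driver call is left as a comment because it needs a live cluster:

```java
import java.nio.ByteBuffer;

public class BlobExample {

    // Thrift-era CQL: the statement is a plain string, so raw bytes
    // must be encoded as a 0x... hex literal inside the query text.
    static String toHexLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] photo = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

        // Old style: hex literal embedded in the statement text.
        String cql = "INSERT INTO photos (id, image) VALUES ('p1', "
                + toHexLiteral(photo) + ");";
        System.out.println(cql);

        // New style: bind a ByteBuffer directly, no hex round-trip.
        // (Hypothetical names; needs a live cluster, so left as a comment.)
        ByteBuffer blob = ByteBuffer.wrap(photo);
        // session.execute(insertStatement.bind("p1", blob));
        System.out.println(blob.remaining() + " bytes wrapped");
    }
}
```

The point Gabriel makes holds either way: with the binary protocol there is no need for the hex conversion at all.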


Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Brian O'Neill


(at least until we can determine if this can/should be proposed under 1472)

For those interested in analytics and set-based queries, see below...

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com







On 4/10/13 10:43 PM, "Matt Stump"  wrote:

>Druid was our inspiration to layer bitmap indexes on top of Cassandra.
>Druid doesn't work for us because our data set is too large. We would need
>many hundreds of nodes just for the pre-processed data. What I envisioned
>was the ability to perform druid style queries (no aggregation) without
>the
>limitations imposed by having the entire dataset in memory. I primarily
>need to query whether a user performed some event, but I also intend to
>add
>trigram indexes for LIKE, ILIKE or possibly regex style matching.
>
>I wasn't aware of CONCISE, thanks for the pointer. We are currently
>evaluating fastbit, which is a very similar project:
>https://sdm.lbl.gov/fastbit/
>
>
>On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill
>wrote:
>
>>
>> How does this compare with Druid?
>> https://github.com/metamx/druid
>>
>> We're currently evaluating Acunu, Vertica and Druid...
>>
>> 
>>http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html
>>
>> With its bitmapped indexes, Druid appears to have the most potential.
>> They boast some pretty impressive stats, especially WRT handling
>> "real-time" updates and adding new dimensions.
>>
>> They also use a compression algorithm, CONCISE, to cut down on the space
>> requirements.
>> http://ricerca.mat.uniroma3.it/users/colanton/concise.html
>>
>> I haven't looked too deep into the Druid code, but I've been meaning to
>> see if it could be backed by C*.
>>
>> We'd be game to join the hunt if you pursue such a beast. (with your
>>code,
>> or with portions of Druid)
>>
>> -brian
>>
>>
>> On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:
>>
>> > What do you think about set manipulation via indexes in Cassandra? I'm
>> > interested in answering queries such as give me all users that
>>performed
>> > event 1, 2, and 3, but not 4. If the answer is yes than I can make a
>>case
>> > for spending my time on C*. The only downside for us would be our
>>current
>> > prototype is in C++ so we would lose some performance and the
>>ability to
>> > dedicate an entire machine to caching/performing queries.
>> >
>> >
>> > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis 
>> wrote:
>> >
>> >> If you mean, "Can someone help me figure out how to get started
>>updating
>> >> these old patches to trunk and cleaning out the Avro?" then yes, I've
>> been
>> >> knee-deep in indexing code recently.
>> >>
>> >>
>> >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome 
>> >> wrote:
>> >>
>> >>> I'm currently building a distributed cluster on top of cassandra to
>> >> perform
>> >>> fast set manipulation via bitmap indexes. This gives me the ability
>>to
>> >>> perform unions, intersections, and set subtraction across
>>sub-queries.
>> >>> Currently I'm storing index information for thousands of dimensions
>>as
>> >>> cassandra rows, and my cluster keeps this information cached,
>> distributed
>> >>> and replicated in order to answer queries.
>> >>>
>> >>> Every couple of days I think to myself this should really exist in
>>C*.
>> >>> Given all the benifits would there be any interest in
>> >>> reviving CASSANDRA-1472?
>> >>>
>> >>> Some downsides are that this is very memory intensive, even for
>>sparse
>> >>> bitmaps.
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder, http://www.datastax.com
>> >> @spyced
>> >>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://weblogs.java.net/blog/boneill42/
>> blog: http://brianoneill.blogspot.com/
>>
>>
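The set queries described in this thread ("all users that performed event 1, 2, and 3, but not 4") reduce to bitwise AND / AND-NOT over per-event bitmaps. A toy sketch using java.util.BitSet as a stand-in for the compressed bitmap structures (CONCISE, FastBit) mentioned above, with user ids as bit positions:

```java
import java.util.BitSet;

public class BitmapQuery {

    // One bitmap per event: bit i is set if user i performed the event.
    static BitSet usersFor(int... userIds) {
        BitSet bits = new BitSet();
        for (int id : userIds) {
            bits.set(id);
        }
        return bits;
    }

    // Users who performed e1 AND e2 AND e3 but NOT e4:
    // intersect the first three bitmaps, then subtract the fourth.
    static BitSet query(BitSet e1, BitSet e2, BitSet e3, BitSet e4) {
        BitSet result = (BitSet) e1.clone();
        result.and(e2);
        result.and(e3);
        result.andNot(e4);
        return result;
    }

    public static void main(String[] args) {
        BitSet e1 = usersFor(1, 2, 3, 5);
        BitSet e2 = usersFor(2, 3, 5, 8);
        BitSet e3 = usersFor(2, 3, 5);
        BitSet e4 = usersFor(3);
        // Users 2 and 5 are in e1..e3 and not in e4.
        System.out.println(query(e1, e2, e3, e4));
    }
}
```

The memory concern raised in the thread is exactly why the production systems use compressed bitmaps rather than plain BitSets.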




BI/Analytics/Warehousing for data in C*

2013-04-01 Thread Brian O'Neill
We are trudging through an options analysis for BI/DW solutions for data
stored in C*.

I'd love to hear people's experiences.  Here is what we've found so far:
http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html

Maybe we just use Intravert with a custom handler to handle the dimensional
cubes?
https://github.com/zznate/intravert-ug

Then, we could slap a javascript charting framework on it and call it
cubert. =)
http://www.classicgamesarcade.com/game/21652/q*bert.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: any other NYC* attendees find your usb stick of the proceedings empty?

2013-03-25 Thread Brian O'Neill
I think the recorded sessions will be posted to the PlanetCassandra Youtube
channel:
http://www.planetcassandra.org/blog/post/nyc-big-data-tech-day-update

Some of the slides have been posted up to slideshare:
http://www.slideshare.net/boneill42/hms-nyc-talk
http://www.slideshare.net/edwardcapriolo/intravert

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Brian Tarbox 
Reply-To:  
Date:  Monday, March 25, 2013 11:43 AM
To:  
Subject:  any other NYC* attendees find your usb stick of the proceedings
empty?

Last week I attended DataStax's NYC* conference and one of the give-aways
was a wooden USB stick.  Finally getting around to loading it I find it
empty.

Anyone else have this problem?  Are the conference presentations available
somewhere else?

Brian Tarbox




Re: Netflix/Astynax Client for Cassandra

2013-02-07 Thread Brian O'Neill

Incidentally, we run Astyanax against 1.2.1. We haven't had any issues.

When running against 1.2.0, we ran into this:
https://github.com/Netflix/astyanax/issues/191


-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com







On 2/7/13 6:58 AM, "Peter Lin"  wrote:

>If I'm not mistaken, isn't this due to limitations of Thrift versus the
>binary protocol? That's my understanding from the DataStax blogs.
>
>Unless someone really needs all the features of 1.2, like asynchronous
>queries, Astyanax and Hector should work fine.
>
>On Thu, Feb 7, 2013 at 1:20 AM, Gabriel Ciuloaica 
>wrote:
>> Astyanax is not working with Cassandra 1.2.1. Only java-driver is
>>working
>> very well with both Cassandra 1.2 and 1.2.1.
>>
>> Cheers,
>> Gabi
>>
>> On 2/7/13 8:16 AM, Michael Kjellman wrote:
>>
>> It's a really great library and definitely recommended by me and many
>>who
>> are reading this.
>>
>> And if you are just starting out on 1.2.1 with C* you might also want to
>> evaluate https://github.com/datastax/java-driver and the new binary
>> protocol.
>>
>> Best,
>> michael
>>
>> From: Cassa L 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Wednesday, February 6, 2013 10:13 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Netflix/Astynax Client for Cassandra
>>
>> Hi,
>>  Has anyone used Netflix/astynax java client library for Cassandra? I
>>have
>> used Hector before and would like to evaluate astynax. Not sure, how it
>>is
>> accepted in Cassandra community. Any issues with it or advantagest? API
>> looks very clean and simple compare to Hector. Has anyone used it in
>> production except Netflix themselves?
>>
>> Thanks
>> LCassa
>>
>>




Re: cql: show tables in a keyspace

2013-01-28 Thread Brian O'Neill

cqlsh> use cirrus;
cqlsh:cirrus> describe tables;

For more info:
cqlsh> help describe

-brian


---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com







On 1/28/13 2:27 PM, "Paul van Hoven"  wrote:

>Is there some way in CQL to get a list of all tables or column
>families that belong to a keyspace, like "show tables" in SQL?




Re: Accessing Metadata of Column Families

2013-01-28 Thread Brian O'Neill
Through CQL, you see the logical schema.
Through CLI, you see the physical schema.

This may help:
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

-brian
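
As a hypothetical illustration of the two views (table and output sketched from memory; the exact cassandra-cli rendering varies by version), consider a CQL3 table `t1 (col1 text PRIMARY KEY, col2 text)` holding one row:

```
cqlsh> SELECT * FROM t1;          -- logical view: named columns
 col1 | col2
------+------
   k1 |   v1

[default@ks] get t1['k1'];        -- physical view: col1 became the row key
=> (column=col2, value=v1, timestamp=...)
```

Columns inserted directly at the physical level (as in Case 1 below) may not map back onto the declared CQL3 schema, which is why col3 is invisible from CQL.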

On Mon, Jan 28, 2013 at 7:26 AM, Rishabh Agrawal
 wrote:
> I found following issues while working on Cassandra version 1.2, CQL 3 and
> Thrift protocol 19.35.0.
>
>
>
> Case 1:
>
> Using CQL I created a table t1 with columns col1 and col2 with col1 being my
> primary key.
>
>
>
> When I access same data using CLI, I see col1 gets adopted as rowkey and
> col2 being another column. Now I have inserted value in another column
> (col3) in same row using CLI.  Now when I query same table again from CQL I
> am unable to find col3.
>
>
>
> Case 2:
>
>
>
> Using CLI, I have created table t2. Now I added a row key  row1 and two
> columns (keys)  col1 and col2 with some values in each. When I access t2
> from CQL I find following resultset with three columns:
>
>
>
>   key | column1 | value
>
> row1| col1  | val1
>
> row1| col2  | val2
>
>
>
>
>
> This behavior raises certain questions:
>
>
>
> · What is the reason for such a schema anomaly, or is this a problem?
>
> · Which schema should be deemed as correct or consistent?
>
> · How to access meta data on the same?
>
>
>
>
>
> Thanks and Regards
>
> Rishabh Agrawal
>
>
>
>
>
> From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
> Sent: Monday, January 28, 2013 12:57 PM
>
>
> To: user@cassandra.apache.org
> Subject: RE: Accessing Metadata of Column Familes
>
>
>
> You can get storage attributes from /data/system/ keyspace.
>
>
>
> From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
> Sent: Monday, January 28, 2013 12:42 PM
> To: user@cassandra.apache.org
> Subject: RE: Accessing Metadata of Column Familes
>
>
>
> Thank for the reply.
>
>
>
> I do not want to go by API route. I wish to access files and column families
> which store meta data information
>
>
>
> From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
> Sent: Monday, January 28, 2013 12:25 PM
> To: user@cassandra.apache.org
> Subject: RE: Accessing Metadata of Column Familes
>
>
>
> Which API are you using?
>
> If you are using Hector use ColumnFamilyDefinition.
>
>
>
> Regards
>
> Harshvardhan OJha
>
>
>
> From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
> Sent: Monday, January 28, 2013 12:16 PM
> To: user@cassandra.apache.org
> Subject: Accessing Metadata of Column Familes
>
>
>
> Hello,
>
>
>
> I wish to access metadata information on column families. How can I do it?
> Any ideas?
>
>
>
> Thanks and Regards
>
> Rishabh Agrawal
>
>
>
>
>
> 
>
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>
> The contents of this email, including the attachments, are PRIVILEGED AND
> CONFIDENTIAL to the intended recipient at the email address to which it has
> been addressed. If you receive it in error, please notify the sender
> immediately by return email and then permanently delete it from your system.
> The unauthorized use, distribution, copying or alteration of this email,
> including the attachments, is strictly forbidden. Please note that neither
> MakeMyTrip nor the sender accepts any responsibility for viruses and it is
> your responsibility to scan the email and attachments (if any). No contracts
> may be concluded on behalf of MakeMyTrip by means of email communications.
>
>
>

Webinar: Using Storm for Distributed Processing on Cassandra

2013-01-16 Thread Brian O'Neill
Just an FYI --

We will be hosting a webinar tomorrow demonstrating the use of Storm
as a distributed processing layer on top of Cassandra.

I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra.
http://www.datastax.com/resources/webinars/collegecredit

It is part of the C*ollege Credit Webinar Series from Datastax.

All are welcome.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Cassandra 1.2 Thrift and CQL 3 issue

2013-01-12 Thread Brian O'Neill

I reported the issue here.  You may be missing a component in your column name.

https://issues.apache.org/jira/browse/CASSANDRA-5138

-brian


On Jan 12, 2013, at 12:48 PM, Shahryar Sedghi wrote:

> Hi
> 
> I am trying to test my application that runs with JDBC, CQL 3 with Cassandra 
> 1.2. After getting many weird errors and downgrading from JDBC to Thrift, I 
> realized that Thrift on Cassandra 1.2 has issues with wide rows. If I define 
> the table as:
> 
> CREATE TABLE  test(interval int,id text, body text, primary key (interval, 
> id));
> 
> select interval, id, body from test;
> 
>  fails with:
> 
> ERROR [Thrift:16] 2013-01-11 18:23:35,997 CustomTThreadPoolServer.java (line 
> 217) Error occurred during processing of message.
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1
> at 
> org.apache.cassandra.config.CFMetaData.getColumnDefinitionFromColumnName(CFMetaData.java:923)
> at 
> org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:502)
> at 
> org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:789)
> at 
> org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1652)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:4048)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:4036)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
> at java.lang.Thread.run(Thread.java:780)
> 
> Same code works well with Cassandra 1.1. 
> 
> At the same time, if I define the table as:
> CREATE TABLE  test1(interval int,id text, body text, primary key (interval));
> 
> everything works fine. I am using 
> 
> DataStax Community 1.2
> 
> apache-cassandra-clientutil-1.2.0.jar
> apache-cassandra-thrift-1.2.0.jar
> libthrift-0.7.0.jar
> 
> Apparently client.set_cql_version("3.0.0"); has no effect either. Is there a 
> setting that I'm missing on the client side to dictate CQL 3, or is it a bug?
> 
> Thanks in advance
> 
> Shahryar
> 
> -- 
> "Life is what happens while you are making other plans." ~ John Lennon

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Astyanax

2013-01-08 Thread Brian O'Neill
Not sure where you are on the learning curve, but I've put a couple "getting
started" projects out on github:
https://github.com/boneill42/astyanax-quickstart

And the latest from the webinar is here:
https://github.com/boneill42/naughty-or-nice
http://brianoneill.blogspot.com/2013/01/creating-your-frist-java-application-w.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Radek Gruchalski 
Reply-To:  
Date:  Tuesday, January 8, 2013 10:17 AM
To:  "user@cassandra.apache.org" 
Cc:  "user@cassandra.apache.org" 
Subject:  Re: Astyanax

Hi,

We are using Astyanax, and we found that the GitHub wiki together with
Stack Overflow is the most comprehensive set of documentation.

Do you have any specific questions?

Kind regards,
Radek Gruchalski

On 8 Jan 2013, at 15:46, Everton Lima  wrote:

> I was studying that, but I would like to know if anyone knows other sources.
> 
> 2013/1/8 Markus Klems 
>> The wiki? https://github.com/Netflix/astyanax/wiki
>> 
>> 
>> On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima  wrote:
>>> Hi,
>>> Does anyone have, or could anyone recommend, a good tutorial or book for learning Astyanax?
>>> 
>>> Thanks
>>> 
>>> -- 
>>> Everton Lima Aleixo
>>> Master's student in Computer Science at UFG
>>> Programmer at LUPA
>>> 
>> 
> 
> 
> 
> -- 
> Everton Lima Aleixo
> Bachelor in Computer Science from UFG
> Master's student in Computer Science at UFG
> Programmer at LUPA
> 




Re: CQL3 Compound Primary Keys - Do I have the right idea?

2012-12-22 Thread Brian O'Neill

Agreed.  I actually flip between cli and cqlsh these days. 

cqlsh shows the logical view.
cli shows the physical view.

This is useful, especially when developing using a thrift-based client.
Here are the slides and video if you want to have a look.

-brian



On Dec 22, 2012, at 3:36 AM, Wz1975 wrote:

> You still add to one storage row. Each column name is the remaining 
> (clustering) part of the composite key plus the name of each column that is 
> not in the composite key, repeated per column. I found it much clearer to 
> look at the data through cassandra-cli, which shows you how the data is stored. 
> 
> 
> Thanks.
> -Wei
> 
> Sent from my Samsung smartphone on AT&T 
> 
> 
>  Original message 
> Subject: CQL3 Compound Primary Keys - Do I have the right idea? 
> From: Adam Venturella  
> To: user@cassandra.apache.org 
> CC: 
> 
> 
> Trying to better grasp compound primary keys and what they are conceptually 
> doing under the hood. When you create a table with a compound primary key in 
> cql3 (http://www.datastax.com/dev/blog/schema-in-cassandra-1-1) the first 
> part of the key is the partition key. I get that and the subsequent parts 
> help with the row name as I understand it.
> 
> So when you add a new row to that columnfamily/table, you are still adding a 
> row. In other words, the RandomPartitioner places it somewhere in the cluster 
> as a row on its own as opposed to just adding a new column to an existing 
> row, which would live on the same node as the row
> 
> The effect of the compound key means that those rows are effectively treated 
> as if they were columns of the same storage row, making it a wide row.
> 
> Is that the right idea or do I have the row / rp thing wrong?
> 


Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
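The mapping discussed in this thread can be sketched as a toy model (an illustration of the idea, not Cassandra's actual serialization format): for PRIMARY KEY (interval, id), the partition key `interval` selects the storage row, and each non-key column is stored under a composite name built from the clustering key plus the column name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CompoundKeyToy {

    // Storage row key (partition key) -> sorted composite columns.
    static Map<Integer, TreeMap<String, String>> rows = new LinkedHashMap<>();

    // Models: INSERT INTO test (interval, id, body) VALUES (?, ?, ?)
    // with PRIMARY KEY (interval, id).
    static void insert(int interval, String id, String body) {
        rows.computeIfAbsent(interval, k -> new TreeMap<>())
            .put(id + ":body", body); // composite name: clustering key + column
    }

    public static void main(String[] args) {
        insert(42, "a", "first");
        insert(42, "b", "second"); // same partition -> same wide storage row
        insert(7,  "a", "other");  // new partition -> new storage row
        System.out.println(rows.size());  // 2 storage rows, 3 CQL rows
        System.out.println(rows.get(42)); // {a:body=first, b:body=second}
    }
}
```

This is why two CQL rows sharing a partition key land on the same node: they are physically columns of one wide row, placed by the RandomPartitioner according to the partition key alone.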



Re: Best Java Driver for Cassandra?

2012-12-13 Thread Brian O'Neill

Well, we'll talk a bit about this in my webinar later today…
http://brianoneill.blogspot.com/2012/12/presenting-for-datastax-college-credit.html

I put together a quick decision matrix for all of the options based on
production-readiness, potential and momentum.  I think the slides will be
made available afterwards.

I also have a laundry list here: (written before I knew about Firebrand)
http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com







On 12/13/12 9:03 AM, "stephen.m.thomp...@wellsfargo.com"
 wrote:

>There seem to be a number of good options listed ... FireBrand and Hector
>seem to have the most attractive sites, but that doesn't necessarily mean
>anything.  :)  Can anybody make a case for one of the drivers over
>another, especially in terms of which ones seem to be most used in major
>implementations?
>
>Thanks
>Steve




Datastax C*ollege Credit Webinar Series : Create your first Java App w/ Cassandra

2012-12-12 Thread Brian O'Neill
FWIW --
I'm presenting tomorrow for the Datastax C*ollege Credit Webinar Series:
http://brianoneill.blogspot.com/2012/12/presenting-for-datastax-college-credit.html

I hope to make CQL part of the presentation and show how it integrates
with the Java APIs.
If you are interested, drop in.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill

Hector does, but the newer clients/drivers no longer use Thrift.  (Thrift is
the legacy protocol)

If you are still in early stages and you know you want your primary
interface to be CQL, you may want to look at the java driver that Datastax
just released.
  http://github.com/datastax/java-driver

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Timmy Turner 
Reply-To:  
Date:  Monday, November 19, 2012 3:37 PM
To:  
Subject:  Re: Datatype Conversion in CQL-Client?

Do these other clients use the Thrift API internally?


2012/11/19 John Sanda 
> You might want to take a look at org.apache.cassandra.transport.SimpleClient and
> org.apache.cassandra.transport.messages.ResultMessage.
> 
> 
> On Mon, Nov 19, 2012 at 9:48 AM, Timmy Turner  wrote:
>> What I meant was the method that the Cassandra-jars give you when you include
>> them in your project:
>> 
>>   TTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
>>   TProtocol proto = new TBinaryProtocol(tr);
>>   Cassandra.Client client = new Cassandra.Client(proto);
>>   tr.open();
>>   client.execute_cql_query(ByteBuffer.wrap(cql.getBytes()),
>> Compression.NONE);
>> 
>> 
>> 
>> 2012/11/19 Brian O'Neill 
>>> I don't think Michael and/or Jonathan have published the CQL java driver
>>> yet.  (CCing them)
>>> 
>>> Hopefully they'll find a public home for it soon, I hope to include it in
>>> the Webinar in December.
>>> (http://www.datastax.com/resources/webinars/collegecredit)
>>> 
>>> -brian
>>> 
>>> ---
>>> Brian O'Neill
>>> Lead Architect, Software Development
>>> Health Market Science
>>> The Science of Better Results
>>> 2700 Horizon Drive • King of Prussia, PA • 19406
>>> M: 215.588.6024 • @boneill42
>>> <http://www.twitter.com/boneill42> •
>>> healthmarketscience.com
>>> 
>>> 
>>> 
>>> 
>>> From:  Tommi Laukkanen 
>>> Reply-To:  
>>> Date:  Monday, November 19, 2012 2:36 AM
>>> 
>>> To:  
>>> Subject:  Re: Datatype Conversion in CQL-Client?
>>> 
>>> I think Timmy might be referring to the upcoming native CQL Java driver that
>>> might be coming with 1.2 - It was mentioned here:
>>> http://www.datastax.com/wp-content/uploads/2012/08/7_Datastax_Upcoming_Changes_in_Drivers.pdf
>>> 
>>> I would also be interested on testing that but I can't find it from
>>> repositories. Any hints?
>>> 
>>> Regards,
>>> Tommi L.
>>> 
>>> From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
>>>> Sent: 18. marraskuuta 2012 17:47
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Datatype Conversion in CQL-Client?
>>>> Importance: Low
>>>>  
>>>> 
>>>>  
>>>> If you are talking about the CQL-client that comes with Cassandra (cqlsh),
>>>> it is actually written in Python:
>>>> 
>>>> https://github.com/apache/cassandra/blob/trunk/bin/cqlsh
>>>

Re: Datastax Java Driver

2012-11-19 Thread Brian O'Neill
Woohoo!

Thanks for making this available.

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Sylvain Lebresne 
Reply-To:  
Date:  Monday, November 19, 2012 1:50 PM
To:  "user@cassandra.apache.org" 
Subject:  Datastax Java Driver

Everyone,

We've just open-sourced a new Java driver we have been working on here at
DataStax. This driver is CQL3 only and is built to use the new binary
protocol
that will be introduced with Cassandra 1.2. It will thus only work with
Cassandra 1.2 onwards. Currently, it means that testing it requires
1.2.0-beta2. This is also alpha software at this point. You are welcome to
try
and play with it and we would very much welcome feedback, but be sure that
break, it will. The driver is accessible at:
  http://github.com/datastax/java-driver

Today we're open-sourcing the core part of this driver. This main goal of
this
core module is to handle connections to the Cassandra cluster with all the
features that one would expect. The currently supported features are:
  - Asynchronous: the driver uses the new CQL binary protocol asynchronous
capabilities.
  - Nodes discovery.
  - Configurable load balancing/routing.
  - Transparent fail-over.
  - C* tracing handling.
  - Convenient schema access.
  - Configurable retry policy.

This core module provides a simple low-level API (that works directly with
query strings). We plan to release a higher-level, thin object mapping API
based on top of this core shortly.

Please refer to the project README for more information.

--
The DataStax Team




Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill

Gotcha Timmy.  That is the Thrift API.  You are operating at a pretty
low-level.   I'm not sure that is considered the "official" CQL client.
IMHO, you might be better off moving up a level.  I'd probably either wait
for the official CQL Java Driver, or access CQL via a higher-level client
like Hector.

If you stick with Thrift, I think you can access the Schema metadata:
https://github.com/apache/cassandra/blob/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlMetadata.java
(Those are the generated classes for the Thrift interface)

But I'm not sure where the code is to apply that metadata to the result set
in:
https://github.com/apache/cassandra/blob/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CqlResult.java

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Timmy Turner 
Reply-To:  
Date:  Monday, November 19, 2012 9:48 AM
To:  
Subject:  Re: Datatype Conversion in CQL-Client?

What I meant was the method that the Cassandra-jars give you when you
include them in your project:

  TTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
  TProtocol proto = new TBinaryProtocol(tr);
  Cassandra.Client client = new Cassandra.Client(proto);
  tr.open();
  client.execute_cql_query(ByteBuffer.wrap(cql.getBytes()),
      Compression.NONE);



2012/11/19 Brian O'Neill 
> I don't think Michael and/or Jonathan have published the CQL java driver yet.
> (CCing them)
> 
> Hopefully they'll find a public home for it soon, I hope to include it in the
> Webinar in December.
> (http://www.datastax.com/resources/webinars/collegecredit)
> 
> -brian
> 
> ---
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024   • @boneill42
> <http://www.twitter.com/boneill42>   •
> healthmarketscience.com
> 
> 
> 
> 
> From:  Tommi Laukkanen 
> Reply-To:  
> Date:  Monday, November 19, 2012 2:36 AM
> 
> To:  
> Subject:  Re: Datatype Conversion in CQL-Client?
> 
> I think Timmy might be referring to the upcoming native CQL Java driver that
> might be coming with 1.2 - It was mentioned here:
> http://www.datastax.com/wp-content/uploads/2012/08/7_Datastax_Upcoming_Changes_in_Drivers.pdf
> 
> I would also be interested on testing that but I can't find it from
> repositories. Any hints?
> 
> Regards,
> Tommi L.
> 
> From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
>> Sent: 18. marraskuuta 2012 17:47
>> To: user@cassandra.apache.org
>> Subject: Re: Datatype Conversion in CQL-Client?
>> Importance: Low
>>  
>> 
>>  
>> If you are talking about the CQL-client that comes with Cassandra (cqlsh), it
>> is actually written in Python:
>> 
>> https://github.com/apache/cassandra/blob/trunk/bin/cqlsh
>> 
>>  
>> 
>> For information on datatypes (and conversion) take a look at the CQL
>> definition:
>> 
>> http://www.datastax.com/docs/1.0/references/cql/index
>> 
>> (Look at the CQL Data Types section)
>> 
>>  
>> 
>> If that's not the client you are referencing, let us know which one you mean:
>> 
>> http://brianoneill.blogspot.com/2012/08/cassandr

Re: Datatype Conversion in CQL-Client?

2012-11-19 Thread Brian O'Neill
I don't think Michael and/or Jonathan have published the CQL java driver
yet.  (CCing them)

Hopefully they'll find a public home for it soon, I hope to include it in
the Webinar in December.
(http://www.datastax.com/resources/webinars/collegecredit)

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Tommi Laukkanen 
Reply-To:  
Date:  Monday, November 19, 2012 2:36 AM
To:  
Subject:  Re: Datatype Conversion in CQL-Client?

I think Timmy might be referring to the upcoming native CQL Java driver that
might be coming with 1.2 - It was mentioned here:
http://www.datastax.com/wp-content/uploads/2012/08/7_Datastax_Upcoming_Changes_in_Drivers.pdf

I would also be interested on testing that but I can't find it from
repositories. Any hints?

Regards,
Tommi L.

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
> Sent: 18. marraskuuta 2012 17:47
> To: user@cassandra.apache.org
> Subject: Re: Datatype Conversion in CQL-Client?
> Importance: Low
>  
> 
>  
> If you are talking about the CQL-client that comes with Cassandra (cqlsh), it
> is actually written in Python:
> 
> https://github.com/apache/cassandra/blob/trunk/bin/cqlsh
> 
>  
> 
> For information on datatypes (and conversion) take a look at the CQL
> definition:
> 
> http://www.datastax.com/docs/1.0/references/cql/index
> 
> (Look at the CQL Data Types section)
> 
>  
> 
> If that's not the client you are referencing, let us know which one you mean:
> 
> http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html
> 
>  
> 
> -brian
> 
>  
> 
> On Nov 17, 2012, at 9:54 PM, Timmy Turner wrote:
> 
> 
>> 
>> Thanks for the links, however I'm interested in the functionality that the
>> official Cassandra client/API (which is in Java) offers.
>> 
>>  
>> 
>> 2012/11/17 aaron morton 
>>> 
>>>> Does the official/built-in Cassandra CQL client (in 1.2)
>>> What language ?
>>> 
>>>  
>>> 
>>> Check the Java http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
>>> and python http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/
>>> drivers.
>>> 
>>>  
>>> 
>>> Cheers
>>> 
>>>  
>>> 
>>>  
>>> 
>>> -
>>> 
>>> Aaron Morton
>>> 
>>> Freelance Cassandra Developer
>>> 
>>> New Zealand
>>> 
>>>  
>>> 
>>> @aaronmorton
>>> 
>>> http://www.thelastpickle.com <http://www.thelastpickle.com/>
>>> 
>>>  
>>> 
>>> On 16/11/2012, at 11:21 AM, Timmy Turner  wrote:
>>> 
>>> 
>>>> 
>>>> Does the official/built-in Cassandra CQL client (in 1.2) offer any built-in
>>>> option to get direct values/objects when reading a field, instead of just a
>>>> byte array? 
>>>  
>>  
>  
> 
> -- 
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com
> <http://healthmarketscience.com/> )
> mobile:215.588.6024 
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>  





Re: Datatype Conversion in CQL-Client?

2012-11-18 Thread Brian O'Neill

If you are talking about the CQL-client that comes with Cassandra (cqlsh), it 
is actually written in Python:
https://github.com/apache/cassandra/blob/trunk/bin/cqlsh

For information on datatypes (and conversion) take a look at the CQL definition:
http://www.datastax.com/docs/1.0/references/cql/index
(Look at the CQL Data Types section)

If that's not the client you are referencing, let us know which one you mean:
http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

-brian

On Nov 17, 2012, at 9:54 PM, Timmy Turner wrote:

> Thanks for the links, however I'm interested in the functionality that the 
> official Cassandra client/API (which is in Java) offers.
> 
> 
> 2012/11/17 aaron morton 
>> Does the official/built-in Cassandra CQL client (in 1.2) 
> What language ? 
> 
> Check the Java http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ 
> and python http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ 
> drivers.
> 
> Cheers
> 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/11/2012, at 11:21 AM, Timmy Turner  wrote:
> 
>> Does the official/built-in Cassandra CQL client (in 1.2) offer any built-in 
>> option to get direct values/objects when reading a field, instead of just a 
>> byte array?
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta2 released

2012-11-10 Thread Brian O'Neill

Wow...good catch.

We had puppet scripts which automatically assigned the proper tokens given the 
cluster size.
What is the range now?  Got a link?

-brian

On Nov 10, 2012, at 9:27 PM, Edward Capriolo wrote:

> Just a note for all: the default partitioner is no longer RandomPartitioner. 
> It is now Murmur3, and the token range starts in negative numbers. So you 
> don't choose tokens like your father taught you anymore.
> 
> On Friday, November 9, 2012, Sylvain Lebresne  wrote:
> > The Cassandra team is pleased to announce the release of the second beta for
> > the future Apache Cassandra 1.2.0.
> > Let me first stress that this is beta software and as such is *not* ready 
> > for
> > production use.
> > This release is still beta so is likely not bug free. However, lots have 
> > been
> > fixed since beta1 and if everything goes right, we are hopeful that a first
> > release candidate may follow shortly. Please do help testing this beta to 
> > help
> > make that happen. If you encounter any problem during your testing, please
> > report[3,4] them. And be sure to a look at the change log[1] and the release
> > notes[2] to see where Cassandra 1.2 differs from the previous series.
> > Apache Cassandra 1.2.0-beta2[5] is available as usual from the cassandra
> > website (http://cassandra.apache.org/download/) and a debian package is
> > available using the 12x branch (see 
> > http://wiki.apache.org/cassandra/DebianPackaging).
> > Thank you for your help in testing and have fun with it.
> > [1]: http://goo.gl/wnDAV (CHANGES.txt)
> > [2]: http://goo.gl/CBsqs (NEWS.txt)
> > [3]: https://issues.apache.org/jira/browse/CASSANDRA
> > [4]: user@cassandra.apache.org
> > [5]: 
> > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta2
> >

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
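
To answer the range question above: the Murmur3Partitioner that becomes the default in 1.2 uses tokens in the range -2^63 to 2^63 - 1 (it no longer starts at 0 like RandomPartitioner). A minimal sketch of computing evenly spaced initial tokens for an n-node cluster — the standard formula, not code from this thread:

```python
def initial_tokens(n: int) -> list[int]:
    """Evenly spaced initial tokens for Murmur3Partitioner.

    The token range is -2**63 .. 2**63 - 1, so node i of n gets
    floor(i * 2**64 / n) - 2**63.
    """
    return [(i * 2**64) // n - 2**63 for i in range(n)]

# e.g. a 4-node cluster:
# initial_tokens(4) -> [-9223372036854775808, -4611686018427387904,
#                       0, 4611686018427387904]
```

Puppet scripts that previously assigned RandomPartitioner tokens (0 .. 2^127) would need to switch to this range.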



Indexing Data in Cassandra with Elastic Search

2012-11-08 Thread Brian O'Neill
For those looking to index data in Cassandra with Elastic Search, here
is what we decided to do:
http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: logging servers? any interesting in one for cassandra?

2012-11-07 Thread Brian O'Neill

Thanks Dean.  We'll definitely take a look.  (probably in January)

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 11/6/12 11:19 AM, "Hiller, Dean"  wrote:

>Sure, in our playing around, we have an awesome log back configuration for
>development time only that shows warning, severe in red in eclipse and
>let's you click on every single log taking you right to the code that
>logged it…(thought you might enjoy it)...
>
>https://github.com/deanhiller/playorm/blob/master/input/javasrc/logback.xml
>
>
>The java appender is here(called CassandraAppender)
>https://github.com/deanhiller/playorm/tree/master/input/javasrc/com/alvazan/play/logging
>
>
>The AsyncAppender there is different then log backs in that it allows
>bursting but once reaches the limit, it essentially becomes synchronous
>again which allows us to not drop logs like log backs and allow for bursts
>of performance
>
>The CircularBufferAppender is an inmemory buffer that flushes all logs X
>level and above to child appender when a warning or severe happens where X
>is configurable.  
>
>We have only tested out the CassandraAppender at this point.  Right now
>you have to call CassandraAppender.setFactory to set the
>NoSqlEntityManager factory to set it.  It creates a LogEvent rows as well
>as an index on the session and partitions by the first two characters of
>the web session id so there is an index per partition.  This allows us to
>the look at a single web session of a user.  The only thing I don't like
>is we have to do a read when updating the index to be able to delete old
>values in the index(ick), but I couldn't figure any other way around that.
>
>Also, if you have high event rates, there is a MDCLevelFilter so you can
>tag the MDC with something like user=__program__ and ignore all logs for
>him unless they are warning logs which we use to limit the logs from just
>being huge.
>
>Later,
>Dean
>
>
>On 11/6/12 6:32 AM, "Brian O'Neill"  wrote:
>
>>Nice Dean…
>>
>>I'm not so sure we would run the server, but we'd definitely be
>>interested
>>in the logback adaptor.
>>(We would then just access the data via Virgil (over REST), with a thin
>>javascript UI)
>>
>>Let me/us know if you end up putting it out there.  We intend to centralize
>>logging sometime over the next few months.
>>
>>-brian
>>
>>---
>>Brian O'Neill
>>Lead Architect, Software Development
>>Health Market Science
>>The Science of Better Results
>>2700 Horizon Drive • King of Prussia, PA • 19406
>>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>healthmarketscience.com
>>
>>
>>
>>
>>
>>
>>
>>On 11/1/12 10:33 AM, "Hiller, Dean"  wrote:
>>
>>>2 questions
>>>
>>> 1.  What are people using for logging servers for their web tier
>>>logging?
>>> 2.  Would anyone be interested in a new logging server(any programming
>>>language) for web tier to log to your existing cassandra(it uses up disk
>>>space in proportion to number of web servers and just has a rolling
>>>window of logs along with a window of thresho

Re: logging servers? any interesting in one for cassandra?

2012-11-06 Thread Brian O'Neill
Nice Dean…

I'm not so sure we would run the server, but we'd definitely be interested
in the logback adaptor.
(We would then just access the data via Virgil (over REST), with a thin
javascript UI)

Let me/us know if you end up putting it out there.  We intend to centralize
logging sometime over the next few months.

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 11/1/12 10:33 AM, "Hiller, Dean"  wrote:

>2 questions
>
> 1.  What are people using for logging servers for their web tier logging?
> 2.  Would anyone be interested in a new logging server(any programming
>language) for web tier to log to your existing cassandra(it uses up disk
>space in proportion to number of web servers and just has a rolling
>window of logs along with a window of threshold dumps)?
>
>Context for second question: I like less systems since it is less
>maintenance/operations cost and so yesterday I quickly wrote up some log
>back appenders which support (SLF4J/log4j/jdk/commons libraries) and send
>the logs from our client tier into cassandra.  It is simply a rolling
>window of logs so the space used in cassandra is proportional to the
>number of web servers I have (currently, I have 4 web servers).  I am
>also thinking about adding warning type logging such that on warning, the
>last N logs info and above are flushed along with the warning so
>basically two rolling windows.  Then in the GUI, it simply shows the logs
>and if you click on a session, it switches to a view with all the logs
>for that session(no matter which server since in our cluster the session
>switches servers on every request since we are stateless… our session id
>is in the cookie).
>
>Well, let me know if anyone is interested and would actually use such a
>thing and if so, we might create a server around it.
>
>Thanks,
>Dean
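
The rolling-window, flush-on-warning behavior Dean describes above (buffer the last N low-level logs; when a warning arrives, flush that context along with the warning) can be sketched with the stdlib `logging` module — a hypothetical stand-in for his logback appenders, not his actual code:

```python
import logging
from collections import deque

class CircularBufferHandler(logging.Handler):
    """Keep a rolling in-memory window of low-level records; when a record
    at or above flush_level arrives, flush the window plus that record to
    the target handler (e.g. one that persists to Cassandra)."""

    def __init__(self, target: logging.Handler, capacity: int = 100,
                 flush_level: int = logging.WARNING):
        super().__init__()
        self.target = target
        self.window = deque(maxlen=capacity)  # oldest records fall off
        self.flush_level = flush_level

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno >= self.flush_level:
            for buffered in self.window:      # flush buffered context first
                self.target.handle(buffered)
            self.window.clear()
            self.target.handle(record)        # then the triggering record
        else:
            self.window.append(record)
```

The `target` here would be whatever actually writes rows (in Dean's setup, a Cassandra-backed appender); the point of the buffer is that disk space stays proportional to the window size, not the total log volume.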




Keeping the record straight for Cassandra Benchmarks...

2012-10-25 Thread Brian O'Neill
People probably saw...
http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html

To clarify things take a look at...
http://brianoneill.blogspot.com/2012/10/solid-nosql-benchmarks-from-ycsb-w-side.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Using compound primary key

2012-10-08 Thread Brian O'Neill
Hey Vivek,

The same thing happened to me the other day.  You may be missing a component in 
your compound key.

See this thread:
http://mail-archives.apache.org/mod_mbox/cassandra-dev/201210.mbox/%3ccajhhpg20rrcajqjdnf8sf7wnhblo6j+aofksgbxyxwcoocg...@mail.gmail.com%3E

I also wrote a couple blogs on it:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html
http://brianoneill.blogspot.com/2012/10/cql-astyanax-and-compoundcomposite-keys.html

They've fixed this in the 1.2 beta, whereby it checks (at the thrift layer) to 
ensure you have the requisite number of components in the compound/composite 
key.

-brian
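
For background on why a missing component bites at the Thrift layer: CompositeType packs each key component as a 2-byte big-endian length, the raw bytes, and an end-of-component byte. A minimal sketch of that packing — my own illustration of the format as I understand it, not code from the linked posts:

```python
import struct

def pack_composite(*components: bytes) -> bytes:
    """Pack components the way Cassandra's CompositeType encodes them:
    <2-byte big-endian length><value bytes><end-of-component byte (0)>
    per component, concatenated."""
    packed = b""
    for part in components:
        packed += struct.pack(">H", len(part)) + part + b"\x00"
    return packed

# A Thrift client querying the altercations table would need to pack every
# clustering component; sending too few is exactly the mistake that 1.2's
# new validation rejects instead of failing silently.
```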


On Oct 8, 2012, at 10:32 PM, Vivek Mishra wrote:

> Certainly. As these are available with cql3 only! 
> Example mentioned on datastax website is working fine, only difference is i 
> tried with a compound primary key with 3 composite columns in place of 2
> 
> -Vivek
> 
> On Tue, Oct 9, 2012 at 7:57 AM, Arindam Barua  wrote:
>  
> 
> Did you use the “--cql3” option with the cqlsh command?
> 
>  
> 
> From: Vivek Mishra [mailto:mishra.v...@gmail.com] 
> Sent: Monday, October 08, 2012 7:22 PM
> To: user@cassandra.apache.org
> 
> 
> Subject: Using compound primary key
> 
>  
> 
> Hi,
> 
>  
> 
> I am trying to use compound primary key column name and i am referring to:
> 
> http://www.datastax.com/dev/blog/whats-new-in-cql-3-0
> 
>  
> 
> As mentioned on this example, i tried to create a column family containing 
> compound primary key (one or more) as:
> 
>  
> 
>  CREATE TABLE altercations (
> 
>instigator text,
> 
>started_at timestamp,
> 
>ships_destroyed int,
> 
>energy_used float,
> 
>alliance_involvement boolean,
> 
>PRIMARY KEY (instigator,started_at,ships_destroyed)
> 
>);
> 
>  
> 
> And i am getting:
> 
>  
> 
> **
> 
> TSocket read 0 bytes
> 
> cqlsh:testcomp> 
> 
> **
> 
>  
> 
>  
> 
> Then followed by insert and select statements giving me following errors:
> 
>  
> 
> 
> 
>  
> 
> cqlsh:testcomp>INSERT INTO altercations (instigator, started_at, 
> ships_destroyed,
> 
> ...  energy_used, 
> alliance_involvement)
> 
> ...  VALUES ('Jayne Cobb', '2012-07-23', 2, 
> 4.6, 'false');
> 
> TSocket read 0 bytes
> 
>  
> 
> cqlsh:testcomp> select * from altercations;
> 
> Traceback (most recent call last):
> 
>   File "bin/cqlsh", line 1008, in perform_statement
> 
> self.cursor.execute(statement, decoder=decoder)
> 
>   File "bin/../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", 
> line 117, in execute
> 
> response = self.handle_cql_execution_errors(doquery, prepared_q, compress)
> 
>   File "bin/../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", 
> line 132, in handle_cql_execution_errors
> 
> return executor(*args, **kwargs)
> 
>   File 
> "bin/../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py",
>  line 1583, in execute_cql_query
> 
> self.send_execute_cql_query(query, compression)
> 
>   File 
> "bin/../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py",
>  line 1593, in send_execute_cql_query
> 
> self._oprot.trans.flush()
> 
>   File 
> "bin/../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TTransport.py",
>  line 293, in flush
> 
> self.__trans.write(buf)
> 
>   File 
> "bin/../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TSocket.py",
>  line 117, in write
> 
> plus = self.handle.send(buff)
> 
> error: [Errno 32] Broken pipe
> 
>  
> 
> cqlsh:testcomp> 
> 
>  
> 
> 
> 
>  
> 
>  
> 
>  
> 
> Any idea?  Is it a problem with CQL3 or with cassandra?
> 
>  
> 
> P.S: I did post same query on dev group as well to get a quick response.
> 
>  
> 
>  
> 
> -Vivek
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly.

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 10/2/12 9:55 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>Brian,
>
>On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill  wrote:
>>
>> Without putting too much thought into it...
>>
>> Given the underlying architecture, I think you could/would have to write
>> your own partitioner, which would partition based on the prefix/virtual
>> keyspace.
>
>I might be barking up the wrong tree here, but looking at source of
>ColumnFamilyInputFormat, it seems that you can specify a KeyRange for
>the input, but only when you use an order preserving partitioner. So I
>presume that if you are using the RandomPartitioner, you are
>effectively doing a full CF scan (i.e. including all tenants in your
>system).
>
>Ben
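
The prefix-based virtual keyspaces discussed in this thread (and the partitioning a custom partitioner would have to understand) boil down to namespacing row keys per tenant inside one physical CF. A minimal sketch — a hypothetical illustration, not Hector's implementation:

```python
SEPARATOR = ":"

def tenant_key(tenant_id: str, row_key: str) -> str:
    """Namespace a row key into a shared physical CF: '<tenant>:<key>'."""
    if SEPARATOR in tenant_id:
        raise ValueError("tenant id must not contain the separator")
    return f"{tenant_id}{SEPARATOR}{row_key}"

def split_tenant_key(prefixed: str) -> tuple[str, str]:
    """Recover (tenant_id, row_key); row keys may themselves contain ':'."""
    tenant_id, _, row_key = prefixed.partition(SEPARATOR)
    return tenant_id, row_key
```

Under RandomPartitioner the hash of the full prefixed key scatters each tenant's rows across the ring, which is why a per-tenant scan (or map/reduce over one "virtual CF") cannot use a KeyRange and degenerates into a full-CF scan, as Ben observes above.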




Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Brian O'Neill

Dean,

We moved away from Hadoop and M/R, and instead we are using Storm as our
compute grid.  We queue keys in Kafka, then Storm distributes the work to
the grid.  It's working well so far, but we haven't taken it to prod yet.
Data is read from Cassandra using a Cassandra-bolt.

If you end up using Storm, let me know.  We have an unreleased version of
the bolt that you probably want to use.  (we're waiting on Nathan/Storm to
fix some classpath loading issues)

RE: a custom virtual keyspace Partitioner, point well taken

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 10/2/12 9:33 AM, "Hiller, Dean"  wrote:

>Well, I think I know the direction we may follow so we can
>1. Have Virtual CF's
>2. Be able to map/reduce ONE Virtual CF
>
>Well, not map/reduce exactly but really really close.  We use PlayOrm with
>it's partitioning so I am now thinking what we will do is have a compute
>grid  where we can have each node doing a findAll query into the
>partitions it is responsible for.  In this way, I think we can 1000's of
>virtual CF's inside ONE CF and then PlayOrm does it's query and retrieves
>the rows for that partition of one virtual CF.
>
>Anyone know of a compute grid we can dish out work to?  That would be my
>only missing piece (well, that and the PlayOrm virtual CF feature but I
>can add that within a week probably though I am on vacation this Thursday
>to monday).
>
>Later,
>Dean
>
>
>On 10/2/12 6:35 AM, "Hiller, Dean"  wrote:
>
>>So basically, with moving towards the 1000's of CF all being put in one
>>CF, our performance is going to tank on map/reduce, correct?  I mean,
>>from
>>what I remember we could do map/reduce on a single CF, but by stuffing
>>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>>CF.
>>
>>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>>
>>Is this correct?  This really sounds like highly undesirable behavior.
>>There needs to be a way for people with 1000's of CF's to also run
>>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>>rows will be 1000 times slowerŠ.and of course, we will most likely get up
>>to 20,000 tables from my most recent projectionsŠ.our last test load, we
>>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>>started getting really REALLY slow when we got up to 15k+ CF's in the
>>system though I didn't look into why.
>>
>>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>>map/reduce "just" the virtual CF!  Ugh.
>>
>>Thanks,
>>Dean
>>
>>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>>
>>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>>>wrote:
>>>> Its just a convenient way of prefixing:
>>>> 
>>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>>
>>>So given that it is possible to use a CF per tenant, should we assume
>>>that at sufficient scale there is less overhead to prefixing
>>>keys than there is to managing multiple CFs?
>>>
>>>Ben
>>
>




Re: 1000's of column families

2012-10-02 Thread Brian O'Neill

Agreed. 

Do we know yet what the overhead is for each column family?  What is the
limit?
If you have a SINGLE keyspace w/ 2+ CF's, what happens?  Anyone know?

-brian


---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 10/2/12 9:28 AM, "Hiller, Dean"  wrote:

>Thanks for the idea but…(but please keep thinking on it)...
>
>100% what we don't want since partitioned data resides on the same node.
>I want to map/reduce the column families and leverage the parallel disks
>
>:( :(
>
>I am sure others would want to do the same… We almost need a feature of
>virtual Column Families and column family should really not be column
>family but should be called ReplicationGroup or something where
>replication is configured for all CF's in that group.
>
>ANYONE have any other ideas???
>
>Dean
>
>On 10/2/12 7:20 AM, "Brian O'Neill"  wrote:
>
>>
>>Without putting too much thought into it...
>>
>>Given the underlying architecture, I think you could/would have to write
>>your own partitioner, which would partition based on the prefix/virtual
>>keyspace.  
>>
>>-brian
>>
>>---
>>Brian O'Neill
>>Lead Architect, Software Development
>> 
>>Health Market Science
>>The Science of Better Results
>>2700 Horizon Drive • King of Prussia, PA • 19406
>>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>healthmarketscience.com
>>
>>
>>
>>
>>
>>
>>
>>On 10/2/12 9:00 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>>
>>>Dean,
>>>
>>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean 
>>>wrote:
>>>> Ben,
>>>>   to address your question, read my last post but to summarize, yes,
>>>>there
>>>> is less overhead in memory to prefix keys than manage multiple Cfs
>>>>EXCEPT
>>>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>>>overhead
>>>> in reading a whole slew of rows you don't care about as you can't
>>>> map/reduce a single virtual CF but must map/reduce the whole CF
>>>>wasting
>>>> TONS of resources.
>>>
>>>That's a good point that I hadn't considered beforehand, especially as
>>>I'd like to run MR jobs against these CFs.
>>>
>>>Is this limitation inherent in the way that Cassandra is modelled as
>>>input for Hadoop or could you write a custom slice query to only feed
>>>in one particular prefix into Hadoop?
>>>
>>>Cheers,
>>>
>>>Ben
>>
>>
>




Re: 1000's of CF's. virtual CFs do NOT work… map/reduce

2012-10-02 Thread Brian O'Neill
Dean,

Great point.  I hadn't considered that either.  Per my other email, I think
we would need a custom partitioner for this? (a mix of
OrderPreservingPartitioner and RandomPartitioner, OPP for the prefix)

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 10/2/12 8:35 AM, "Hiller, Dean"  wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower… and of course, we will most likely get up
>to 20,000 tables from my most recent projections… our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>>wrote:
>>> It's just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspa
>>>c
>>>es.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefixing
>>keys than there is to managing multiple CFs?
>>
>>Ben
>
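The hybrid partitioner idea floated in this thread (order-preserving on the virtual-CF prefix, random on the remainder) can be sketched in a few lines. This is a pure-Python illustration of the token layout only; the function name and the tuple-shaped token are invented here and bear no relation to Cassandra's actual IPartitioner interface:

```python
import hashlib

def hybrid_token(row_key, prefix_len=4):
    # Order-preserving part: the virtual-CF prefix sorts lexically, so each
    # virtual CF owns a contiguous token range a job could scan in isolation.
    prefix = row_key[:prefix_len]
    # Random part: hash the remainder so rows inside one virtual CF still
    # spread evenly across that range.
    rest = hashlib.md5(row_key[prefix_len:].encode()).hexdigest()
    return (prefix, rest)

keys = ["cf02:rowA", "cf01:rowB", "cf01:rowA"]
ordered = sorted(keys, key=hybrid_token)
# Every cf01 row now sorts before every cf02 row, so a map/reduce job
# could be fed just the token range belonging to one virtual CF.
```

The trade-off is the usual one with any order-preserving scheme: a hot virtual CF becomes a hot token range.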




Re: 1000's of column families

2012-10-02 Thread Brian O'Neill

Without putting too much thought into it...

Given the underlying architecture, I think you could/would have to write
your own partitioner, which would partition based on the prefix/virtual
keyspace.  

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 10/2/12 9:00 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>Dean,
>
>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean  wrote:
>> Ben,
>>   to address your question, read my last post but to summarize, yes,
>>there
>> is less overhead in memory to prefix keys than manage multiple Cfs
>>EXCEPT
>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>overhead
>> in reading a whole slew of rows you don't care about as you can't
>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>> TONS of resources.
>
>That's a good point that I hadn't considered beforehand, especially as
>I'd like to run MR jobs against these CFs.
>
>Is this limitation inherent in the way that Cassandra is modelled as
>input for Hadoop or could you write a custom slice query to only feed
>in one particular prefix into Hadoop?
>
>Cheers,
>
>Ben




Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
It's just a convenient way of prefixing:
http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html

-brian

On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood <0x6e6...@gmail.com> wrote:
> Brian,
>
> On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill  wrote:
>> We haven't committed either way yet, but given Ed Anuff's presentation
>> on virtual keyspaces, we were leaning towards a single column family
>> approach:
>> http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_apigee_under_the_hood/?
>
> Is this doing something special or is this just a convenience way of
> prefixing keys to make the storage space multi-tenanted?
>
> Cheers,
>
> Ben



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
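For readers unfamiliar with the virtual-keyspace technique linked above: it amounts to transparently prepending a tenant identifier to every row key before it hits the shared column family. A minimal sketch, assuming a null-byte delimiter (the delimiter and both helper names are invented for illustration; Hector's implementation uses its own internal scheme):

```python
DELIM = "\x00"  # assumed delimiter, not Hector's actual encoding

def to_physical_key(tenant, logical_key):
    """Map a tenant-scoped logical key onto the shared physical CF."""
    return tenant + DELIM + logical_key

def from_physical_key(physical_key):
    """Recover (tenant, logical_key) from a stored row key."""
    tenant, _, logical = physical_key.partition(DELIM)
    return tenant, logical

pk = to_physical_key("acme", "user42")
```

Because the prefix is part of the row key, a random partitioner scatters all tenants together, which is exactly why the map/reduce concern raised later in this thread appears.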


Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
Dean,

We have the same question...

We have thousands of separate feeds of data as well (20,000+).  To
date, we've been using a CF per feed strategy, but as we scale this
thing out to accommodate all of those feeds, we're trying to figure
out if we're going to blow out the memory.

The initial documentation for heap sizing had column families in the equation:
http://www.datastax.com/docs/0.7/operations/tuning#heap-sizing

But in the more recent documentation, it looks like they removed the
column family variable with the introduction of the universal
key_cache_size.
http://www.datastax.com/docs/1.0/operations/tuning#tuning-java-heap-size

We haven't committed either way yet, but given Ed Anuff's presentation
on virtual keyspaces, we were leaning towards a single column family
approach:
http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_apigee_under_the_hood/?

Definitely let us know what you decide.

-brian

On Fri, Sep 28, 2012 at 11:48 AM, Flavio Baronti
 wrote:
> We had some serious trouble with dynamically adding CFs, although last time
> we tried we were using version 0.7, so maybe
> that's not an issue any more.
> Our problems were two:
> - You are (were?) not supposed to add CFs concurrently. Since we had more
> servers talking to the same Cassandra cluster,
> we had to use distributed locks (Hazelcast) to avoid concurrency.
> - You must be very careful to add new CFs to different Cassandra nodes. If
> you do that fast enough, and the clocks of
> the two servers are skewed, you will severely compromise your schema
> (Cassandra will not understand in which order the
> updates must be applied).
>
> As I said, this applied to version 0.7, maybe current versions solved these
> problems.
>
> Flavio
>
>
> Il 2012/09/27 16:11 PM, Hiller, Dean ha scritto:
>> We have 1000's of different building devices and we stream data from these
> devices.  The format and data from each one varies so one device has 
> temperature
> at timeX with some other variables, another device has CO2 percentage and 
> other
> variables.  Every device is unique and streams it's own data.  We dynamically
> discover devices and register them.  Basically, one CF or table per thing 
> really
> makes sense in this environment.  While we could try to find out which devices
> "are" similar, this would really be a pain and some devices add some new
> variable into the equation.  NOT only that but researchers can register new
> datasets and upload them as well and each dataset they have they do NOT want 
> to
> share with other researches necessarily so we have security groups and each CF
> belongs to security groups.  We dynamically create CF's on the fly as people
> register new datasets.
>>
>> On top of that, when the data sets get too large, we probably want to
> partition a single CF into time partitions.  We could create one CF and put 
> all
> the data and have a partition per device, but then a time partition will 
> contain
> "multiple" devices of data meaning we need to shrink our time partition size
> where if we have CF per device, the time partition can be larger as it is only
> for that one device.
>>
>> THEN, on top of that, we have a meta CF for these devices so some people want
> to query for streams that match criteria AND which returns a CF name and they
> query that CF name so we almost need a query with variables like select cfName
> from Meta where x = y and then select * from cfName where x. Which we can 
> do
> today.
>>
>> Dean
>>
>> From: Marcelo Elias Del Valle mailto:mvall...@gmail.com>>
>> Reply-To: "user@cassandra.apache.org"
> mailto:user@cassandra.apache.org>>
>> Date: Thursday, September 27, 2012 8:01 AM
>> To: "user@cassandra.apache.org"
> mailto:user@cassandra.apache.org>>
>> Subject: Re: 1000's of column families
>>
>> Out of curiosity, is it really necessary to have that amount of CFs?
>> I am probably still used to relational databases, where you would use a new
> table just in case you need to store different kinds of data. As Cassandra
> stores anything in each CF, it might probably make sense to have a lot of CFs 
> to
> store your data...
>> But why wouldn't you use a single CF with partitions in these case? Wouldn't
> it be the same thing? I am asking because I might learn a new modeling 
> technique
> with the answer.
>>
>> []s
>>
>> 2012/9/26 Hiller, Dean mailto:dean.hil...@nrel.gov>>
>> We are streaming data with 1 stream per 1 CF and we have 1000's of CF.  When
> using the tools they are all geared to analyzing ONE column family at a time 
> :(.
> If I remember correctly, Cassandra supports as many CF's as you want, correct?
> Even though I am going to have tons of funs with limitations on the tools,
> correct?
>>
>> (I may end up wrapping the node tool with my own aggregate calls if needed to
> sum up multiple column families and such).
>>
>> Thanks,
>> Dean
>>
>>
>>
>> --
>> Marcelo Elias Del V
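Dean's time-partitioning point above (per-device CFs allow coarser time buckets, because a shared CF mixes devices into each bucket and forces it to be smaller) can be made concrete with bucketed row keys. The key format and function name below are illustrative assumptions, not anything from PlayOrm or Cassandra:

```python
from datetime import datetime, timezone

def bucketed_row_key(device_id, ts, bucket_hours):
    # Round the timestamp down to its bucket; all samples from one device in
    # the same bucket share one wide row, so row width tracks bucket size.
    epoch_hours = int(ts.timestamp() // 3600)
    return "%s:%d" % (device_id, epoch_hours - epoch_hours % bucket_hours)

t1 = datetime(2012, 9, 27, 16, 11, tzinfo=timezone.utc)
t2 = datetime(2012, 9, 27, 17, 30, tzinfo=timezone.utc)
# With a CF per device, a 24h bucket holds only that device's samples;
# in a shared CF the same bucket would hold every device's samples,
# which is why the bucket would have to shrink.
```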

Re: Kundera 2.1 released

2012-09-21 Thread Brian O'Neill

Well done, Vivek and team!!  This release was much anticipated.

I'll give this a test with Spring Data JPA when I return from vacation.

thanks,
-brian


On Sep 21, 2012, at 9:15 PM, Vivek Mishra wrote:

> Hi All,
> 
> We are happy to announce the release of Kundera 2.1.
> 
> Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL 
> datastores. The idea behind Kundera is to make working with NoSQL Databases
> drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and 
> relational databases.
> 
> Major Changes in this release:
> ---
> * Allow user to set specific CQL versioning.
> 
> * Batch insert/update for Cassandra/MongoDB/HBase.
> 
> * Extended JPA Metamodel/TypedQuery/ProviderUtil implementation.
> 
> * Another Thrift client implementation for Cassandra.
> 
> * Deprecated support for properties with XML based Column family/Table/server 
> specific property configuration for Cassandra, MongoDB and HBase.
> 
> * Stronger query support:
>  a) JPQL support over all data types and associations.
>  b) JPQL support to query using primary key alongwith other columns.
> 
>  * Fixed github issues:
> 
>https://github.com/impetus-opensource/Kundera/issues/90
>https://github.com/impetus-opensource/Kundera/issues/91
>https://github.com/impetus-opensource/Kundera/issues/92
>https://github.com/impetus-opensource/Kundera/issues/93
>https://github.com/impetus-opensource/Kundera/issues/94
>https://github.com/impetus-opensource/Kundera/issues/96
>https://github.com/impetus-opensource/Kundera/issues/98
>https://github.com/impetus-opensource/Kundera/issues/99
>https://github.com/impetus-opensource/Kundera/issues/100
>https://github.com/impetus-opensource/Kundera/issues/101
>https://github.com/impetus-opensource/Kundera/issues/102
>https://github.com/impetus-opensource/Kundera/issues/104
>https://github.com/impetus-opensource/Kundera/issues/106
>https://github.com/impetus-opensource/Kundera/issues/107 
>https://github.com/impetus-opensource/Kundera/issues/108
>https://github.com/impetus-opensource/Kundera/issues/109
>https://github.com/impetus-opensource/Kundera/issues/111
>https://github.com/impetus-opensource/Kundera/issues/112   
>https://github.com/impetus-opensource/Kundera/issues/116
> 
> 
> To download, use or contribute to Kundera, visit:
> http://github.com/impetus-opensource/Kundera
> 
> Latest released tag version is 2.1. Kundera maven libraries are now available 
> at: https://oss.sonatype.org/content/repositories/releases/com/impetus and 
> http://kundera.googlecode.com/svn/maven2/maven-missing-resources.
> 
> Sample codes and examples for using Kundera can be found here:
> http://github.com/impetus-opensource/Kundera-Examples
> and 
> https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests
> 
> Thank you all for your contributions!
> 
> Regards,
> Kundera Team.

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Using the commit log for external synchronization

2012-09-21 Thread Brian O'Neill
> IMHO it's a better design to multiplex the data stream at the application
> level.
+1, agreed.

That is where we ended up. (and Storm is proving to be a solid
framework for that)

-brian

On Fri, Sep 21, 2012 at 4:56 AM, aaron morton  wrote:
> The commit log is essentially internal implementation. The total size of the
> commit log is restricted, and the multiple files used to represent segments
> are recycled. So once all the memtables have been flushed for segment it may
> be overwritten.
>
> To archive the segments see the conf/commitlog_archiving.properties file.
>
> Large rows will bypass the commit log.
>
> A write commited to the commit log may still be considered a failure if CL
> nodes do not succeed.
>
> IMHO it's a better design to multiplex the data stream at the application
> level.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/09/2012, at 11:51 AM, Brian O'Neill  wrote:
>
>
> Along those lines...
>
> We sought to use triggers for external synchronization.   If you read
> through this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-1311
>
> You'll see the idea of leveraging a commit log for synchronization, via
> triggers.
>
> We went ahead and implemented this concept in:
> https://github.com/hmsonline/cassandra-triggers
>
> With that, via AOP, you get handed the mutation as things change.  We used
> it for synchronizing SOLR.
>
> fwiw,
> -brian
>
>
>
> On Sep 20, 2012, at 7:18 PM, Michael Kjellman wrote:
>
> +1. Would be a pretty cool feature
>
> Right now I write once to cassandra and once to kafka.
>
> On 9/20/12 4:13 PM, "Data Craftsman 木匠" 
> wrote:
>
> This will be a good new feature. I guess the development team don't
>
> have time on this yet.  ;)
>
>
>
> On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>
> Hi,
>
>
> I'd like to incrementally synchronize data written to Cassandra into
>
> an external store without having to maintain an index to do this, so I
>
> was wondering whether anybody is using the commit log to establish
>
> what updates have taken place since a given point in time?
>
>
> Cheers,
>
>
> Ben
>
>
>
>
> --
>
> Thanks,
>
>
> Charlie (@mujiang) 木匠
>
> ===
>
> Data Architect Developer 汉唐 田园牧歌DBA
>
> http://mujiang.blogspot.com
>
>
>
> 'Like' us on Facebook for exclusive content and other resources on all
> Barracuda Networks solutions.
> Visit http://barracudanetworks.com/facebook
>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Using the commit log for external synchronization

2012-09-20 Thread Brian O'Neill

Along those lines...

We sought to use triggers for external synchronization.   If you read through 
this issue:
https://issues.apache.org/jira/browse/CASSANDRA-1311

You'll see the idea of leveraging a commit log for synchronization, via 
triggers.

We went ahead and implemented this concept in:
https://github.com/hmsonline/cassandra-triggers

With that, via AOP, you get handed the mutation as things change.  We used it 
for synchronizing SOLR.  

fwiw,
-brian



On Sep 20, 2012, at 7:18 PM, Michael Kjellman wrote:

> +1. Would be a pretty cool feature
> 
> Right now I write once to cassandra and once to kafka.
> 
> On 9/20/12 4:13 PM, "Data Craftsman 木匠" 
> wrote:
> 
>> This will be a good new feature. I guess the development team don't
>> have time on this yet.  ;)
>> 
>> 
>> On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>>> Hi,
>>> 
>>> I'd like to incrementally synchronize data written to Cassandra into
>>> an external store without having to maintain an index to do this, so I
>>> was wondering whether anybody is using the commit log to establish
>>> what updates have taken place since a given point in time?
>>> 
>>> Cheers,
>>> 
>>> Ben
>> 
>> 
>> 
>> -- 
>> Thanks,
>> 
>> Charlie (@mujiang) 木匠
>> ===
>> Data Architect Developer 汉唐 田园牧歌DBA
>> http://mujiang.blogspot.com
> 
> 
> 'Like' us on Facebook for exclusive content and other resources on all 
> Barracuda Networks solutions.
> Visit http://barracudanetworks.com/facebook
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
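The "multiplex at the application level" approach that this thread converges on is essentially a write-through wrapper: apply each mutation to Cassandra, then hand a copy to every downstream consumer (a SOLR indexer, Kafka, and so on). A toy sketch with in-memory stand-ins (the class and its shape are invented here; a dict plays Cassandra and a queue plays the downstream feed):

```python
import queue

class WriteThrough:
    """Toy stand-in for application-level multiplexing: the primary store
    is a dict (playing Cassandra); every sink gets a copy of each mutation."""
    def __init__(self, sinks):
        self.store = {}
        self.sinks = sinks

    def write(self, key, columns):
        self.store.setdefault(key, {}).update(columns)  # primary write
        for sink in self.sinks:                         # fan out (SOLR, Kafka, ...)
            sink.put((key, dict(columns)))

indexer = queue.Queue()   # stands in for the external index feed
db = WriteThrough([indexer])
db.write("row1", {"name": "Betty Crocker"})
```

The same caveat from the thread applies: unlike tailing the commit log, this only captures writes that go through the wrapper.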



Re: Solr Use Cases

2012-09-19 Thread Brian O'Neill
Roshni,

We're using SOLR to support ad hoc queries and fuzzy searches against
unstructured data stored in Cassandra.  Cassandra is great for storage
and you can create data models and indexes that support your queries,
provided you can anticipate those queries.  When you can't anticipate
the queries, or if you need to support a large permutation of
multi-dimensional queries, you're probably better off using an index
like SOLR.

Since SOLR only supports a flat document structure, you may need to
perform transformation before inserting into SOLR.  We chose not to
use DSE, so we used a cassandra-triggers as our mechanism to integrate
SOLR. (https://github.com/hmsonline/cassandra-triggers)  We intercept
the mutation, transform the data into a document (w/ multi-value
fields) and POST it to SOLR.

More recently though, we're looking to roll out ElasticSearch.  As our
query demand increases, we expect SOLR to quickly become a PITA to
administer.  (master->slave relationships)  IMHO, ElasticSearch's
architecture is a better match for Cassandra.  We are also looking to
substitute cassandra-triggers for Storm, allowing us to build a data
processing flow using Cassandra and ElasticSearch bolts.  (we've open
sourced the Cassandra bolt and we'll be open sourcing the elastic
search bolt shortly)

-brian


On Wed, Sep 19, 2012 at 8:27 AM, Roshni Rajagopal
 wrote:
> Hi,
>
> Im new to Solr, and I hear that Solr is a great tool for improving search
> performance
> Im unsure whether Solr or DSE Search is a must for all cassandra deployments
>
> 1. For performance - I thought cassandra had great read & write performance.
> When should solr be used ?
> Taking the following use cases for cassandra from the datastax FAQ page, in
> which cases would Solr be useful, and whether for all?
>
> Time series data management
> High-velocity device data ingestion and analysis
> Media streaming (e.g., music, movies)
> Social media input and analysis
> Online web retail (e.g., shopping carts, user transactions)
> Web log management / analysis
> Web click-stream analysis
> Real-time data analytics
> Online gaming (e.g., real-time messaging)
> Write-intensive transaction systems
> Buyer event analytics
> Risk analysis and management
>
>
> 2. what changes to cassandra data modeling does Solr bring? We have some
> guidelines & best practices around cassandra data modeling.
> Is Solr so powerful, that it does not matter how data is modelled in
> cassandra? Are there different best practices for cassandra data modeling
> when Solr is in the picture?
> Is this something we should keep in mind while modeling for cassandra today-
> that it should be  good to be used via Solr in future?
>
> 3. Does Solr come with any drawbacks like its not real time ?
>
> I can & should read the manual, but it will be great if someone can explain
> at a high level.
>
> Thank you!
>
>
> Regards,
> Roshni



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
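The transformation step described above (Cassandra row to flat SOLR document with multi-value fields) is mostly a flattening exercise. A minimal sketch; the dotted field-naming convention is an assumption of this example, not a SOLR requirement:

```python
def flatten(doc, prefix=""):
    """Flatten a nested row into SOLR-style flat fields; a list value simply
    stays a list, i.e. a multi-value field under one flat name."""
    flat = {}
    for key, value in doc.items():
        name = key if not prefix else prefix + "." + key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

row = {"id": "u1",
       "address": {"city": "King of Prussia", "state": "PA"},
       "tags": ["cassandra", "solr"]}
doc = flatten(row)
```

In the trigger-based setup described above, this kind of function would sit between the intercepted mutation and the POST to the indexer.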


Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Brian O'Neill
Roshni,

We're going through the same debate right now.

I believe native support for JSON (or collections) is on the docket
for Cassandra.
Here is a discussion we had a few months ago on the topic:
http://comments.gmane.org/gmane.comp.db.cassandra.devel/5233

We presently store JSON, but we're considering a change to composite keys.

Presently, each client has to parse the JSON value.  If you are
retrieving lots of values, that's a lot of parsing.  Also, storing the
raw values allows for better integration with other tools, such as
reporting engines (e.g. JasperSoft).  Also, if you do want to update a
single value inside the json, you get into real trouble, because you
first need to read the value, update the field, then write the column
again.  The read before write is a problem, especially if you have a
lot of concurrency in your system.  (Two clients could read the old
value, then update different fields, and the second would overwrite
the firsts change)

One final note...
(As a side note, JSON values also complicated our wide-row indexing
mechanism: (https://github.com/hmsonline/cassandra-indexing))

For those reasons, we're considering a data model shift away from JSON.

That said, I'm keeping a close watch on:
https://issues.apache.org/jira/browse/CASSANDRA-3647

But if this is CQL only, I'm not sure how much use it will be for us
since we're coming in from different clients.
Anyone know how/if collections will be available from other clients?

-brian


On Wed, Sep 19, 2012 at 8:00 AM, Roshni Rajagopal
 wrote:
> Hi,
>
> There was a conversation on this some time earlier, and to continue it
>
> Suppose I want to associate a user to  an item, and I want to also store 3
> commonly used attributes without needing to go to an entity item column
> family , I have 2 options :-
>
> A) use composite columns
> UserId1 : {
>  : = Betty Crocker,
>  : = Cake
> : = 5
>  : = Nutella,
>  : = Choc spread
> : = 15
> }
>
> B) use a json with the data
> UserId1 : {
>   = {name: Betty Crocker,descr: Cake, Qty: 5},
>   ={name: Nutella,descr: Choc spread, Qty: 15}
> }
>
> Essentially A is better if one wants to update individual fields , while B
> is better if one wants easier paging, reading multiple items at once in one
> read. etc. The details are in this discussion thread
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-another-question-td7581967.html
>
> I had an additional question,
> as it's being said that CQL is the direction in which cassandra is moving,
> and there's a lot of effort in making CQL the standard,
>
> How does approach B work in CQL. Can we read/write a JSON easily in CQL? Can
> we extract a field from a JSON in CQL or would that need to be done via the
> client code?
>
>
> Regards,
> Roshni



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
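The read-before-write hazard described in this thread (two clients read the same JSON blob, each updates a different field, and the second write clobbers the first) is easy to demonstrate. A toy comparison of the two models, with dicts standing in for the column families:

```python
import json

# --- JSON blob model: whole value is read, modified, written back ---
stored = {"u1": json.dumps({"name": "Nutella", "qty": 15})}

a = json.loads(stored["u1"])      # client A reads
b = json.loads(stored["u1"])      # client B reads concurrently
a["qty"] = 20                     # A changes quantity
b["name"] = "Choc spread"         # B changes name
stored["u1"] = json.dumps(a)
stored["u1"] = json.dumps(b)      # B's blob silently discards A's change

# --- composite-column model: each field is an independent column write ---
cols = {("u1", "name"): "Nutella", ("u1", "qty"): 15}
cols[("u1", "qty")] = 20              # A writes only the qty column
cols[("u1", "name")] = "Choc spread"  # B writes only the name column
# Last-write-wins now applies per column, so both updates survive.
```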


Compound Keys: Connecting the dots between CQL3 and Java APIs

2012-09-11 Thread Brian O'Neill
Our data architects (ex-Oracle DBA types) are jumping on the CQL3
bandwagon and creating schemas for us.  That triggered me to write a
quick article mapping the CQL3 schemas to how they are accessed via
Java APIs (for our dev team).

I hope others find this useful as well:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
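For readers connecting the same dots: at the storage layer, a CQL3 compound primary key maps to a partition key (the physical row key) plus clustering components encoded into each physical column name. A rough model of that encoding; it is deliberately simplified, since real composite names are length-prefixed binary, not colon-joined strings:

```python
def composite_column(clustering_values, cql_column):
    # Simplified model: real composites are length-prefixed binary, and the
    # partition key is the physical row key, never part of the column name.
    return ":".join(list(clustering_values) + [cql_column])

# CQL:  CREATE TABLE t (part text, c1 text, c2 int, val text,
#                       PRIMARY KEY (part, c1, c2));
# INSERT INTO t ... VALUES ('p', 'a', 1, 'x') stores, inside row 'p',
# the value 'x' under a physical column named roughly like:
name = composite_column(["a", "1"], "val")
```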


Re: Cassandra API Library.

2012-09-04 Thread Brian O'Neill
You got it.  (done)

-brian

On Tue, Sep 4, 2012 at 7:08 AM, Filipe Gonçalves
 wrote:
> @Brian: you can add the Cassandra::Simple Perl client
> http://fmgoncalves.github.com/p5-cassandra-simple/
>
>
> 2012/8/27 Paolo Bernardi 
>>
>> On 08/23/2012 01:40 PM, Thomas Spengler wrote:
>>>
>>> 4) pelops (Thrift,Java)
>>>
>>>
>> I've been using Pelops for quite some time with pretty good results; it
>> felt much cleaner than Hector.
>>
>> Paolo
>>
>> --
>> @bernarpa
>> http://paolobernardi.wordpress.com
>>
>
>
>
> --
> Filipe Gonçalves



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Spring - cassandra

2012-08-30 Thread Brian O'Neill

Yes.  I'm in contact with Oliver Gierke and Erez Mazor of Spring Data.

We are working on two fronts:
1) Spring Data support via JPA (using Kundera underneath)
- Initial attempt here:
http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.h
tml
- Most recently (an hour ago): The issues w/ MetaModel are fixed, now
waiting on an enhancement to the EntityManager to fully support type
queries.

For this one, we're in a holding pattern until Kundera is fully JPA
compliant.

2) Spring Data support via Astyanax
- The project I'm working below should mimic Spring Data MongoDB's
approach and capabilities, allowing people to use Spring Data with
Cassandra without the constraints of JPA.  I'd love some help working on
the project.  Once we have it functional we should be able to push it to
Spring. (with Oliver's help)

Go ahead and fork.  Feel free to email me directly so we don't spam this
list.
(or setup a googlegroup just in case others want to contribute)

-brian


---
Brian O'Neill
Lead Architect, Software Development
Apache Cassandra MVP
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 8/30/12 9:01 AM, "Radim Kolar"  wrote:

>
>
>> You looking for the author of Spring Data Cassandra?
>> https://github.com/boneill42/spring-data-cassandra
>>
>> If so, I guess that is me. =)
>Did you get in touch with spring guys? They have cassandra support on
>their spring data todo list. They might have some todo or feature list
>they want to implement for cassandra, i am willing to code something to
>make official spring cassandra support happen faster.




Re: Spring - cassandra

2012-08-29 Thread Brian O'Neill

You looking for the author of Spring Data Cassandra?
https://github.com/boneill42/spring-data-cassandra

If so, I guess that is me. =)

-brian

---
Brian O'Neill
Lead Architect, Software Development
Apache Cassandra MVP
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 8/29/12 10:38 AM, "Radim Kolar"  wrote:

>is author of Spring - Cassandra here? I am interested in getting this
>merged into upstream spring. They have cassandra support on their todo
>list.




Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
Ha… how could I forget? =)
Adding it now.

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com


 


From:  Robin Verlangen 
Reply-To:  
Date:  Thursday, August 23, 2012 9:56 AM
To:  
Subject:  Re: Cassandra API Library.

@Brian: You're missing PhpCassa (PHP library)

With kind regards,

Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/8/23 Hiller, Dean 
> No problem, if you like SQL at all and don't mind adding a PARTITIONS
> clause, we have a raw ad-hoc layer (if you have properly added meta data
> which the ORM objects do for you but can be done manually) you get a query
> like this
> 
> PARTITIONS p('account56') SELECT tr FROM Trades as tr WHERE tr. price > 70;
> 
> So it queries just the partition of the Trades table.  We are still
> investigating how large partitions can be but we know it is quite large
> from previous nosql projects.
> 
> Dean
> 
> 
> On 8/23/12 7:51 AM, "Brian O'Neill"  wrote:
> 
>> >
>> >Thanks Dean… I hadn't played with that one.  I wonder if that would better
>> >fit the bill for the Spring Data Cassandra module I'm hacking on.
>> >https://github.com/boneill42/spring-data-cassandra
>> >
>> >I'll poke around.
>> >
>> >-brian
>> >
>> >---
>> >Brian O'Neill
>> >Lead Architect, Software Development
>> >
>> >Health Market Science
>> >The Science of Better Results
>> >2700 Horizon Drive • King of Prussia, PA • 19406
>> >M: 215.588.6024   • @boneill42
>> <http://www.twitter.com/boneill42>  •
>> >healthmarketscience.com <http://healthmarketscience.com>
>> >
>> >This information transmitted in this email message is for the intended
>> >recipient only and may contain confidential and/or privileged material. If
>> >you received this email in error and are not the intended recipient, or
>> >the person responsible to deliver it to the intended recipient, please
>> >contact the sender at the email above and delete this email and any
>> >attachments and destroy any copies thereof. Any review, retransmission,
>> >dissemination, copying or other use of, or taking any action in reliance
>> >upon, this information by persons or entities other than the intended
>> >recipient is strictly prohibited.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On 8/23/12 9:19 AM, "Hiller, Dean"  wrote:
>> >
>>> >>playOrm has a raw layer that works even if your columns are not defined
>>> >>ahead of time, and SQL with no limitations on <, =, <=, etc., as well as joins being
>>> >>added shortly BUT joins are for joining partitions so that your system
>>> >>can
>>> >>still scale to infinity.  Also has an in-memory database as well for unit
>>> >>testing that you can do TDD with built in.
>>> >>
>>> >>So if you like JQL but want infinite scale JQL, try playOrm.
>>> >>
>>> >>All 45 tests are passing.  We expect 100 unit tests to be in place by the
>>> >>end of the year.
>>> >>
>>> >>Dean
>>> >>
>>> >>On 8/23/12 6:46 AM, "Brian O'Neill"  wrote:
>>> >>
>>>> >>>

Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
FWIW.. I just threw this together...
http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

Let me know if I missed any others. (I didn't have playorm on there)

-brian

On Thu, Aug 23, 2012 at 9:51 AM, Brian O'Neill  wrote:
>
> Thanks Dean… I hadn't played with that one.  I wonder if that would better
> fit the bill for the Spring Data Cassandra module I'm hacking on.
> https://github.com/boneill42/spring-data-cassandra
>
> I'll poke around.
>
> -brian
>
> ---
> Brian O'Neill
> Lead Architect, Software Development
>
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
> healthmarketscience.com
>
> This information transmitted in this email message is for the intended
> recipient only and may contain confidential and/or privileged material. If
> you received this email in error and are not the intended recipient, or
> the person responsible to deliver it to the intended recipient, please
> contact the sender at the email above and delete this email and any
> attachments and destroy any copies thereof. Any review, retransmission,
> dissemination, copying or other use of, or taking any action in reliance
> upon, this information by persons or entities other than the intended
> recipient is strictly prohibited.
>
>
>
>
>
>
>
> On 8/23/12 9:19 AM, "Hiller, Dean"  wrote:
>
>>playOrm has a raw layer that works even if your columns are not defined ahead
>>of time, and SQL with no limitations on <, =, <=, etc., as well as joins being
>>added shortly BUT joins are for joining partitions so that your system can
>>still scale to infinity.  Also has an in-memory database as well for unit
>>testing that you can do TDD with built in.
>>
>>So if you like JQL but want infinite scale JQL, try playOrm.
>>
>>All 45 tests are passing.  We expect 100 unit tests to be in place by the
>>end of the year.
>>
>>Dean
>>
>>On 8/23/12 6:46 AM, "Brian O'Neill"  wrote:
>>
>>>
>>>
>>>We've used 'em all and… (IMHO)
>>>
>>>1) I would avoid Thrift directly.
>>>2) Hector is a sure bet.
>>>3) Astyanax is the up and comer.
>>>4) Kundera is good, but works like an ORM -- so not so good if your
>>>columns aren't defined ahead of time.
>>>
>>>-brian
>>>
>>>---
>>>Brian O'Neill
>>>Lead Architect, Software Development
>>>
>>>Health Market Science
>>>The Science of Better Results
>>>2700 Horizon Drive • King of Prussia, PA • 19406
>>>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>>healthmarketscience.com
>>>
>>>This information transmitted in this email message is for the intended
>>>recipient only and may contain confidential and/or privileged material.
>>>If
>>>you received this email in error and are not the intended recipient, or
>>>the person responsible to deliver it to the intended recipient, please
>>>contact the sender at the email above and delete this email and any
>>>attachments and destroy any copies thereof. Any review, retransmission,
>>>dissemination, copying or other use of, or taking any action in reliance
>>>upon, this information by persons or entities other than the intended
>>>recipient is strictly prohibited.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 8/23/12 7:40 AM, "Thomas Spengler" 
>>>wrote:
>>>
>>>>4) pelops (Thrift,Java)
>>>>
>>>>On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
>>>>> I would vote for Hector :)
>>>>>
>>>>> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa 
>>>>>wrote:
>>>>>
>>>>>> hi,
>>>>>>
>>>>>> kindly let me know which java client api is more matured, and easy to
>>>>>>use
>>>>>> with all features(Super Columns, caching, pooling, etc) of Cassandra
>>>>>>1.X.
>>>>>> Right now i come to know that following client exists:
>>>>>>
>>>>>> 1) Hector(Java)
>>>>>> 2) Thrift (Java)
>>>>>> 3) Kundera (Java)
>>>>>>
>>>>>>
>>>>>> With Regards,
>>>>>> Amit
>>>>>>
>>>>>
>>>>
>>>>
>>>>--
>>>>Thomas Spengler
>>>>Chief Technology Officer
>>>>
>>>>
>>>>TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
>>>>Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
>>>>thomas.speng...@toptarif.de | www.toptarif.de
>>>>
>>>>Amtsgericht Charlottenburg, HRB 113287 B
>>>>Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
>>>>
>>>>-
>>>
>>>
>>
>
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill

Thanks Dean… I hadn't played with that one.  I wonder if that would better
fit the bill for the Spring Data Cassandra module I'm hacking on.
https://github.com/boneill42/spring-data-cassandra

I'll poke around.

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 8/23/12 9:19 AM, "Hiller, Dean"  wrote:

>playOrm has a raw layer that works even if your columns are not defined ahead
>of time, and SQL with no limitations on <, =, <=, etc., as well as joins being
>added shortly BUT joins are for joining partitions so that your system can
>still scale to infinity.  Also has an in-memory database as well for unit
>testing that you can do TDD with built in.
>
>So if you like JQL but want infinite scale JQL, try playOrm.
>
>All 45 tests are passing.  We expect 100 unit tests to be in place by the
>end of the year.
>
>Dean
>
>On 8/23/12 6:46 AM, "Brian O'Neill"  wrote:
>
>>
>>
>>We've used 'em all and… (IMHO)
>>
>>1) I would avoid Thrift directly.
>>2) Hector is a sure bet.
>>3) Astyanax is the up and comer.
>>4) Kundera is good, but works like an ORM -- so not so good if your
>>columns aren't defined ahead of time.
>>
>>-brian
>>
>>---
>>Brian O'Neill
>>Lead Architect, Software Development
>> 
>>Health Market Science
>>The Science of Better Results
>>2700 Horizon Drive • King of Prussia, PA • 19406
>>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>healthmarketscience.com
>>
>>This information transmitted in this email message is for the intended
>>recipient only and may contain confidential and/or privileged material.
>>If
>>you received this email in error and are not the intended recipient, or
>>the person responsible to deliver it to the intended recipient, please
>>contact the sender at the email above and delete this email and any
>>attachments and destroy any copies thereof. Any review, retransmission,
>>dissemination, copying or other use of, or taking any action in reliance
>>upon, this information by persons or entities other than the intended
>>recipient is strictly prohibited.
>> 
>>
>>
>>
>>
>>
>>
>>On 8/23/12 7:40 AM, "Thomas Spengler" 
>>wrote:
>>
>>>4) pelops (Thrift,Java)
>>>
>>>On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
>>>> I would vote for Hector :)
>>>> 
>>>> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa 
>>>>wrote:
>>>> 
>>>>> hi,
>>>>>
>>>>> kindly let me know which java client api is more matured, and easy to
>>>>>use
>>>>> with all features(Super Columns, caching, pooling, etc) of Cassandra
>>>>>1.X.
>>>>> Right now i come to know that following client exists:
>>>>>
>>>>> 1) Hector(Java)
>>>>> 2) Thrift (Java)
>>>>> 3) Kundera (Java)
>>>>>
>>>>>
>>>>> With Regards,
>>>>> Amit
>>>>>
>>>> 
>>>
>>>
>>>-- 
>>>Thomas Spengler
>>>Chief Technology Officer
>>>
>>>
>>>TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
>>>Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
>>>thomas.speng...@toptarif.de | www.toptarif.de
>>>
>>>Amtsgericht Charlottenburg, HRB 113287 B
>>>Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
>>>
>>>-
>>
>>
>




Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill


We've used 'em all and… (IMHO)

1) I would avoid Thrift directly.
2) Hector is a sure bet.
3) Astyanax is the up and comer.
4) Kundera is good, but works like an ORM -- so not so good if your
columns aren't defined ahead of time.

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 8/23/12 7:40 AM, "Thomas Spengler"  wrote:

>4) pelops (Thrift,Java)
>
>On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
>> I would vote for Hector :)
>> 
>> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa 
>>wrote:
>> 
>>> hi,
>>>
>>> kindly let me know which java client api is more matured, and easy to
>>>use
>>> with all features(Super Columns, caching, pooling, etc) of Cassandra
>>>1.X.
>>> Right now i come to know that following client exists:
>>>
>>> 1) Hector(Java)
>>> 2) Thrift (Java)
>>> 3) Kundera (Java)
>>>
>>>
>>> With Regards,
>>> Amit
>>>
>> 
>
>
>-- 
>Thomas Spengler
>Chief Technology Officer
>
>
>TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
>Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
>thomas.speng...@toptarif.de | www.toptarif.de
>
>Amtsgericht Charlottenburg, HRB 113287 B
>Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
>-




A Big Data Trifecta: Storm, Kafka and Cassandra

2012-08-04 Thread Brian O'Neill
Philip,

I figured I would reply via blog post. =)
http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html

That blog post shows how we pieced together Kafka and Cassandra (via Storm).
With LinkedIn behind Kafka, it is well supported.  They use it in
production. (and most likely we will too =)

Let me know if you end up using it.  Thus far, I think it pairs nicely
with Cassandra, but we don't have it in production yet.

-brian

On Fri, Aug 3, 2012 at 3:41 PM, Milind Parikh  wrote:
> Kafka is relatively stable and has a active well-supported news-group as
> well.
>
> As discussed by Brian, you would be inverting the paradigm of store-process.
> Essentially in your original approach, you are storing the messages first
> and then processing them after the fact. In the Kafka model, you would
> process the messages as they come in.
>
> Since you are thinking about parallelism anyways, I trust that your
> processing paradigm is inherently parallelizable.
>
> Regards
> Milind
>
>
>
>
>
> On Fri, Aug 3, 2012 at 12:22 PM, Philip Nelson
>  wrote:
>>
>> Brian -- thanks.
>>
>> > We were looking to do the same thing, but in the end decided
>> > to go with Kafka.
>> > Given your throughput requirements, Kafka might be a good
>> > option for you as well.
>>
>> This might be off-topic, so I'll keep it short. Kafka is reasonably
>> stable? Mature (I see it's in the Incubator)? Relative to Cassandra?
>>
>> Philip
>>
>>
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
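[Editorial note] Milind's point above about inverting the store-then-process paradigm can be reduced to a few lines. The sketch below is an illustrative in-memory model only — a blocking queue stands in for a Kafka topic, and the "processing" step is a placeholder; it is not Kafka or Storm API code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory sketch of the "process messages as they come in" model.
// A BlockingQueue stands in for a Kafka topic; a real pipeline would
// use a Kafka consumer feeding a Storm bolt instead.
class ProcessOnArrival {
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    private final List<String> processed = new ArrayList<>();

    // Producer side: messages are handed straight to the stream,
    // rather than being stored first and swept up after the fact.
    public void publish(String message) {
        topic.add(message);
    }

    // Consumer side: process each message as it arrives.
    public void drain() {
        String msg;
        while ((msg = topic.poll()) != null) {
            processed.add(msg.toUpperCase()); // stand-in for real processing
        }
    }

    public List<String> processed() {
        return processed;
    }
}
```

The contrast with the store-process approach is that nothing is written anywhere durable before processing; durability and replay are what Kafka's log adds on top of this picture.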


Re: How to process new rows in parallel?

2012-08-03 Thread Brian O'Neill
If you are deleting the messages after processing, it sounds like you
are using Cassandra as a work queue.

Here are some links for implementing a distributed queue in Cassandra:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
http://comments.gmane.org/gmane.comp.db.cassandra.user/16633

There is a placeholder on the use cases wiki for this, but no info:
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

We were looking to do the same thing, but in the end decided to go with Kafka.
Given your throughput requirements, Kafka might be a good option for
you as well.

-brian


On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson
 wrote:
> Hello,
>
> I am using a Column Family in Cassandra to store incoming messages, which 
> arrive at a high rate (100s of thousands per second). I then have a process 
> wake up periodically to work on those messages, and then delete them. I'd 
> like to understand how I could have multiple processes running, each pulling 
> off a bunch of messages in parallel. It would be nice to be able to add 
> processes dynamically, and not have to explicitly assign message ranges to 
> various processes.
>
> Any suggestions on how to ensure that each process pulls off a different 
> bunch of messages? Any recommended design patterns? I was going to look at 
> qsandra too, for inspiration. Would this be worthwhile?
>
> If this was a relational database, I would have the processes lock the table 
> (or perhaps a row), set flags on a row indicating that it's being 
> "processed", and then unlock. Processes would choose messages by SELECTing on 
> unflagged messages. I'm not sure how this might map to Cassandra. I realise 
> it may not. Even if I configure the cluster such that seting a flag on a row 
> requires all nodes to be written, two processes could still race setting that 
> flag, right?
>
> I am open to the idea that it might help to store the messages in wide rows, 
> if that helps.
>
> Thanks,
>
> Philip



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
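[Editorial note] One lock-free way to get the parallelism Philip asks about — without the flag-and-lock pattern he describes — is to shard the queue into a fixed number of buckets and statically assign buckets to workers. The sketch below is an in-memory model of that idea; all names are illustrative, and in Cassandra each bucket would typically be a wide row.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: hash each message key into a fixed number of buckets and have
// each worker own a disjoint set of buckets. Since no two workers ever
// read the same bucket, no locking or flagging of rows is needed.
class BucketedQueue {
    private final int numBuckets;
    private final List<List<String>> buckets = new ArrayList<>();

    public BucketedQueue(int numBuckets) {
        this.numBuckets = numBuckets;
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Writers append to the bucket (wide row) chosen by the key's hash.
    public void enqueue(String messageKey) {
        int bucket = Math.floorMod(messageKey.hashCode(), numBuckets);
        buckets.get(bucket).add(messageKey);
    }

    // Worker w of n drains only the buckets it owns (w, w+n, w+2n, ...).
    public List<String> drainFor(int worker, int workerCount) {
        List<String> mine = new ArrayList<>();
        for (int b = worker; b < numBuckets; b += workerCount) {
            mine.addAll(buckets.get(b));
            buckets.get(b).clear();
        }
        return mine;
    }
}
```

The trade-off is that adding workers dynamically means re-dividing bucket ownership, which is exactly the kind of coordination Kafka's consumer groups handle for you.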


Re: How to manually build and maintain secondary indexes

2012-07-26 Thread Brian O'Neill
Alon,

We came to the same conclusion regarding secondary indexes, and instead of
using them we implemented our own wide-row indexing capability and
open-sourced it.  

Its available here:
https://github.com/hmsonline/cassandra-indexing

We still have challenges rebuilding indexes, etc.  It doesn't address all
of your concerns, but I tried to capture the motivation behind our
implementation here:
http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html

-brian

-- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
www.healthmarketscience.com





On 7/26/12 2:05 PM, "Alon Pilberg"  wrote:

>Hello,
>My company is working on transition of our relational data model to
>Cassandra. Naturally, one of the basic demands is to have secondary
>indexes to answer queries quickly according to the application's
>needs.
>After looking at Cassandra's native support for secondary indexes, we
>decided not to use them due to the poor performance for
>high-cardinality values. Instead, we decide to implement secondary
>indexes manually.
>Some search led us to
>http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which
>details a schema for such indexes. However, the method employed there
>specifically adds an index entries column family, whereas it seems
>like only 2 CFs are needed - one for the items and one for the indexes
>(assuming one has access to both old and new values when updating an
>item). The article actually mentioned that this is indeed not the
>obvious solution, "for a number of reasons related to Cassandra's
>model of eventual consistency ... will not reliably work" and "it's a
>really good idea to make sure you understand why this CF is
>necessary". However, no additional information is provided on what
>might be a critical issue, as dealing with corrupt indexes in a large
>production environment is surely to be a nightmare.
>What are the community's thoughts on this matter? Given the writer's
>credentials in the Cassandra realm, specifically regarding indexes,
>I'm inclined not to ignore his remarks.
>References to a document / system that implement similar indexes would
>be greatly appreciated as well.
>
>- alon
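[Editorial note] The two-CF scheme Alon describes can be modeled in a few lines. The sketch below is an illustrative in-memory version — plain maps standing in for an "items" CF and a wide-row "index" CF — not Cassandra client code. The update ordering (read old value, remove the stale index entry, write the new one) is the part that matters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory model of a manual wide-row secondary index:
//   items: itemKey -> indexed value
//   index: indexed value -> set of itemKeys (one wide row per value)
class ManualIndex {
    private final Map<String, String> items = new HashMap<>();
    private final Map<String, Set<String>> index = new HashMap<>();

    public void put(String itemKey, String newValue) {
        // Read the old value so the stale index entry can be removed.
        String oldValue = items.get(itemKey);
        if (oldValue != null && index.containsKey(oldValue)) {
            index.get(oldValue).remove(itemKey);
        }
        // Write the new index entry, then the item itself.
        index.computeIfAbsent(newValue, k -> new HashSet<>()).add(itemKey);
        items.put(itemKey, newValue);
    }

    public Set<String> lookup(String value) {
        return index.getOrDefault(value, new HashSet<>());
    }
}
```

Note the failure mode: two concurrent put() calls for the same item can each read the same stale old value, so one stale index entry is never removed and the index row accumulates orphans. That appears to be the kind of inconsistency the article alludes to, and what an extra index-entries CF (checked and cleaned at read time) guards against.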




An experiment using Spring Data w/ Cassandra (initially via JPA/Kundera)

2012-07-18 Thread Brian O'Neill
This is just an FYI.

I experimented w/ Spring Data JPA w/ Cassandra leveraging Kundera.

It sort of worked:
https://github.com/boneill42/spring-data-jpa-cassandra
http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.html

I'm now working on a pure Spring Data adapter using Astyanax:
https://github.com/boneill42/spring-data-cassandra

I'll keep you posted.

(Thanks to all those that helped out w/ advice)

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Trigger and customized filter

2012-07-10 Thread Brian O'Neill
While Jonathan and crew work on the infrastructure to support triggers:
https://issues.apache.org/jira/browse/CASSANDRA-4285

We have a project going over here that provides a trigger-like capability:
https://github.com/hmsonline/cassandra-triggers/
https://github.com/hmsonline/cassandra-triggers/wiki/GettingStarted

We are working enhancements that would support synchronous triggers w/
javascript.
For now, they are processed asynchronously, and you implement a Java interface.

-brian

On Tue, Jul 10, 2012 at 9:24 AM, Felipe Schmidt  wrote:
> Does anyone know something about the following questions?
>
> 1. Does Cassandra support customized filters? A customized filter means the
> programmer can define his own filter to select data.
> 2. Does Cassandra support triggers? A trigger has the same meaning as in an
> RDBMS.
>
> Thanks in advance.
>
> Regards,
> Felipe Mathias Schmidt
> (Computer Science UFRGS, RS, Brazil)
>
>
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
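[Editorial note] For readers wondering what "processed asynchronously, and you implement a Java interface" looks like in practice, here is a minimal sketch of the pattern. The interface name and shape are illustrative, not the actual cassandra-triggers API: writes only append to a log, and a separate drain step fires the registered triggers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of asynchronous trigger dispatch: the write path records
// mutations, and a background drain invokes each registered trigger.
class TriggerLog {
    // Hypothetical trigger contract; the real project defines its own.
    public interface Trigger {
        void onMutation(String rowKey, String columnFamily);
    }

    private static class Mutation {
        final String rowKey, columnFamily;
        Mutation(String rowKey, String columnFamily) {
            this.rowKey = rowKey;
            this.columnFamily = columnFamily;
        }
    }

    private final List<Trigger> triggers = new ArrayList<>();
    private final List<Mutation> log = new ArrayList<>();

    public void register(Trigger t) { triggers.add(t); }

    // Write path: only record the mutation; triggers do not run inline.
    public void write(String rowKey, String columnFamily) {
        log.add(new Mutation(rowKey, columnFamily));
    }

    // A background worker calls this periodically; returns firings.
    public int drain() {
        int fired = 0;
        for (Mutation m : log) {
            for (Trigger t : triggers) {
                t.onMutation(m.rowKey, m.columnFamily);
                fired++;
            }
        }
        log.clear();
        return fired;
    }
}
```

The asynchronous split is the key design choice: a failing trigger cannot slow down or break the write path, at the cost of a delay before the trigger observes the mutation.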


Re: Cassandra and Tableau

2012-07-06 Thread Brian O'Neill
Robin,

We have the same issue right now.  We use Tableau for all of our
reporting needs, but we couldn't find any acceptable bridge between it
and Cassandra.

We ended up using cassandra-triggers to replicate the data to Oracle.
https://github.com/hmsonline/cassandra-triggers/

Let us know if you get things setup with a direct connection.
We'd be *very* interested in helping out if you find a way to do it.

-brian


On Fri, Jul 6, 2012 at 5:31 AM, Robin Verlangen  wrote:
> Hi there,
>
> Is there anyone out there who's using Tableau in combination with a
> Cassandra cluster? There seems to be no standard solution to connect, at
> least I couldn't find one. Does anyone know how to tackle this problem?
>
>
> With kind regards,
>
> Robin Verlangen
> Software engineer
>
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Any reason to limit one's self to a single high level java client?

2012-07-02 Thread Brian O'Neill
The only trouble you might run into is classpath conflicts, but as long as
they are using compatible versions of common dependencies you should be
okay.

-brian

-- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
www.healthmarketscience.com

From:  David Leimbach 
Reply-To:  
Date:  Mon, 2 Jul 2012 07:34:34 -0700
To:  user 
Subject:  Any reason to limit one's self to a single high level java client?

I recognize that behind the scenes there's connection pooling and all kinds
of nice asynchronous dispatch of requests to cassandra, but is there any
sort of reason to avoid using different Java clients in the same
application?

I'm noticing that some are better suited to certain kinds of activity than
others.

Dave
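
[Editorial note] On the classpath-conflict point above: when two clients pull in different versions of a shared dependency (Thrift, slf4j, etc.), a Maven exclusion can force everything onto a single version. The fragment below is illustrative only — the coordinates and versions are examples, not a recommendation for specific releases.

```
<!-- Exclude the transitive Thrift copy from one client so both
     resolve to the same libthrift on the classpath. -->
<dependency>
  <groupId>me.prettyprint</groupId>
  <artifactId>hector-core</artifactId>
  <version>1.0-5</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libthrift</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.netflix.astyanax</groupId>
  <artifactId>astyanax</artifactId>
  <version>1.0.3</version>
</dependency>
```

Running `mvn dependency:tree` before and after is the quickest way to confirm which versions actually end up on the classpath.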




Re: which high level Java client

2012-06-28 Thread Brian O'Neill
FWIW,

We keep most of our system level integrations behind REST using Virgil:
https://github.com/hmsonline/virgil

When a lower-level integration is necessary we use Hector, but
recently we've started using Astyanax and plan to port our Hector
dependencies over to Astyanax when given a chance.

I've also been looking to implement a Spring Data JPA adaptor like
what is available for MongoDB.
https://github.com/boneill42/spring-data-mongodb

I've forked the SpringSource Cassandra repo here if anyone wants to help out:
https://github.com/boneill42/spring-data-cassandra

-brian


On Thu, Jun 28, 2012 at 9:02 AM, Vivek Mishra  wrote:
>
> Would like to add one more https://github.com/impetus-opensource/Kundera . 
> Next release is planned with many distinguishing features.
>
> -Vivek
>
>
> On Thu, Jun 28, 2012 at 6:23 PM, Sasha Dolgy  wrote:
>>
>> Not following this thread too much, but there is also 
>> https://github.com/Netflix/astyanax/
>>
>> "Astyanax is currently in use at Netflix. Issues generally are fixed as 
>> quickly as possible and releases done frequently."
>>
>> -sd
>>
>> On Thu, Jun 28, 2012 at 2:39 PM, Poziombka, Wade L 
>>  wrote:
>>>
>>> I use Pelops and have been very happy.  In my opinion the interface is 
>>> cleaner than that with Hector.  I personally do like the serializer 
>>> business.
>>>
>>> -Original Message-
>>> From: Radim Kolar [mailto:h...@filez.com]
>>> Sent: Thursday, June 28, 2012 5:06 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: which high level Java client
>>>
>>> i do not have experience with other clients, only hector. But timeout 
>>> management in hector is really broken. If you expect your nodes to timeout 
>>> often (for example, if you are using WAN) better to try something else 
>>> first.
>>
>>
>>
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com
>
>



--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-06-27 Thread Brian O'Neill
RE: API method signatures changing

That triggers another thought...

What terminology will you use in the book to describe the data model?  CQL?

When we wrote the RefCard on DZone, we intentionally favored/used CQL
terminology.  On advisement from Jonathan and Kris Hahn, we wanted to start
the process of sunsetting the legacy terms (keyspace, column family, etc.) in
favor of the more familiar CQL terms (schema, table, etc.). I've gone on
record in favor of the switch, but it is probably something worth noting in
the book since that terminology does not yet align with all the client APIs
(e.g. Hector, Astyanax, etc.).

I'm not sure when the client APIs will catch up to the new terminology, but
we may want to inquire so as to future-proof the recipes as much as possible.

-brian




On Wed, Jun 27, 2012 at 4:18 PM, Edward Capriolo wrote:

> On Wed, Jun 27, 2012 at 3:08 PM, Courtney Robinson 
> wrote:
> > Sounds good.
> > One thing I'd like to see is more coverage on Cassandra Internals. Out of
> > the box Cassandra's great but having a little inside knowledge can be
> very
> > useful because it helps you design your applications to work with
> Cassandra;
> > rather than having to later make endless optimizations that could
> probably
> > have been avoided had you done your implementation slightly differently.
> >
> > Another thing that may be worth adding would be a recipe that showed an
> > approach to evaluating Cassandra for your organization/use case. I
> realize
> > that's going to vary on a case by case basis but one thing I've noticed
> is
> > that some people dive in without really thinking through whether
> Cassandra
> > is actually the right fit for what they're doing. It sort of becomes a
> > hammer for anything that looks like a nail.
> >
> > On Tue, Jun 26, 2012 at 10:25 PM, Edward Capriolo  >
> > wrote:
> >>
> >> Hello all,
> >>
> >> It has not been very long since the first book was published but
> >> several things have been added to Cassandra and a few things have
> >> changed. I am putting together a list of changed content, for example
> >> features like the old per Column family memtable flush settings versus
> >> the new system with the global variable.
> >>
> >> My editors have given me the green light to grow the second edition
> >> from ~200 pages currently up to 300 pages! This gives us the ability
> >> to add more items/sections to the text.
> >>
> >> Some things were missing from the first edition such as Hector
> >> support. Nate has offered to help me in this area. Please feel contact
> >> me with any ideas and suggestions of recipes you would like to see in
> >> the book. Also get in touch if you want to write a recipe. Several
> >> people added content to the first edition and it would be great to see
> >> that type of participation again.
> >>
> >> Thank you,
> >> Edward
> >
> >
> >
> >
> > --
> > Courtney Robinson
> > court...@crlog.info
> > http://crlog.info
> > 07535691628 (No private #s)
> >
>
> Thanks for the comments. Yes the "INTERNALS" chapter was a bit tricky.
> The challenge of writing about internals is they go stale fairly
> quickly. I was considering writing a partitioner for the internals
> chapter but then I thought about it more:
> 1) Its hard
> 2) The APIs can change. (They work the same way across versions but
> they may have a different signature etc)
> 3) 99.99% of people should be using the random partitioner :)
>
> But I agree the internals chapter can be made much stronger than it is.
>
> The recipe format is strict. It naturally conflicts with the typical use-case
> style, where you write a good amount of text about the problem domain,
> previous solutions, and company X's accomplishments. We cannot do that with
> the recipe style, but we can do our best to make the recipes as real-world
> as possible. I tried to do that throughout the text; you will not find many
> examples like 'writing foo records to bar column families'. However, the
> format does not allow the extensive text blocks mentioned above, so it is
> difficult to set the stage for a complex and detailed real-world problem.
> Still, I think for some examples we can take the next step and make the
> recipes more practical and more use-case-like.
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Indexing JSON in Cassandra

2012-06-21 Thread Brian O'Neill
I know we had this conversation over on the dev list a while back:
http://www.mail-archive.com/dev@cassandra.apache.org/msg03914.html

I just wanted to let people know that we added the capability to our
cassandra-indexing extension.
http://brianoneill.blogspot.com/2012/06/indexing-json-in-cassandra.html

Let us know if you have any trouble with it.
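For readers new to the idea, the flattening step that makes nested JSON indexable can be sketched like this. It is an illustrative reduction using plain maps in place of parsed JSON; the class and method names are made up, not the cassandra-indexing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: flatten nested JSON (modeled as nested maps) into
// dotted column paths, the shape a column-level indexer can then work with.
class JsonFlattener {
    static Map<String, Object> flatten(Map<String, Object> json) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", json, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flattenInto(path, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(path, e.getValue()); // each leaf becomes an indexable column
            }
        }
    }
}
```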

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-22 Thread Brian O'Neill
Praveen,

We are certainly interested. To get things moving we implemented an add-on for 
Cassandra to demonstrate the viability (using AOP):
https://github.com/hmsonline/cassandra-triggers

Right now the implementation executes triggers asynchronously, allowing you to 
implement a Java interface and plug in your own Java class that will get called 
for every insert.
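The plug-in shape described above might look something like the sketch below. The names here (Trigger, onInsert, CountingTrigger) are illustrative assumptions, not the actual cassandra-triggers API; the example just counts the mutations it observes, the way a real trigger might enqueue one unit of indexing or replication work per insert.

```java
// Hypothetical sketch of a trigger plug-in interface; names are made up,
// not the cassandra-triggers API.
interface Trigger {
    void onInsert(String keyspace, String columnFamily, String rowKey);
}

// Example implementation: count mutations seen, standing in for the work an
// indexing or replication trigger would enqueue per insert.
class CountingTrigger implements Trigger {
    private int seen = 0;

    @Override
    public void onInsert(String keyspace, String columnFamily, String rowKey) {
        seen++; // a real trigger would enqueue index/replication work here
    }

    int seen() { return seen; }
}
```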

Per the discussion on 1311, we intend to extend our proof of concept to be able 
to invoke scripts as well.  (minimally we'll enable javascript, but we'll 
probably allow for ruby and groovy as well)

-brian

On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:

> I found that Triggers are coming in Cassandra 1.2 
> (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of any 
> StoreProc like pattern.
> 
> I know this has been discussed so many times but never met with any 
> initiative. Even Groovy was staged out of the trunk.
> 
> Cassandra is great for logging, and as such will be infinitely more useful if 
> some logic can be pushed into the Cassandra cluster, nearer to the location of 
> the data, to generate materialized views useful for applications.
> 
> Server Side Scripts/Routines in Distributed Databases could soon prove to be 
> the differentiating factor.
> 
> Let me reiterate things with a use case.
> 
> In our application we store time series data in wide rows with TTL set on 
> each point to prevent data from growing beyond acceptable limits. Still the 
> data size can be a limiting factor to move all of it from the cluster node to 
> the querying node and then to the application via thrift for processing and 
> presentation.
> 
> Ideally we should process the data on the node where it resides and pass only 
> the materialized view of the data upstream. This should be trivial if Cassandra 
> implements some sort of server-side scripting and CQL semantics to call it.
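The kind of server-side reduction described above can be sketched as follows: collapse a wide row of (timestamp, value) points into per-hour averages, so only the small materialized view travels upstream instead of every raw point. This is an illustrative simulation with plain Java maps, not Cassandra code.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: reduce a wide row of (timestamp-millis -> value) points
// to per-hour averages, the kind of materialized view that is cheap to ship
// upstream compared to the full raw series.
class Downsampler {
    static final long HOUR_MS = 3_600_000L;

    static Map<Long, Double> hourlyAverage(Map<Long, Double> points) {
        Map<Long, double[]> acc = new TreeMap<>(); // hour bucket -> {sum, count}
        for (Map.Entry<Long, Double> p : points.entrySet()) {
            long bucket = p.getKey() / HOUR_MS;
            double[] sc = acc.computeIfAbsent(bucket, b -> new double[2]);
            sc[0] += p.getValue();
            sc[1] += 1;
        }
        Map<Long, Double> out = new TreeMap<>();
        acc.forEach((b, sc) -> out.put(b, sc[0] / sc[1]));
        return out;
    }
}
```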
> 
> Is anybody else interested in a similar feature? Is it being worked on? Are 
> there any alternative strategies to this problem?
> 
> Praveen
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: cassandra gui

2012-04-01 Thread Brian O'Neill
If you give Virgil a try, let me know how it goes.
The REST layer is pretty solid, but the gui is just a PoC which makes it
easy to see what's in the CFs during development/testing.
(It's only a couple hundred lines of ExtJS code built on the REST layer)

We had plans to add CQL to the gui for CRUD, but never got around to it.

-brian

On Fri, Mar 30, 2012 at 5:20 PM, Ben McCann  wrote:

> If you want a REST interface and a GUI then Virgil may be interesting.  I
> just came across it and haven't tried it myself yet.
>
> http://brianoneill.blogspot.com/2011/10/virgil-gui-and-rest-layer-for-cassandra.html
>
>
>
>
> On Fri, Mar 30, 2012 at 2:15 PM, John Liberty  wrote:
>
>> I made some updates to a cassandra-gui project I found, which seemed to
>> be stuck at version 0.7, and posted to github:
>> https://github.com/libjack/cassandra-gui
>>
> >> Besides updating it to work with version 1.0+, the main improvements I
> >> added were to honor validation types, including column metadata, when
> >> displaying or accepting data. This includes support for Composite types,
> >> both keys and columns.
>>
> >> I often create CFs with non-string keys, columns, and values, especially
> >> Composite types... and I need a tool to browse/verify and then add/edit
> >> test data; this works quite well for me.
>>
>> --
>> John Liberty
>> libjac...@gmail.com
>> (585) 466-4249
>>
>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Cassandra Indexing: The Good, the Bad and the Ugly

2012-03-21 Thread Brian O'Neill
Over the past 9 months, we've learned a lot about indexing in Cassandra and
we've had a few false starts.  I've tried to capture what we learned in the
hopes that we can save a few others from false starts.

http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html

We've also released some code in the hopes of making it easier to implement
wide-row indexes.
https://github.com/boneill42/cassandra-indexing

It's more AOP code (just like our cassandra-triggers) that implements
wide-row indexing for you.
Just configure which column families you want indexed, and on which columns
you want the index and let the AOP take care of the rest.
It is still very much in its infancy, but as always -- comments are
welcome.
(especially if we have something wrong ;)
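For readers unfamiliar with the pattern, a wide-row index can be sketched with plain collections as below: each indexed (column, value) pair becomes one index row whose columns are the matching data-row keys, so a lookup is a single row read. This is only a simulation of the idea, with made-up names, not the cassandra-indexing implementation.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch of a wide-row index: index row key "col=value" maps to
// the set of data-row keys holding that value, so a lookup is one row read.
class WideRowIndex {
    private final Map<String, Set<String>> index = new TreeMap<>();

    // called alongside every data write for the columns being indexed
    void onWrite(String rowKey, String column, String value) {
        index.computeIfAbsent(column + "=" + value, k -> new LinkedHashSet<>())
             .add(rowKey);
    }

    Set<String> lookup(String column, String value) {
        return index.getOrDefault(column + "=" + value, Set.of());
    }
}
```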

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Cassandra Triggers Capability published out to GitHub

2012-03-02 Thread Brian O'Neill
FYI --
http://brianoneill.blogspot.com/2012/03/cassandra-triggers-for-indexing-and.html

https://github.com/hmsonline/cassandra-triggers

Feedback welcome. Contribution and involvement is even better. ;)

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Virgil Moved (and Cassandra-Triggers coming soon)

2012-02-07 Thread Brian O'Neill
FYI -- we moved Virgil to Github to make it easier for people to contribute.
https://github.com/hmsonline/virgil

Also, we created an organization profile (hmsonline) to house all of our
storm/cassandra related work.
https://github.com/hmsonline

Under that profile, we'll be releasing cassandra-triggers.
It is an AOP-based trigger solution that provides a simple
trigger/event-log that can be used for data replication and indexing in
reaction to column family mutations.
https://github.com/hmsonline/cassandra-triggers

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Remote Hadoop Job Deployment

2012-01-24 Thread Brian O'Neill
FYI... we finally got around to releasing a version of Virgil that includes
the ability to deploy jobs to remote Hadoop clusters running against
Cassandra Column Families.

http://brianoneill.blogspot.com/2012/01/virgil-remote-hadoop-job-deployment-via.html

This has enabled an army of people to write and deploy Hadoop jobs against
our Cassandra cluster.
(Literally, we'll probably have 100 M/R jobs by the end of the month)

Yes, we still plan to implement a javascript engine as well, but first we
intend to tackle Triggers for indexing, data replication and "materialized
views".

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Cassandra to Oracle?

2012-01-22 Thread Brian O'Neill

Good point Milind. (RE: Client-side AOP)

I was thinking server-side to stay with the trigger concept, but we could just 
as easily intercept on the client-side. 
We'd just need to make sure that all clients got the AOP code injected. 
(including all of our map/reduce jobs)

If we get the point-cut right (using the Cassandra.Iface), we could probably 
make it portable.  People could drop it in client-side or server-side.

-brian



On Jan 22, 2012, at 9:45 AM, Milind Parikh wrote:

> My bad ~s/X:X-Value/Y:Y-Value/ after rereading the SELECT.
> 
> /***
> sent from my android...please pardon occasional typos as I respond @ the 
> speed of thought
> /
> 
> On Jan 22, 2012 6:40 AM, "Milind Parikh"  wrote:
> 
> 
> The composite-key approach with counters would work very well in this case. 
> It will also obviate the concern of not knowing the exact column names 
> a priori... although for efficiency, you might want to look at maintaining a 
> secondary cache-like CF for lookups.
> 
> Depending on your data patterns (so as not to hit the 2B-column limit) and 
> actual queries, you could store each Z as one row, composite-key on Z-value 
> + X:X-value, and then use a counter column. Other optimizations may be 
> possible.
> 
> If you're using AOP, as I read it, there's really no need to intercept your 
> own writes at the C* level; instead do it (use AOP) at the client level.
> 
> Your migration also needs to be attended to, and might need an MR job first 
> plus AOP-intercepted writes.
> 
> Hth
> Milind
> 
> 
> /***
> sent from my android...please pardon occasional typos as I respond @ the 
> speed of thought
> /
> 
> 
>> 
>> > On Jan 22, 2012 4:42 AM, "Brian O'Neill"  wrote:
>> >
>> > Thanks for all the ideas...
>> >
>> > Since we can't predict all the values, we actually cut to Oracle...
>> >
>> > On Jan 21, 2012, at 8:35 AM, Eric Czech wrote:
>> >
>> > > Hi Brian,
>> > >
>> > > We're trying to do the exact same...
>> 
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Cassandra to Oracle?

2012-01-22 Thread Brian O'Neill

Eric,

Thinking even a little bit more about this...

We could go with the distributed-counter approach, with additional column 
families to support the ad hoc queries, but use triggers to implement it. That 
would allow us to keep the client-side code thin but achieve the same result... 
without necessarily replicating to Oracle for the attributes we can predict.

Maybe we'll take a look at that this week as well.

thanks again,
brian


On Jan 21, 2012, at 8:35 AM, Eric Czech wrote:

> Hi Brian,
> 
> We're trying to do the exact same thing and I find myself asking very similar 
> questions.
> 
> Our solution though has been to find what kind of queries we need to satisfy 
> on a preemptive basis and leverage cassandra's built-in indexing features to 
> build those result sets beforehand.  The whole point here then is that our 
> gain in cost efficiency comes from the fact that disk space is really cheap 
> and serving up result sets from disk is fast provided that those result sets 
> are pre-calculated and reasonable in size (even if we don't know all the 
> values upfront).  For example, when you're writing to your CF "X", you could 
> also make writes to column family "A" like this:
> 
> - write A[Z][Y] = 1
> where A = CF, Z = key, Y = column
> 
> Answering the question "select count(distinct Y) from X group by Z" then is 
> as simple as getting a list of rows for CF A and counting the distinct values 
> of Y and grouping them by Z on the client side.
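The pre-computed result set described above can be simulated with plain collections: CF "A" becomes a map from row key Z to the set of column names Y (the written value "1" is just a marker), and the client answers count(distinct Y) group by Z by sizing each row. This is a sketch of the idea only; the class and method names are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: CF "A" modeled as row key Z -> set of column names Y.
// The stored value "1" in the email is just a marker, so a set suffices here.
class PrecomputedIndex {
    private final Map<String, Set<String>> cfA = new HashMap<>();

    // mirrors the write A[Z][Y] = 1 done alongside each write to CF "X"
    void write(String z, String y) {
        cfA.computeIfAbsent(z, k -> new HashSet<>()).add(y);
    }

    // client-side "select count(distinct Y) from X group by Z"
    Map<String, Integer> countDistinctYGroupByZ() {
        Map<String, Integer> counts = new HashMap<>();
        cfA.forEach((z, ys) -> counts.put(z, ys.size()));
        return counts;
    }
}
```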
> 
> Alternatively, there are much better ways to do this with composite 
> keys/columns and distributed counters but it's hard for me to tell what makes 
> the most sense without knowing more about your data / product requirements.
> 
> Either way, I feel your pain in getting things like this to work with 
> Cassandra when the domain of values for a particular key or column is unknown 
> and secondary indexing doesn't apply, but I'm positive there's a much cheaper 
> way to make it work than paying for Oracle if you have at least a decent idea 
> about what kinds of queries you need to satisfy (which it sounds like you 
> do).  To Maxim's "death by index" point, you could certainly go overboard 
> with this concept and cross a pricing threshold with some other database 
> technology, but I can't imagine you're even close to being in that boat given 
> how concise your query needs seem to be.
> 
> If you're interested, I'd be happy to share how we do these things to save 
> lots of money over commercial databases and try to relate that to your use 
> case, but if not, then I hope at least some of this is useful for you.
> 
> Good luck either way!
> 
> On Fri, Jan 20, 2012 at 9:27 PM, Maxim Potekhin  wrote:
> I certainly agree with "difficult to predict". There is a Danish
> proverb, which goes "it's difficult to make predictions, especially
> about the future".
> 
> My point was that it's equally difficult with noSQL and RDBMS.
> The latter requires indexing to operate well, and that's a potential
> performance problem.
> 
> 
> On 1/20/2012 7:55 PM, Mohit Anchlia wrote:
> I think the problem arises when you have data in a column that you need
> to run ad hoc queries on and that is not denormalized. In most cases it's
> difficult to predict the type of query that would be required.
> 
> Another way of solving this could be to index the fields in search engine.
> 
> On Fri, Jan 20, 2012 at 7:37 PM, Maxim Potekhin  wrote:
> What makes you think that RDBMS will give you acceptable performance?
> 
> I guess you will try to index it to death (because otherwise the "ad hoc"
> queries won't work well if at all), and at this point you may be hit with a
> performance penalty.
> 
> It may be a good idea to interview users and build denormalized views in
> Cassandra, maybe on a separate "look-up" cluster. A few percent of users
> will be unhappy, but you'll find it hard to do better. I'm talking from my
> experience with an industrial strength RDBMS which doesn't scale very well
> for what you call "ad-hoc" queries.
> 
> Regards,
> Maxim
> 
> 
> 
> 
> 
> On 1/20/2012 9:28 AM, Brian O'Neill wrote:
> 
> I can't remember if I asked this question before, but
> 
> We're using Cassandra as our transactional system, and building up quite a
> library of map/reduce jobs that perform data quality analysis, statistics,
> etc.
> (>  100 jobs now)
> 
> But... we are still struggling to provide an "ad-hoc" query mechanism for
> our users.
> 
> To fill that gap, I believe we still need to materialize our data in an
> RDBMS.
> 
> Anyone have any ideas?  Better ways to support ad-hoc queries?
> 
> Effectively, our users want to be able to select count(distinct Y) from X
> group by Z.
> Where Y and Z are arbitrary columns of rows in X.
> 
> We believe we can create column families with different key structures
> (using Y and Z as row keys), but some column names we don't know / can't
> predict ahead of time.
> 
> Are people doing bulk exports?
> Anyone trying to keep an RDBMS in synch in real-time?
> 
> -brian
> 
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
> 
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Triggers?

2012-01-20 Thread Brian O'Neill
Anyone know if there is any activity to deliver triggers?

I saw this quote:

http://www.readwriteweb.com/cloud/2011/10/cassandra-reaches-10-whats-nex.php

"Ellis says that he's just starting to think about the post-1.0 world for
Cassandra. Two features do come to mind, though, that missed the boat for
1.0 and that were on a lot of wishlists. The first is triggers.

Database triggers let you define rules in the database, such as updating
table X when table Y is updated. Ellis says that triggers will be necessary
for Cassandra as it grows in popularity. "As more tools use it, that's
something more users are going to be asking for."

But grepping the trunk code, I don't see any work on triggers.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

