Re: Thanks for all the fish.

2016-08-19 Thread Brian O'Neill
+1, props to the giant on whose shoulders we stand.
-- 
Brian O'Neill
Principal Architect @ Monetate
m: 215.588.6024
bone...@monetate.com <mailto:bone...@monetate.com>
> On Aug 19, 2016, at 4:29 PM, Brandon Williams  wrote:
> 
> If there is one thing I am damn sure of, it's that I wouldn't be here
> without Jonathan's leadership and friendship.  Thank you for all you've
> done, old buddy.
> 
> Kind Regards,
> Brandon
> 
> On Fri, Aug 19, 2016 at 2:20 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> Just wanted to say thank you publicly to Jonathan Ellis for his tireless
>> work making this community and software what it is. He's always been level
>> headed and I certainly wouldn't be where I am without his leadership.
>> 
>> So, Jonathan, thanks for all the fish.
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone
>> 



Re: Wrap around CQL queries for token ranges?

2015-05-11 Thread Brian O'Neill
Looks like the java-driver supplies the hack I need.  (TokenRange.unwrap)

I'll leave it to you guys to decide if it is more elegant to support
wrapping natively in CQL.
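
For anyone following along, here is a minimal sketch of what unwrapping a
range amounts to. It is not the driver's implementation: LongTokenRange is a
hypothetical stand-in class, and it assumes signed 64-bit tokens (as with
Murmur3) and that no row carries the token Long.MIN_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token range (start, end] over signed 64-bit tokens. */
final class LongTokenRange {
    final long start; // exclusive lower bound
    final long end;   // inclusive upper bound

    LongTokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** The range crosses the end of the ring when its start lies after its end. */
    boolean wraps() {
        return start > end;
    }

    /** Splits a wrapping range into two ordinary ranges; other ranges are returned unchanged. */
    List<LongTokenRange> unwrap() {
        List<LongTokenRange> parts = new ArrayList<>();
        if (wraps()) {
            parts.add(new LongTokenRange(start, Long.MAX_VALUE)); // (start, MAX]
            parts.add(new LongTokenRange(Long.MIN_VALUE, end));   // (MIN, end]
        } else {
            parts.add(this);
        }
        return parts;
    }

    /** Renders the range as a CQL restriction on token(partitionKey). */
    String toWhereClause(String partitionKey) {
        String token = "token(" + partitionKey + ")";
        return token + " > " + start + " AND " + token + " <= " + end;
    }
}
```

Applied to the last range from my original mail, the second piece produces the
same restriction as the query that returned the three rows.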

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or the
person responsible to deliver it to the intended recipient, please contact
the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination,
copying or other use of, or taking any action in reliance upon, this
information by persons or entities other than the intended recipient is
strictly prohibited.
 


From:  Brian O'Neill 
Date:  Monday, May 11, 2015 at 12:32 PM
To:  "dev@cassandra.apache.org" 
Subject:  Wrap around CQL queries for token ranges?


I was doing some testing around data locality today (and adding it to our
distributed processing layer).
I retrieved all of the TokenRanges back using:
tokenRanges = metadata.getTokenRanges(keyspace, localhost)


And when I spun through the token ranges returned, I ended up missing
records.  
The root cause was the "edge case" where the ring wraps around.

It generated the following CQL query: (using the last token range)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE
token(id)>8743874685407455894 AND token(id)<=-8851282698028303387;

(0 rows)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE
token(id)<=-8851282698028303387 AND token(id)>-9223372036854775808;

 token(id)            | id | name
----------------------+----+--------
 -9157060164899361011 | 23 | name23
 -9108684050423740263 | 53 | name53
 -9084883821289052775 | 91 | name91
(3 rows)

NOTE: If I use Long.MAX_VALUE instead, I get the records.

I can hack this at the app layer, to issue separate queries for the wrap
around case, but I wonder if CQL should support wrap around queries???

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 




Wrap around CQL queries for token ranges?

2015-05-11 Thread Brian O'Neill

I was doing some testing around data locality today (and adding it to our
distributed processing layer).
I retrieved all of the TokenRanges back using:
tokenRanges = metadata.getTokenRanges(keyspace, localhost)


And when I spun through the token ranges returned, I ended up missing
records.  
The root cause was the "edge case" where the ring wraps around.

It generated the following CQL query: (using the last token range)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE
token(id)>8743874685407455894 AND token(id)<=-8851282698028303387;

(0 rows)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE
token(id)<=-8851282698028303387 AND token(id)>-9223372036854775808;

 token(id)            | id | name
----------------------+----+--------
 -9157060164899361011 | 23 | name23
 -9108684050423740263 | 53 | name53
 -9084883821289052775 | 91 | name91
(3 rows)

NOTE: If I use Long.MAX_VALUE instead, I get the records.

I can hack this at the app layer, to issue separate queries for the wrap
around case, but I wonder if CQL should support wrap around queries???

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 




Re: Conditional Update Code?

2015-03-04 Thread Brian O'Neill

Interesting, I just saw the function definition stuff in AggregationTest.

I’ll dig in there.  It seems like we could re-use those functions for
conditional updates?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

 




On 3/4/15, 12:50 PM, "Brian O'Neill"  wrote:

>
>Finally getting to this...
>
>For the UDF, javascript?
>
>-brian
>
>---
>Brian O'Neill 
>Chief Technology Officer
>Health Market Science, a LexisNexis Company
>215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
>
> 
>
>
>
>
>On 2/6/15, 9:50 AM, "Benedict Elliott Smith" 
>wrote:
>
>>It's quite possible support could be added to evaluate a UDF as part of
>>the condition check. The code you're looking for is in the implementors of
>>CASRequest.appliesTo(), in CQL3CasRequest and
>>CassandraServer.ThriftCASRequest.
>>
>>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>>offer the basic functionality, except that it is expected to require ALLOW
>>FILTERING, which is unlikely to be permitted for a CAS operation, since
>>the implication is that the work is too expensive for normal use. Such a
>>constraint is probably not necessary if a clustering prefix is provided,
>>though (i.e. a full CQL row key).
>>
>>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill 
>>wrote:
>>
>>>
>>> All,
>>>
>>> I'm just looking for a little direction...
>>>
>>> Anyone know where I can find the code that checks the condition in a
>>> conditional update?
>>> We'd love to have more expressive conditions, beyond just equality.
>>> (e.g. column contains? value)
>>>
>>> I wanted to see how hard this would be to contribute.
>>> Is such a JIRA already open?
>>>
>>> -brian
>>>
>>> ---
>>> Brian O'Neill
>>> Chief Technology Officer
>>> Health Market Science, a LexisNexis Company
>>> 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
>>>
>>>
>>>
>>>
>>>




Re: Conditional Update Code?

2015-03-04 Thread Brian O'Neill

Finally getting to this...

For the UDF, javascript?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

 




On 2/6/15, 9:50 AM, "Benedict Elliott Smith" 
wrote:

>It's quite possible support could be added to evaluate a UDF as part of
>the condition check. The code you're looking for is in the implementors of
>CASRequest.appliesTo(), in CQL3CasRequest and
>CassandraServer.ThriftCASRequest.
>
>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>offer the basic functionality, except that it is expected to require ALLOW
>FILTERING, which is unlikely to be permitted for a CAS operation, since
>the implication is that the work is too expensive for normal use. Such a
>constraint is probably not necessary if a clustering prefix is provided,
>though (i.e. a full CQL row key).
>
>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill 
>wrote:
>
>>
>> All,
>>
>> I'm just looking for a little direction...
>>
>> Anyone know where I can find the code that checks the condition in a
>> conditional update?
>> We'd love to have more expressive conditions, beyond just equality.
>> (e.g. column contains? value)
>>
>> I wanted to see how hard this would be to contribute.
>> Is such a JIRA already open?
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Chief Technology Officer
>> Health Market Science, a LexisNexis Company
>> 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
>>
>>
>>
>>
>>
>>




Re: Conditional Update Code?

2015-02-06 Thread Brian O'Neill

Perfect. Thanks.

Let me see what I can cook up as a PoC.

The specific use case we are looking to address is for real-time
aggregations, done in memory, then periodically flushed to C*.  (e.g.
every 30 seconds)
(similar to what Druid does, but native on top of C*)

In this scenario, we aggregate app-side for a specific time
slice/partition of data.  We want to update the aggregate value only if
that time slice/partition has not already been incorporated into the
value.  If we have a native way to check to see if the partition was
already incorporated as part of the conditional update, it will simplify
the app layer.
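
To make the intent concrete, here is a minimal in-memory sketch of the check
we would like the conditional update to perform for us. AggregateRow and
applyIfAbsent are hypothetical names for illustration only; nothing here
talks to C*.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one aggregate row: a running count plus the slices already folded in. */
final class AggregateRow {
    private long count;
    private final Set<String> appliedSlices = new HashSet<>();

    /**
     * Mimics a conditional update: adds delta only if sliceId has not been
     * incorporated yet. Returns true when the update was applied.
     */
    synchronized boolean applyIfAbsent(String sliceId, long delta) {
        if (!appliedSlices.add(sliceId)) {
            return false; // slice already counted, so a retried flush must not double count
        }
        count += delta;
        return true;
    }

    synchronized long count() {
        return count;
    }
}
```

A periodic flush would call applyIfAbsent once per time slice; replaying the
same slice after a failure then leaves the aggregate unchanged.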

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

 




On 2/6/15, 9:50 AM, "Benedict Elliott Smith" 
wrote:

>It's quite possible support could be added to evaluate a UDF as part of
>the condition check. The code you're looking for is in the implementors of
>CASRequest.appliesTo(), in CQL3CasRequest and
>CassandraServer.ThriftCASRequest.
>
>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>offer the basic functionality, except that it is expected to require ALLOW
>FILTERING, which is unlikely to be permitted for a CAS operation, since
>the implication is that the work is too expensive for normal use. Such a
>constraint is probably not necessary if a clustering prefix is provided,
>though (i.e. a full CQL row key).
>
>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill 
>wrote:
>
>>
>> All,
>>
>> I'm just looking for a little direction...
>>
>> Anyone know where I can find the code that checks the condition in a
>> conditional update?
>> We'd love to have more expressive conditions, beyond just equality.
>> (e.g. column contains? value)
>>
>> I wanted to see how hard this would be to contribute.
>> Is such a JIRA already open?
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Chief Technology Officer
>> Health Market Science, a LexisNexis Company
>> 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>
>>
>>
>>
>>
>>
>>




Conditional Update Code?

2015-02-06 Thread Brian O'Neill

All,

I'm just looking for a little direction...

Anyone know where I can find the code that checks the condition in a
conditional update?
We'd love to have more expressive conditions, beyond just equality.  (e.g.
column contains? value)
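
To illustrate what I mean by "more expressive", here is a rough sketch of the
kind of check I have in mind. ColumnCheck is a hypothetical interface made up
for this mail; it is not Cassandra's condition code, which is what I am trying
to locate.

```java
import java.util.Collection;
import java.util.Objects;

/** Hypothetical condition evaluated against a column's current value. */
interface ColumnCheck {
    boolean appliesTo(Object currentValue);

    /** Equality, the style of condition available today. */
    static ColumnCheck equalTo(Object expected) {
        return current -> Objects.equals(current, expected);
    }

    /** The wished-for variant: true when a collection column holds the element. */
    static ColumnCheck contains(Object element) {
        return current -> current instanceof Collection
                && ((Collection<?>) current).contains(element);
    }
}
```

In this sketch a null or non-collection column simply fails the contains
check rather than raising an error.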

I wanted to see how hard this would be to contribute.
Is such a JIRA already open?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>


 




Re: Refactoring cassandra service package

2014-06-03 Thread Brian O'Neill

Interesting proposition.  We've embedded Cassandra a few times, so I'd be
interested in an approach that makes that easier.

Is there a way to do it incrementally?  Introduce the injection framework,
and convert a few classes (those required for startup), then slowly
convert the remainder?
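
Something like the following is what I picture for an incremental step. This
is only a sketch: GossipService and StartupCheck are hypothetical names, not
real Cassandra classes, and no injection framework is involved yet.

```java
/** Hypothetical dependency that today is reached through a static singleton. */
class GossipService {
    static final GossipService instance = new GossipService();

    String status() {
        return "live";
    }
}

/** Hypothetical converted class: its dependency arrives through the constructor. */
final class StartupCheck {
    private final GossipService gossip;

    /** New style: the caller (or an injection framework) supplies the dependency. */
    StartupCheck(GossipService gossip) {
        this.gossip = gossip;
    }

    /** Bridge for callers not converted yet: falls back to the legacy singleton. */
    StartupCheck() {
        this(GossipService.instance);
    }

    String report() {
        return "gossip=" + gossip.status();
    }
}
```

The no-argument constructor keeps existing call sites compiling while
classes are converted one at a time, and it can be deleted once the last
caller passes its dependencies explicitly.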

peanut gallery,
-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 6/3/14, 1:59 PM, "Gary Dusbabek"  wrote:

>On Tue, Jun 3, 2014 at 3:52 AM, Simon Chemouil 
>wrote:
>
>> Hi,
>>
>> I'm new to Cassandra and felt like exploring and hacking on the code. I
>> was surprised to see the usage of so many mutable/global state statics
>> all over the service package (basically global variables/singletons).
>>
>> While I understand it can be practical to work with singletons, and that
>> in any case I'm not sure multi-tenant Cassandra (as in two different
>> Cassandra instances within the same process) would make sense at all (or
>> even work considering there is some native access going on with JNA), I
>> find static state can easily lead to tangled 'spaghetti' code (accessing
>> the singletons from anywhere, even where one shouldn't), and in general
>> it ties the code to the VM instance, rather than to the class.
>>
>> I tried to find if it was an actual design choice, but from my
>> understanding this is more something inherited from the early Cassandra
>> times at Facebook. I just found this thread[1] pointing to issue
>> CASSANDRA-741 (slightly more limited scope) that was marked as WONTFIX
>> because no one took it (but still marked as open for contribution). The
>> current code conventions also don't mention the usage of singletons
>> except by stating:  "Do not extract interfaces (or abstract classes)
>> unless you actually need multiple implementations of it" (switching to a
>> "service"-style design doesn't require passing interfaces but it's
>> highly encouraged to help testability).
>>
>> So, I'd like to try to make this refactoring happen and remove all (or
>> most) mutable static state. It would be an easy way in for me in
>> Cassandra's internals (maybe to contribute further). I think it would
>> help testing (ability to unit test components without going to the
>> storage for instance) and in general modernize the code. It would also
>> make hacking on Cassandra easier because people could pick different
>> pieces without pulling the whole thing.
>>
>> It would definitely break backwards compatibility with current Java code
>> that directly embeds Cassandra / uses it as a library, but I would keep
>> the same abstraction so the refactoring would be easy. In any case,
>> backwards compatibility can be broken by many more changes than just
>> refactoring, and once this is done it will be easier to deal with
>> backwards compatibility.
>>
>> Obviously all ".instance" fields would be gone, and I'd try to fix
>> potential cyclic class dependencies and generally make sure class
>> dependencies form a directed acyclic graph with CassandraDaemon as its
>> root. The basic idea is to have each 'service' component require all its
>> service dependencies in their constructor (and keeping them as a final
>> field), rather than getting them via the global namespace (singleton
>> instances).
>>
>> If I had it my way, I'd probably use a dependency injection framework,
>> namely Dagger which is as far as I know the lightest Java DI framework
>> actively developed (jointly developed by Square and Google's Java team
>> responsible for Guice & Guava), which has a neat compile-time annotation
>> processor that detects missing dependencies early on. It works with both
>> Android and J2SE and is very fast, simple and light (65kB vs 710kB for
>> Guice).
>>
>> So, the

NPE in conditional updates w/ collections in 2.0.7

2014-05-16 Thread Brian O'Neill

OK, we've got some hyper data modeling going on, taking advantage of all
the latest toys in CQL 2.  And we ran into some trouble using maps within
conditional updates.  Specifically, when testing to see if a key exists in a
map (with =null?), we encounter an NPE server-side.  We believe this worked
in 2.0.4.

With this schema:
CREATE TABLE progress (
key text,
count int,
partitions map,
primary key (key)
);

When executing the following:
cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;

 [applied]
---
 False

cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA';
cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;
TSocket read 0 bytes

We see the following NPE server-side:
ERROR [Native-Transport-Requests:13353] 2014-05-15 15:10:00,154
QueryMessage.java (line 131) Unexpected error during query
java.lang.NullPointerException
    at org.apache.cassandra.cql3.ColumnCondition$WithVariables.collectionAppliesTo(ColumnCondition.java:168)
    at org.apache.cassandra.cql3.ColumnCondition$WithVariables.appliesTo(ColumnCondition.java:142)
    at org.apache.cassandra.cql3.statements.CQL3CasConditions$ColumnsConditions.appliesTo(CQL3CasConditions.java:197)
    at org.apache.cassandra.cql3.statements.CQL3CasConditions.appliesTo(CQL3CasConditions.java:108)

Is there a better way to test for existence of a key?
Or is this a bug?  (Regardless, we may want to protect against the NPE)
Or am I missing something entirely?
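
By "protect against the NPE" I mean something along these lines. This is a
sketch of the idea only, with a hypothetical helper name; it is not the code
in ColumnCondition, and I have not confirmed that a missing map is what
actually triggers the exception.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical null-safe check for a condition such as IF m['a'] = null. */
final class MapElementCondition {
    private MapElementCondition() {
    }

    /**
     * Returns true when map[key] equals expected. A missing map (null) and a
     * missing key are both treated as a null element, so the check returns a
     * result instead of throwing.
     */
    static <K, V> boolean elementEquals(Map<K, V> map, K key, V expected) {
        V actual = (map == null) ? null : map.get(key);
        return Objects.equals(actual, expected);
    }
}
```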

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com


 




Re: NPE in conditional updates w/ collections in 2.0.7

2014-05-16 Thread Brian O'Neill
Perfect.  Thanks Tyler.

Great to hear you guys are already on top of it.  I’ll watch for the
resolution.

-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 5/16/14, 12:25 PM, "Tyler Hobbs"  wrote:

>Hi Brian,
>
>Thanks for the report.  This looks like
>https://issues.apache.org/jira/browse/CASSANDRA-7155, which should be
>fixed
>shortly.
>
>
>On Thu, May 15, 2014 at 3:23 PM, Brian O'Neill
>wrote:
>
>>
>> OK, we've got some hyper data modeling going on, taking advantage of all
>> the latest toys in CQL 2.  And we ran into some trouble using maps within
>> conditional updates.  Specifically, when testing to see if a key exists in
>> a map (with =null?), we encounter an NPE server-side.  We believe this
>> worked in 2.0.4.
>>
>> With this schema:
>> CREATE TABLE progress (
>> key text,
>> count int,
>> partitions map,
>> primary key (key)
>> );
>>
>> When executing the following:
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF
>> partitions['a']=null;
>>
>>  [applied]
>> ---
>>  False
>>
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA';
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF
>> partitions['a']=null;
>> TSocket read 0 bytes
>>
>> We see the following NPE server-side:
>> ERROR [Native-Transport-Requests:13353] 2014-05-15 15:10:00,154
>> QueryMessage.java (line 131) Unexpected error during query
>> java.lang.NullPointerException
>>     at org.apache.cassandra.cql3.ColumnCondition$WithVariables.collectionAppliesTo(ColumnCondition.java:168)
>>     at org.apache.cassandra.cql3.ColumnCondition$WithVariables.appliesTo(ColumnCondition.java:142)
>>     at org.apache.cassandra.cql3.statements.CQL3CasConditions$ColumnsConditions.appliesTo(CQL3CasConditions.java:197)
>>     at org.apache.cassandra.cql3.statements.CQL3CasConditions.appliesTo(CQL3CasConditions.java:108)
>>
>> Is there a better way to test for existence of a key?
>> Or is this a bug?  (Regardless, we may want to protect against the NPE)
>> Or am I missing something entirely?
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Chief Technology Officer
>>
>>
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
>> healthmarketscience.com
>>
>>
>>
>>
>>
>>
>
>
>-- 
>Tyler Hobbs
>DataStax <http://datastax.com/>




Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Brian O'Neill
I'm +1.

We've had one foot out the door for a while now.

We are throwing resources at CQL. (e.g. storm-cassandra-cql)  And we are
slowing support for the thrift-based implementation (e.g. storm-cassandra).

Alas poor Thrift, I knew him (well).

-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 3/11/14, 12:27 PM, "sankalp kohli"  wrote:

>RIP Thrift :)
>+1 with "We will retain it for backwards compatibility". Hopefully most
>people will move out of thrift by 2.1
>
>
>On Tue, Mar 11, 2014 at 10:18 AM, Brandon Williams 
>wrote:
>
>> As someone who has written a thrift wrapper, +1
>>
>>
>> On Tue, Mar 11, 2014 at 12:00 PM, Jonathan Ellis 
>> wrote:
>>
>> > CQL3 is almost two years old now and has proved to be the better API
>> > that Cassandra needed.  CQL drivers have caught up with and passed the
>> > Thrift ones in terms of features, performance, and usability.  CQL is
>> > easier to learn and more productive than Thrift.
>> >
>> > With static columns and LWT batch support [1] landing in 2.0.6, and
>> > UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
>> > done in CQL.  Contrawise, CQL makes many things easy that are
>> > difficult to impossible in Thrift.  New development is overwhelmingly
>> > done using CQL.
>> >
>> > To date we have had an unofficial and poorly defined policy of "add
>> > support for new features to Thrift when that is 'easy.'"  However,
>> > even relatively simple Thrift changes can create subtle complications
>> > for the rest of the server; for instance, allowing Thrift range
>> > tombstones would make filter conversion for CASSANDRA-6506 more
>> > difficult.
>> >
>> > Thus, I think it's time to officially close the book on Thrift.  We
>> > will retain it for backwards compatibility, but we will commit to
>> > adding no new features or changes to the Thrift API after 2.1.0.  This
>> > will help send an unambiguous message to users and eliminate any
>> > remaining confusion from supporting two APIs.  If any new use cases
>> > come to light that can be done with Thrift but not CQL, we will commit
>> > to supporting those in CQL.
>> >
>> > (To a large degree, this merely formalizes what is already de facto
>> > reality.  Most thrift clients have not even added support for
>> > atomic_batch_mutate and cas from 2.0, and popular clients like
>> > Astyanax are migrating to the native protocol.)
>> >
>> > Reasonable?
>> >
>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
>> > [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder, http://www.datastax.com
>> > @spyced
>> >
>>




Re: "[applied]" column in ModificationStatement?

2014-02-06 Thread Brian O'Neill
Thanks Jonathan.  
It feels a little weird, but that will work.

Not a big deal, but maybe we could include a wasApplied() method on the
ResultSet in the future that would insulate clients from the ResultSet
schema/column name.
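
For illustration only, here is a minimal sketch of what such a helper might
look like.  The class and method names are hypothetical, and a row is modelled
as a plain Map rather than the driver's Row type, so nothing here is the
driver's actual API.  The only thing taken from the thread is the "[applied]"
column name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AppliedCheck {
    // Column name the server uses for conditional-update results
    // (ModificationStatement.CAS_RESULT_COLUMN, per this thread).
    static final String APPLIED_COLUMN = "[applied]";

    // Hypothetical helper: callers ask "was it applied?" without knowing
    // the column name. A missing or non-boolean value counts as not applied.
    static boolean wasApplied(Map<String, Object> firstRow) {
        return Boolean.TRUE.equals(firstRow.get(APPLIED_COLUMN));
    }

    public static void main(String[] args) {
        // Assumed shape of a rejected conditional update: the flag plus
        // the current values of the columns named in the condition.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(APPLIED_COLUMN, Boolean.FALSE);
        row.put("value", 42);
        System.out.println("applied = " + wasApplied(row));
    }
}
```

The point of the design is simply that client code depends on one method
rather than on a magic string in the result schema.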

-brian


---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 2/6/14, 2:58 PM, "Jonathan Ellis"  wrote:

>In Cassandra, it's ModificationStatement.CAS_RESULT_COLUMN.text
>
>On Thu, Feb 6, 2014 at 10:22 AM, Brian O'Neill 
>wrote:
>> Silly question...
>>
>> Using the CQL driver for conditional updates, I'm looking into the
>> ResultSet that comes back:
>>
>> for (ColumnDefinitions.Definition definition :
>>         results.getColumnDefinitions().asList()) {
>>     for (Row row : results.all()) {
>>         LOG.debug("UPDATE APPLIED = [{}]=[{}]",
>>                 definition.getName(), row.getBool(definition.getName()));
>>     }
>> }
>>
>> I noticed that the ResultSet of a conditional update contains a column
>> "[applied]", with a boolean indicating whether or not the update was
>> applied.
>>
>> I assume this column name comes from:
>> 
>> src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java:50
>>     private static final ColumnIdentifier CAS_RESULT_COLUMN =
>>         new ColumnIdentifier("[applied]", false);
>>
>> Does it make sense to expose this column name as a String constant
>> somewhere?
>> Either in the CQL java-driver, or Cassandra itself?
>>
>> -brian
>>
>> ---
>> Brian O'Neill
>> Chief Technology Officer
>>
>>
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
>> healthmarketscience.com
>>
>
>
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder, http://www.datastax.com
>@spyced




"[applied]" column in ModificationStatement?

2014-02-06 Thread Brian O'Neill
Silly question...

Using the CQL driver for conditional updates, I'm looking into the ResultSet
that comes back:

for (ColumnDefinitions.Definition definition :
        results.getColumnDefinitions().asList()) {
    for (Row row : results.all()) {
        LOG.debug("UPDATE APPLIED = [{}]=[{}]",
                definition.getName(), row.getBool(definition.getName()));
    }
}

I noticed that the ResultSet of a conditional update contains a column
"[applied]", with a boolean indicating whether or not the update was
applied.

I assume this column name comes from:
src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java:50
    private static final ColumnIdentifier CAS_RESULT_COLUMN =
        new ColumnIdentifier("[applied]", false);

Does it make sense to expose this column name as a String constant
somewhere? 
Either in the CQL java-driver, or Cassandra itself?

-brian

---
Brian O'Neill
Chief Technology Officer


Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com






Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

2013-12-18 Thread Brian O'Neill

Thanks for the pointer Alain.

At a quick glance, it looks like people are looking for query time
filtering/aggregation, which will suffice for small data sets.  Hopefully we
might be able to extend that to perform pre-computations as well. (which
would support much larger data sets / volumes)

I'll continue the discussion on the issue.

thanks again,
brian


---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Alain RODRIGUEZ 
Reply-To:  
Date:  Wednesday, December 18, 2013 at 5:13 AM
To:  
Cc:  "dev@cassandra.apache.org" 
Subject:  Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Hi, this would indeed be much appreciated by a lot of people.

There is an existing issue on this subject:

 https://issues.apache.org/jira/browse/CASSANDRA-4914

Maybe you could help the committers there.

Hope this will be useful to you.

Please let us know when you find a way to do these operations.

Cheers.


2013/12/18 Brian O'Neill 
> We are seeking to replace Acunu in our technology stack / platform.  It is the
> only component in our stack that is not open source.
> 
> In preparation, over the last few weeks I've migrated Virgil to CQL.   The
> vision is that Virgil could receive a REST request to upsert/delete data
> (hierarchical JSON to support collections).  Virgil would lookup the
> dimensions/aggregations for that table, add the key to the pertinent
> dimensional tables (e.g. DISTINCT), incorporate values into aggregations (e.g.
> SUMs) and increment/decrement relevant counters (COUNT).  (using additional
> CF's)
> 
> This seems straight-forward, but appears to require a read-before-write.
> (e.g. read the current value of a SUM, incorporate the new value, then use the
> lightweight transactions of C* 2.0 to conditionally update the value.)
> 
> Before I go down this path, I was wondering if anyone is designing/working on
> the same, perhaps at a lower level?  (CQL?)
> 
> Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
> etc) at the CQL level?  If so, is there a preliminary design?
> 
> I can see a lower-level approach, which would leverage the commit logs (and
> mem/sstables) and perform the aggregation during read-operations (and
> flush/compaction).
> 
> thoughts?  i'm open to all ideas.
> 
> -brian
> -- 
> Brian ONeill
> Chief Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024 
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42





Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

2013-12-17 Thread Brian O'Neill
We are seeking to replace Acunu in our technology stack / platform.  It is
the only component in our stack that is not open source.

In preparation, over the last few weeks I’ve migrated Virgil to CQL.   The
vision is that Virgil could receive a REST request to upsert/delete data
(hierarchical JSON to support collections).  Virgil would lookup the
dimensions/aggregations for that table, add the key to the pertinent
dimensional tables (e.g. DISTINCT), incorporate values into aggregations
(e.g. SUMs) and increment/decrement relevant counters (COUNT).  (using
additional CF’s)

This seems straight-forward, but appears to require a read-before-write.
 (e.g. read the current value of a SUM, incorporate the new value, then use
the lightweight transactions of C* 2.0 to conditionally update the value.)
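
To make the read-before-write concrete, here is a rough sketch of the retry
loop I have in mind.  It is only an in-memory stand-in: the class name is
made up, and a ConcurrentMap plays the role of the aggregate table, with
putIfAbsent/replace standing in for the conditional writes that C* 2.0
lightweight transactions would provide.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SumAggregator {
    // Stand-in for the aggregate table (dimension -> running SUM).
    private final ConcurrentMap<String, Long> sums = new ConcurrentHashMap<>();

    // Read the current SUM, fold in the new value, and write it back only
    // if nobody changed it in between; otherwise re-read and retry.
    long add(String dimension, long delta) {
        while (true) {
            Long current = sums.get(dimension);                    // the read
            long updated = (current == null ? 0L : current) + delta;
            boolean applied = (current == null)
                    // analogous to a write conditioned on "row does not exist"
                    ? sums.putIfAbsent(dimension, updated) == null
                    // analogous to a write conditioned on "sum = <value read>"
                    : sums.replace(dimension, current, updated);
            if (applied) {
                return updated;
            }
        }
    }

    long get(String dimension) {
        return sums.getOrDefault(dimension, 0L);
    }

    public static void main(String[] args) {
        SumAggregator agg = new SumAggregator();
        agg.add("region=PA", 5);
        agg.add("region=PA", -2);
        System.out.println(agg.get("region=PA"));
    }
}
```

Against a real cluster each failed condition costs another round trip, which
is why I'm wondering whether this belongs at a lower level.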

Before I go down this path, I was wondering if anyone is designing/working
on the same, perhaps at a lower level?  (CQL?)

Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
etc) at the CQL level?  If so, is there a preliminary design?

I can see a lower-level approach, which would leverage the commit logs (and
mem/sstables) and perform the aggregation during read-operations (and
flush/compaction).

thoughts?  i'm open to all ideas.

-brian
-- 
Brian ONeill
Chief Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Submit enhancements via pull requests?

2013-12-05 Thread Brian O'Neill
Thanks Jeremiah.  Done.
https://issues.apache.org/jira/browse/CASSANDRA-6453


-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 12/5/13, 10:47 AM, "Jeremiah D Jordan"  wrote:

>JIRA + patch or link to git branch
>
>-Jeremiah
>
>On Dec 5, 2013, at 9:44 AM, Brian O'Neill  wrote:
>
>> 
>> Sorry guys, it's been a while since I submitted a patch.
>> 
>> I see there are a number of outstanding pull requests:
>> https://github.com/apache/cassandra/pulls
>> 
>> Are we able to submit enhancements via pull requests on github now?
>> Or are we still using JIRA + patches?
>> 
>> (I have a very minor change to an error message that I'd like to get in
>> there)
>> 
>> thanks,
>> brian
>> 
>> ---
>> Brian O'Neill
>> Chief Architect
>> Health Market Science
>> The Science of Better Results
>> 2700 Horizon Drive • King of Prussia, PA • 19406
>> M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
>> healthmarketscience.com
>> 
>




Submit enhancements via pull requests?

2013-12-05 Thread Brian O'Neill

Sorry guys, it's been a while since I submitted a patch.

I see there are a number of outstanding pull requests:
https://github.com/apache/cassandra/pulls

Are we able to submit enhancements via pull requests on github now?
Or are we still using JIRA + patches?

(I have a very minor change to an error message that I'd like to get in
there)

thanks,
brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com






Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Brian O'Neill
@Jason,

I have a lot of experience with SOLR + ES, but mainly for search.  (i.e.
Finding the most relevant records given a query)
That's been working well, but now we have requirements to support
dashboards.  Those dashboards have aggregations in them (sum, average,
count(s), etc).  I have limited experience using filter functions and
facets to achieve similar things w/ Lucene, but they never seemed to
perform well when the sets were large.

If Lucene/SOLR/ES can support this kind of functionality, we'd gladly use
it instead. (Let me know!)

When we looked around, Druid seemed to fit the bill exactly: (and it was
open source)
http://metamarkets.com/2011/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/

BTW, here is more information on the compression that Druid uses:
http://metamarkets.com/2012/druid-bitmap-compression/


To echo Matt's sentiment, we'd love to leverage a C* native capability for
this.
(Acunu provides most of the capability, but it isn't open source)

I think once we have the "conditional write" semantics that are coming, we
could layer this on top of C*. (extending the secondary indexes
functionality)

-brian



---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com







On 4/12/13 12:46 AM, "Matt Stump"  wrote:

>You could embed Lucene, but then you pretty much have DSE search, and
>there are people on this list in a better position than I to describe
>the difficulty in making that scale. By rolling your own you get
>simplicity and control. If you use a uniform index size you can just
>assign chunks of it to the cassandra ring making it easy to distribute
>queries. I think that using Lucene in this way would cause most of the
>benefit of the library to be lost, and add unnecessary complexity. If
>Lucene were easy, then I think given the team's experience with both
>Lucene and C* it would have been done already.
>
>Sorry if it's a fuzzy answer, but I haven't run down every technical angle
>on the integration with C* yet. The idea is still very much at the
>"wouldn't it be very cool if this thing lived in Cassandra" stage. It would
>be the nail in the coffin for impala, redshift, et al.
>
>
>On Thu, Apr 11, 2013 at 3:15 PM, Jason Rutherglen <
>jason.rutherg...@gmail.com> wrote:
>
>> What's the advantage over Lucene?
>>
>>
>> On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump 
>> wrote:
>>
>> > Druid was our inspiration to layer bitmap indexes on top of Cassandra.
>> > Druid doesn't work for us because our data set is too large. We would
>> > need many hundreds of nodes just for the pre-processed data. What I
>> > envisioned was the ability to perform druid style queries (no
>> > aggregation) without the limitations imposed by having the entire
>> > dataset in memory. I primarily need to query whether a user performed
>> > some event, but I also intend to add trigram indexes for LIKE, ILIKE
>> > or possibly regex style matching.
>> >
>> > I wasn't aware of CONCISE, thanks for the pointer. We are currently
>> > evaluating fastbit, which is a very similar project:
>> > https://sdm.lbl.gov/fastbit/
>> >
>> >
>> > On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill wrote:
>> >
>> > >
>> > > How does this compare with Druid?
>> > > https://github.com/metamx/druid
>> > >
>> > > We're currently evaluating Acunu, Vertica and Druid...
>> > >
>> > >
>> > > http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html
>> > >
>> > > With its bitmapped indexes, Druid appears to have the most potential.
>> > > They boast some pretty impressive stats, especially WRT handling
>> > > "real-time" updates and adding

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Brian O'Neill

How does this compare with Druid?
https://github.com/metamx/druid

We're currently evaluating Acunu, Vertica and Druid...
http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html

With its bitmapped indexes, Druid appears to have the most potential.  
They boast some pretty impressive stats, especially WRT handling "real-time" 
updates and adding new dimensions.

They also use a compression algorithm, CONCISE, to cut down on the space 
requirements.
http://ricerca.mat.uniroma3.it/users/colanton/concise.html

I haven't looked too deep into the Druid code, but I've been meaning to see if 
it could be backed by C*.

We'd be game to join the hunt if you pursue such a beast. (with your code, or 
with portions of Druid)
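
For what it's worth, the kind of query in this thread (users who performed
events 1, 2 and 3, but not 4) maps onto bitmap operations roughly as below.
This is only a sketch using the JDK's uncompressed java.util.BitSet; the class
and helper names are made up, each bit position stands for a user id, and a
real index would presumably use a compressed format such as CONCISE.

```java
import java.util.BitSet;

class EventBitmaps {
    // Users present in every "include" bitmap and in none of the "exclude" bitmaps.
    static BitSet performedAllBut(BitSet[] include, BitSet[] exclude) {
        if (include.length == 0) {
            throw new IllegalArgumentException("need at least one bitmap to intersect");
        }
        BitSet result = (BitSet) include[0].clone();   // do not mutate the index itself
        for (int i = 1; i < include.length; i++) {
            result.and(include[i]);                    // intersection
        }
        for (BitSet e : exclude) {
            result.andNot(e);                          // set subtraction
        }
        return result;
    }

    // Convenience: build a bitmap from a list of user ids.
    static BitSet of(int... userIds) {
        BitSet bits = new BitSet();
        for (int id : userIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet event1 = of(1, 2, 3, 5);
        BitSet event2 = of(2, 3, 5);
        BitSet event3 = of(3, 5, 8);
        BitSet event4 = of(5);
        BitSet users = performedAllBut(
                new BitSet[] {event1, event2, event3}, new BitSet[] {event4});
        System.out.println(users);   // prints {3}
    }
}
```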

-brian


On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:

> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as give me all users that performed
> event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> for spending my time on C*. The only downside for us would be our current
> prototype is in C++ so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
> 
> 
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis  wrote:
> 
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>> 
>> 
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome 
>> wrote:
>> 
>>> I'm currently building a distributed cluster on top of cassandra to
>>> perform fast set manipulation via bitmap indexes. This gives me the
>>> ability to perform unions, intersections, and set subtraction across
>>> sub-queries.
>>> Currently I'm storing index information for thousands of dimensions as
>>> cassandra rows, and my cluster keeps this information cached, distributed
>>> and replicated in order to answer queries.
>>> 
>>> Every couple of days I think to myself this should really exist in C*.
>>> Given all the benefits would there be any interest in
>>> reviving CASSANDRA-1472?
>>> 
>>> Some downsides are that this is very memory intensive, even for sparse
>>> bitmaps.
>>> 
>> 
>> 
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: Compund/Composite column names

2013-01-09 Thread Brian O'Neill
Sorry, just got time to submit it.

Here you go:
https://issues.apache.org/jira/browse/CASSANDRA-5138

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Sylvain Lebresne 
Date:  Monday, December 17, 2012 10:35 AM
To:  
Cc:  Vivek Mishra , Brian O'Neill

Subject:  Re: Compund/Composite column names

Feel free to open a ticket with steps to reproduce. We can certainly throw a
more meaningful exception.


On Mon, Dec 17, 2012 at 4:11 PM, Edward Capriolo 
wrote:
> This was discussed in one of the tickets. The problem is that CQL3's sparse
> tables have different metadata that has NOT been added to thrift's
> CFMetaData. Thus thrift is unaware of exactly how to verify the insert.
> 
> Originally it was made impossible for thrift to see a sparse table, but it
> seems that restriction has been lifted. It is probably a bad idea to insert
> into a sparse table via thrift until Cassandra no longer has two distinct
> sources of meta information.
> 
> 
> 
> 
> 
> On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra wrote:
> 
>> > Looks like Thrift API is not working as expected?
>> >
>> > -Vivek
>> >
>> >
>> >
>> >
>> > 
>> >  From: Brian O'Neill 
>> > To: dev@cassandra.apache.org
>> > Cc: Vivek Mishra 
>> > Sent: Monday, December 17, 2012 8:12 PM
>> > Subject: Re: Compund/Composite column names
>> >
>> > FYI -- I'm still seeing this on 1.2-beta1.
>> >
>> > If you create a table via CQL, then insert into it (via Java API) with
>> > an incorrect number of components.  The insert works, but select *
>> > from CQL results in a TSocket read error.
>> >
>> > I showed this in the webinar last week, just in case people ran into
>> > it.  It would be great to translate the ArrayIndexOutofBoundsException
>> > from the server side into something meaningful in cqlsh to help people
>> > diagnose the problem.  (a regular user probably doesn't have access to
>> > the server-side logs)
>> >
>> > You can see it at minute 41 in the video from the webinar:
>> > http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be
>> >
>> > -brian
>> >
>> >
>> > On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis  wrote:
>>> > > Sounds like you're running into the keyspace drop bug.  It's "mostly"
>> > fixed
>>> > > in 1.1.5 but you might need the latest from 1.1 branch.  1.1.6 will be
>>> > > released soon with the final fix.
>>> > > On Oct 9, 2012 1:58 AM, "Vivek Mishra"  wrote:
>>> > >
>>>> > >>
>>>> > >>
>>>> > >> Ok. I am able to understand the problem now. Issue is:
>>>> > >>
>>>> > >> If i create a column family altercations as:
>>>> > >>
>>>> > >>
>>>> > >>
>> > 
>> *
>> *8
>>>> > >> CREATE TABLE altercations (
>>>> > >>instigator text,
>>>> > >>started_at timestamp,
>>>> > >>ships_destroyed int,
>>>> > >>energy_used float,
>>>> > >>alliance_involvement boolean,
>>>> > >>PRIMARY KEY (instigator,started_at,ships_destroyed)
>>>> > >>);
>>>> > >> /
>>>> > >>INSERT INTO altercations (instigator, started_at, ships_destroyed,
>>&g

Re: Compund/Composite column names

2012-12-17 Thread Brian O'Neill

Will do.  

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>   •
healthmarketscience.com




From:  Sylvain Lebresne 
Date:  Monday, December 17, 2012 10:35 AM
To:  
Cc:  Vivek Mishra , Brian O'Neill

Subject:  Re: Compund/Composite column names

Feel free to open a ticket with steps to reproduce. We can certainly throw a
more meaningful exception.


On Mon, Dec 17, 2012 at 4:11 PM, Edward Capriolo 
wrote:
> This was discussed in one of the tickets. The problem is that CQL3's sparse
> tables have different metadata that has NOT been added to thrift's
> CFMetaData. Thus thrift is unaware of exactly how to verify the insert.
> 
> Originally it was made impossible for thrift to see a sparse table, but it
> seems that restriction has been lifted. It is probably a bad idea to insert
> into a sparse table via thrift until Cassandra no longer has two distinct
> sources of meta information.
> 
> 
> 
> 
> 
> On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra wrote:
> 
>> > Looks like Thrift API is not working as expected?
>> >
>> > -Vivek
>> >
>> >
>> >
>> >
>> > 
>> >  From: Brian O'Neill 
>> > To: dev@cassandra.apache.org
>> > Cc: Vivek Mishra 
>> > Sent: Monday, December 17, 2012 8:12 PM
>> > Subject: Re: Compund/Composite column names
>> >
>> > FYI -- I'm still seeing this on 1.2-beta1.
>> >
>> > If you create a table via CQL, then insert into it (via Java API) with
>> > an incorrect number of components.  The insert works, but select *
>> > from CQL results in a TSocket read error.
>> >
>> > I showed this in the webinar last week, just in case people ran into
>> > it.  It would be great to translate the ArrayIndexOutofBoundsException
>> > from the server side into something meaningful in cqlsh to help people
>> > diagnose the problem.  (a regular user probably doesn't have access to
>> > the server-side logs)
>> >
>> > You can see it at minute 41 in the video from the webinar:
>> > http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be
>> >
>> > -brian
>> >
>> >
>> > On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis  wrote:
>>> > > Sounds like you're running into the keyspace drop bug.  It's "mostly"
>> > fixed
>>> > > in 1.1.5 but you might need the latest from 1.1 branch.  1.1.6 will be
>>> > > released soon with the final fix.
>>> > > On Oct 9, 2012 1:58 AM, "Vivek Mishra"  wrote:
>>> > >
>>>> > >>
>>>> > >>
>>>> > >> Ok. I am able to understand the problem now. Issue is:
>>>> > >>
>>>> > >> If i create a column family altercations as:
>>>> > >>
>>>> > >>
>>>> > >>
>> > 
>> *
>> *8
>>>> > >> CREATE TABLE altercations (
>>>> > >>instigator text,
>>>> > >>started_at timestamp,
>>>> > >>ships_destroyed int,
>>>> > >>energy_used float,
>>>> > >>alliance_involvement boolean,
>>>> > >>PRIMARY KEY (instigator,started_at,ships_destroyed)
>>>> > >>);
>>>> > >> /
>>>> > >>INSERT INTO altercations (instigator, started_at, ships_destroyed,
>>>> > >>  energy_used, alliance_involvement)
>>>> > >>  VALUES ('Jayne Cobb', '2012-07-23', 2, 4.6,
>> > '

Re: Compund/Composite column names

2012-12-17 Thread Brian O'Neill
FYI -- I'm still seeing this on 1.2-beta1.

If you create a table via CQL and then insert into it (via the Java API)
with an incorrect number of components, the insert works, but a select *
from CQL results in a TSocket read error.

I showed this in the webinar last week, just in case people ran into
it.  It would be great to translate the ArrayIndexOutOfBoundsException
from the server side into something meaningful in cqlsh to help people
diagnose the problem.  (A regular user probably doesn't have access to
the server-side logs.)
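
As a rough idea of the kind of message I mean, here is a purely hypothetical
client-side check.  The class and method are invented for illustration and
are not part of Cassandra or any client library; the expected count is passed
in by the caller rather than derived from the schema.

```java
import java.util.Arrays;
import java.util.List;

class CompositeNameCheck {
    // Fail fast with a readable message instead of letting a malformed
    // composite column name reach the server.
    static void requireComponents(List<?> components, int expected) {
        if (components.size() != expected) {
            throw new IllegalArgumentException(
                    "Composite column name has " + components.size()
                    + " component(s), but the table definition expects " + expected);
        }
    }

    public static void main(String[] args) {
        // Two components supplied where the caller says three are expected.
        List<Object> name = Arrays.<Object>asList("2012-07-23", "energy_used");
        try {
            requireComponents(name, 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same wording could equally come from the server; the point is only that
the user sees the component mismatch rather than a socket error.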

You can see it at minute 41 in the video from the webinar:
http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be

-brian


On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis  wrote:
> Sounds like you're running into the keyspace drop bug.  It's "mostly" fixed
> in 1.1.5 but you might need the latest from 1.1 branch.  1.1.6 will be
> released soon with the final fix.
> On Oct 9, 2012 1:58 AM, "Vivek Mishra"  wrote:
>
>>
>>
>> Ok. I am able to understand the problem now. Issue is:
>>
>> If i create a column family altercations as:
>>
>>
>> *
>> CREATE TABLE altercations (
>>    instigator text,
>>    started_at timestamp,
>>    ships_destroyed int,
>>    energy_used float,
>>    alliance_involvement boolean,
>>    PRIMARY KEY (instigator, started_at, ships_destroyed)
>> );
>>
>> INSERT INTO altercations (instigator, started_at, ships_destroyed,
>>                           energy_used, alliance_involvement)
>>    VALUES ('Jayne Cobb', '2012-07-23', 2, 4.6, 'false');
>>
>> *
>>
>> It works!
>>
>> But if i create a column family with compound primary key with 2 composite
>> column as:
>>
>>
>> *
>> CREATE TABLE altercations (
>>instigator text,
>>started_at timestamp,
>>ships_destroyed int,
>>energy_used float,
>>alliance_involvement boolean,
>>PRIMARY KEY (instigator,started_at)
>>);
>>
>>
>> *
>> and Then drop this column family:
>>
>>
>> *
>> drop columnfamily altercations;
>>
>> *
>>
>> and then try to create same one with primary compound key with 3 composite
>> column:
>>
>>
>> *
>>
>> CREATE TABLE altercations (
>>instigator text,
>>started_at timestamp,
>>ships_destroyed int,
>>energy_used float,
>>alliance_involvement boolean,
>>PRIMARY KEY (instigator,started_at,ships_destroyed)
>>);
>>
>> *
>>
>> it gives me error: "TSocket read 0 bytes"
>>
>> Rest, as no column family is created, so nothing onwards will work.
>>
>> Is this an issue?
>>
>> -Vivek
>>
>>
>> 
>>  From: Jonathan Ellis 
>> To: dev@cassandra.apache.org; Vivek Mishra 
>> Sent: Tuesday, October 9, 2012 9:08 AM
>> Subject: Re: Compund/Composite column names
>>
>> Works for me on latest 1.1 in cql3 mode.  cql2 mode gives a parse error.
>>
>> On Mon, Oct 8, 2012 at 9:18 PM, Vivek Mishra 
>> wrote:
>> > Hi All,
>> >
>> > I am trying to use compound primary key column name and i am referring
>> to:
>> > http://www.datastax.com/dev/blog/whats-new-in-cql-3-0
>> >
>> >
>> > As mentioned on this example, i tried to create a column family
>> containing compound primary key (one or more) as:
>> >
>> >  CREATE TABLE altercations (
>> >instigator text,
>> >started_at timestamp,
>> >ships_destroyed int,
>> >energy_used float,
>> >alliance_involvement boolean,
>> >PRIMARY KEY (instigator,started_at,ships_destroyed)
>> >);
>> >
>> > And i am getting:
>> >
>> >
>> > **
>> > TSocket read 0 bytes
>> > cqlsh:testcomp>
>> > **
>> >
>> >
>> > Then followed by insert and select statements giving me following errors:
>> >
>> >
>> 
>> >
>> > cqlsh:testcomp>INSERT INTO altercations (instigator, started_at,
>> ships_destroyed,
>> > 

Re: CQL/CLI Experiments w/ 1.2

2012-12-10 Thread Brian O'Neill

Thanks for the explanation(s).

I'm going to give a "Create your first java app for Cassandra" webinar on
Wednesday, and I was trying to embrace schema creation in CQL, but didn't
want to have to use CompositeType's right off the bat.  (I'll go with
compact storage)

I think I can explain away the empty row/column, but we should probably
publicize that.  I can see that question coming up on every client/api
user list. (hector, astyanax, etc.)

-brian


---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> • healthmarketscience.com

On 12/10/12 3:16 AM, "Sylvain Lebresne"  wrote:

>There are some more details in
>http://www.datastax.com/dev/blog/thrift-to-cql3 but to answer your
>questions:
>
>
>> Question 1:
>> What is the empty column/value?
>
>
>The technical reasons are here:
>https://issues.apache.org/jira/browse/CASSANDRA-4361. But basically, it's a
>CQL3 implementation detail.
>
>> Question 2:
>> It also appears as though the column names are CompositeType even
>> though there is only one component:
>
>
>Yes, that is the case, and the reason is that this is required to support
>collections (even if you don't use collections initially, not using a
>composite means you wouldn't be able to add any later). If you explicitly
>don't want a CompositeType underneath, you'll need to use the 'WITH COMPACT
>STORAGE' option (in which case you will not be able to use collections,
>obviously).
>
>--
>Sylvain




CQL/CLI Experiments w/ 1.2

2012-12-09 Thread Brian O'Neill
I'm using the following schema and data:
CREATE TABLE children ( childId varchar, firstName varchar, lastName
varchar, timezone varchar, PRIMARY KEY (childId ) );
insert into children (childId, firstName, lastName, timezone) values
('bart.simpson', 'Bart', 'Simpson', 'PST');
insert into children (childId, firstName, lastName, timezone) values
('dennis.menace', 'Dennis', 'Menace', 'PST');

All is well on the CQL side of things, but when I go over into CLI, I
see the following:

[default@northpole] list children;
Using default limit of 100
Using default column limit of 100
---
RowKey: bart.simpson
=> (column=, value=, timestamp=1355116106465000)
=> (column=firstname, value=42617274, timestamp=1355116106465000)
=> (column=lastname, value=53696d70736f6e, timestamp=1355116106465000)
=> (column=timezone, value=505354, timestamp=1355116106465000)
---
RowKey: dennis.menace
=> (column=, value=, timestamp=1355116106466000)
=> (column=firstname, value=44656e6e6973, timestamp=1355116106466000)
=> (column=lastname, value=4d656e616365, timestamp=1355116106466000)
=> (column=timezone, value=505354, timestamp=1355116106466000)
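
(Reading aid: the hex strings in the values above are the UTF-8 bytes of what was inserted. Below is a minimal Java sketch to decode them. The class and method names are made up for illustration and are not part of any Cassandra client.)

```java
import java.nio.charset.StandardCharsets;

// Sketch: decode the hex strings that the CLI listing shows for column values.
class HexValueDecoder {

    // Convert a hex string such as "42617274" into the UTF-8 text it encodes.
    static String decodeUtf8Hex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Values copied from the listing above.
        System.out.println(decodeUtf8Hex("42617274"));
        System.out.println(decodeUtf8Hex("53696d70736f6e"));
        System.out.println(decodeUtf8Hex("505354"));
    }
}
```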

Question 1:
What is the empty column/value?   I ask because it causes
confusion/issues when accessing it from a Java API. (like Astyanax)
That column and value are in the result set.  Should clients start
ignoring empty column names/values?

Question 2:
It also appears as though the column names are CompositeType even
though there is only one component:  (below is from CLI)
  Columns sorted by:
org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)

Because of that, I would need to use CompositeTypes in my java app to
insert into the table.
Is there any way to create a table via CQL3 that doesn't force me to
use Composite types in my Java app?
(In CQL2, we could specify comparators, but I don't see that in CQL3)
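
For reference, a CompositeType name is laid out as a 2-byte length, the component bytes, and a 1-byte end-of-component marker, repeated per component. So even a one-component composite differs byte-for-byte from a plain UTF8Type name. Here is a rough sketch in plain Java, assuming short string components and an end-of-component byte of 0. `CompositeNameSketch` and `encode` are hypothetical names; a real client would use the composite support in its own API instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: hand-rolled composite encoding to show the byte layout.
class CompositeNameSketch {

    // Per component: [2-byte big-endian length][UTF-8 bytes][end-of-component byte = 0].
    // Assumes each component is shorter than 32768 bytes so the short cast is safe.
    static byte[] encode(String... components) {
        byte[][] raw = new byte[components.length][];
        int size = 0;
        for (int i = 0; i < components.length; i++) {
            raw[i] = components[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + raw[i].length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size); // big-endian by default
        for (byte[] component : raw) {
            out.putShort((short) component.length);
            out.put(component);
            out.put((byte) 0);
        }
        return out.array();
    }

    public static void main(String[] args) {
        byte[] plain = "firstname".getBytes(StandardCharsets.UTF_8);
        byte[] composite = encode("firstname");
        System.out.println("plain UTF8Type name: " + plain.length + " bytes");
        System.out.println("one-part composite:  " + composite.length + " bytes");
    }
}
```

The three extra bytes per component are why a column written with a plain string name does not line up with what a composite comparator expects.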

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Scratch that; it can change on a per-column basis.

Strange world this Java API vs. CQL.

-brian

On Thu, Oct 4, 2012 at 3:57 PM, Brian O'Neill  wrote:
> Actually, I found the underlying issue...
>
> CQL appends the *name* of the "value" column into the compound key.
>
> Using the previous schema:
> insert into data (uid, t, foo, bar) values ('PI7JC8KRF6',
> '1349110576', 'foovalue', 'barvalue')
>
> list data;
> RowKey: PI7JC8KRF6
> => (column=1970-01-16 09:45:10-0500:foovalue:bar, value=barvalue,
> timestamp=1349380029082000)
>
> Notice "bar" is on the end of the column name.
>
> If you don't have that element represented from the Java API (in this
> case, w/ Astyanax), you end up with misaligned interpretation of the
> compound key.  I'll add an extra element to the composite type in
> Astyanax, which should fix things.  I'll also add this to my blog so
> other people don't get tripped up.
>
> Any insight into why CQL puts that in column name?
> Where does it store the metadata related to compound key
> interpretation? Wouldn't that be a better place for that since it
> shouldn't change within a table?
>
> -brian
>
>
> On Thu, Oct 4, 2012 at 3:39 PM, Brian O'Neill  wrote:
>> Perfect. Tnx.
>>
>> On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis  wrote:
>>> Oh, I see.  I misunderstood at first.  Yes, the thrift side in 1.1
>>> doesn't validate cql3 composites.  This should be fixed in 1.2 beta1;
>>> see 
>>> https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817
>>>
>>> On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill  wrote:
>>>> I was able to reproduce with CLI.  I'll send over the example as soon
>>>> as I can obfuscate it.
>>>>
>>>> -brian
>>>>
>>>> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis  wrote:
>>>>> Nothing jumps out at me, varchar should be pretty straightforward.
>>>>> Probably going to need a test case.  (Even better if you can repro w/
>>>>> cli instead of needing Astyanax.)
>>>>>
>>>>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill  
>>>>> wrote:
>>>>>> Obfuscated slightly
>>>>>>
>>>>>> The table is something similar to:
>>>>>>
>>>>>> CREATE TABLE data (
>>>>>>   uid varchar,
>>>>>>   t timestamp,
>>>>>>   foo varchar,
>>>>>>   bar varchar,
>>>>>>   PRIMARY KEY (uid, t, foo, bar)
>>>>>> );
>>>>>>
>>>>>> Then I can insert just fine via Astyanax and I can see the row via
>>>>>> cli, but the select statement fails in cqlsh.
>>>>>>
>>>>>> The table is fine, when I only interact with it through CQL. I can
>>>>>> insert and select fine, until I insert a row from Astyanax.
>>>>>>
>>>>>> If needed, I can probably create a small test for this that I can share.
>>>>>>
>>>>>> -brian
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>>>>>> What kind of data did you insert, and what was expected?  Expected
>>>>>>> behavior would be to reject nonconforming data at insert time.
>>>>>>>
>>>>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  
>>>>>>> wrote:
>>>>>>>> This is probably already on your radar, but we could use a better
>>>>>>>> error message from cqlsh when the column key doesn't conform to the
>>>>>>>> expected schema...
>>>>>>>>
>>>>>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>>>>>> schema.  After that, selects from that table via cqlsh return no
>>>>>>>> useful information.
>>>>>>>> (CLI shows the data just fine)
>>>>>>>>
>>>>>>>>
>>>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>>>>>> Welcome to Cassandra 

Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Actually, I found the underlying issue...

CQL appends the *name* of the "value" column into the compound key.

Using the previous schema:
insert into data (uid, t, foo, bar) values ('PI7JC8KRF6',
'1349110576', 'foovalue', 'barvalue')

list data;
RowKey: PI7JC8KRF6
=> (column=1970-01-16 09:45:10-0500:foovalue:bar, value=barvalue,
timestamp=1349380029082000)

Notice "bar" is on the end of the column name.
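
(Side note on the 1970 date in that column name: it is what you would get if '1349110576' were read as milliseconds since the epoch rather than seconds. That reading is an inference from the listing, not something confirmed in this thread. A small Java sketch of the difference, using java.time, so Java 8 or later; the class name is hypothetical.)

```java
import java.time.Instant;

// Sketch: the same number interpreted as epoch milliseconds vs. epoch seconds.
class TimestampUnits {

    static Instant asMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    static Instant asSeconds(long value) {
        return Instant.ofEpochSecond(value);
    }

    public static void main(String[] args) {
        long t = 1349110576L; // value used in the insert above
        System.out.println("as milliseconds: " + asMillis(t));
        System.out.println("as seconds:      " + asSeconds(t));
    }
}
```

Read as milliseconds, the value lands on 1970-01-16 at 14:45:10 UTC, which is 09:45:10 at -0500, matching the listing. Read as seconds, it lands on 2012-10-01.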

If you don't have that element represented from the Java API (in this
case, w/ Astyanax), you end up with misaligned interpretation of the
compound key.  I'll add an extra element to the composite type in
Astyanax, which should fix things.  I'll also add this to my blog so
other people don't get tripped up.
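
To make the misalignment concrete, here is a rough sketch in plain Java. It assumes the layout seen in the listing above: the clustering values (t, foo) come first and the CQL column name (bar) is the last component. The class and method names are hypothetical, and this is not Astyanax code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: how the components of one CQL3 cell name line up.
class Cql3CellNameSketch {

    // Clustering values first, then the name of the CQL column being stored.
    static List<String> cellName(List<String> clusteringValues, String cqlColumnName) {
        List<String> parts = new ArrayList<>(clusteringValues);
        parts.add(cqlColumnName);
        return parts;
    }

    // Join components with ':' the way the CLI listing displays them.
    static String display(List<String> parts) {
        return String.join(":", parts);
    }

    public static void main(String[] args) {
        List<String> clustering = Arrays.asList("1970-01-16 09:45:10-0500", "foovalue");

        // What CQL wrote: three components, ending in the column name.
        System.out.println(display(cellName(clustering, "bar")));

        // What a client writes if it only supplies the clustering values:
        // two components, so a reader expecting three is off by one.
        System.out.println(display(clustering));
    }
}
```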

Any insight into why CQL puts that in column name?
Where does it store the metadata related to compound key
interpretation? Wouldn't that be a better place for that since it
shouldn't change within a table?

-brian


On Thu, Oct 4, 2012 at 3:39 PM, Brian O'Neill  wrote:
> Perfect. Tnx.
>
> On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis  wrote:
>> Oh, I see.  I misunderstood at first.  Yes, the thrift side in 1.1
>> doesn't validate cql3 composites.  This should be fixed in 1.2 beta1;
>> see 
>> https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817
>>
>> On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill  wrote:
>>> I was able to reproduce with CLI.  I'll send over the example as soon
>>> as I can obfuscate it.
>>>
>>> -brian
>>>
>>> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis  wrote:
>>>> Nothing jumps out at me, varchar should be pretty straightforward.
>>>> Probably going to need a test case.  (Even better if you can repro w/
>>>> cli instead of needing Astyanax.)
>>>>
>>>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill  
>>>> wrote:
>>>>> Obfuscated slightly
>>>>>
>>>>> The table is something similar to:
>>>>>
>>>>> CREATE TABLE data (
>>>>>   uid varchar,
>>>>>   t timestamp,
>>>>>   foo varchar,
>>>>>   bar varchar,
>>>>>   PRIMARY KEY (uid, t, foo, bar)
>>>>> );
>>>>>
>>>>> Then I can insert just fine via Astyanax and I can see the row via
>>>>> cli, but the select statement fails in cqlsh.
>>>>>
>>>>> The table is fine, when I only interact with it through CQL. I can
>>>>> insert and select fine, until I insert a row from Astyanax.
>>>>>
>>>>> If needed, I can probably create a small test for this that I can share.
>>>>>
>>>>> -brian
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>>>>> What kind of data did you insert, and what was expected?  Expected
>>>>>> behavior would be to reject nonconforming data at insert time.
>>>>>>
>>>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  
>>>>>> wrote:
>>>>>>> This is probably already on your radar, but we could use a better
>>>>>>> error message from cqlsh when the column key doesn't conform to the
>>>>>>> expected schema...
>>>>>>>
>>>>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>>>>> schema.  After that, selects from that table via cqlsh return no
>>>>>>> useful information.
>>>>>>> (CLI shows the data just fine)
>>>>>>>
>>>>>>>
>>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>>>>> Welcome to Cassandra CLI version 1.1.5
>>>>>>>
>>>>>>> Type 'help;' or '?' for help.
>>>>>>> Type 'quit;' or 'exit;' to quit.
>>>>>>>
>>>>>>> [default@unknown] use cirrus;
>>>>>>> Authenticated to keyspace: cirrus
>>>>>>> [default@cirrus] list data;
>>>>>>> Using default limit of 100
>>>>>>> Using default column limit of 100
&

Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Perfect. Tnx.

On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis  wrote:
> Oh, I see.  I misunderstood at first.  Yes, the thrift side in 1.1
> doesn't validate cql3 composites.  This should be fixed in 1.2 beta1;
> see 
> https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817
>
> On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill  wrote:
>> I was able to reproduce with CLI.  I'll send over the example as soon
>> as I can obfuscate it.
>>
>> -brian
>>
>> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis  wrote:
>>> Nothing jumps out at me, varchar should be pretty straightforward.
>>> Probably going to need a test case.  (Even better if you can repro w/
>>> cli instead of needing Astyanax.)
>>>
>>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill  wrote:
>>>> Obfuscated slightly
>>>>
>>>> The table is something similar to:
>>>>
>>>> CREATE TABLE data (
>>>>   uid varchar,
>>>>   t timestamp,
>>>>   foo varchar,
>>>>   bar varchar,
>>>>   PRIMARY KEY (uid, t, foo, bar)
>>>> );
>>>>
>>>> Then I can insert just fine via Astyanax and I can see the row via
>>>> cli, but the select statement fails in cqlsh.
>>>>
>>>> The table is fine, when I only interact with it through CQL. I can
>>>> insert and select fine, until I insert a row from Astyanax.
>>>>
>>>> If needed, I can probably create a small test for this that I can share.
>>>>
>>>> -brian
>>>>
>>>>
>>>>
>>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>>>> What kind of data did you insert, and what was expected?  Expected
>>>>> behavior would be to reject nonconforming data at insert time.
>>>>>
>>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  
>>>>> wrote:
>>>>>> This is probably already on your radar, but we could use a better
>>>>>> error message from cqlsh when the column key doesn't conform to the
>>>>>> expected schema...
>>>>>>
>>>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>>>> schema.  After that, selects from that table via cqlsh return no
>>>>>> useful information.
>>>>>> (CLI shows the data just fine)
>>>>>>
>>>>>>
>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>>>> Welcome to Cassandra CLI version 1.1.5
>>>>>>
>>>>>> Type 'help;' or '?' for help.
>>>>>> Type 'quit;' or 'exit;' to quit.
>>>>>>
>>>>>> [default@unknown] use cirrus;
>>>>>> Authenticated to keyspace: cirrus
>>>>>> [default@cirrus] list data;
>>>>>> Using default limit of 100
>>>>>> Using default column limit of 100
>>>>>> ---
>>>>>> RowKey: PI7JC8
>>>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>>>>>> ---
>>>>>> RowKey: PI1234
>>>>>> => (column=*, value=Y, timestamp=1349372660453000)
>>>>>>
>>>>>> 2 Rows Returned.
>>>>>> Elapsed time: 212 msec(s).
>>>>>> [default@cirrus] quit;
>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>>>>>> Connected to Test Cluster at localhost:9160.
>>>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>>>>> Use HELP for help.
>>>>>> cqlsh> use cirrus;
>>>>>> cqlsh:cirrus> select * from data;
>>>>>> TSocket read 0 bytes
>>>>>> cqlsh:cirrus>
>>>>>>
>>>>>> --
>>>>>> Brian ONeill
>>>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>>>> mobile:215.588.6024
>>>>>> blog: http://brianoneill.blogspot.com/
>>>>>> twitter: @boneill42
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>> http://www.datastax.com
>>>>
>>>>
>>>>
>>>> --
>>>> Brian ONeill
>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>>
>>>> mobile:215.588.6024
>>>> blog: http://brianoneill.blogspot.com/
>>>> twitter: @boneill42
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Here you go...

// 
//  IN CQLSH
// 
CREATE KEYSPACE cirrus WITH strategy_class = 'NetworkTopologyStrategy'
AND strategy_options:datacenter1 = '1';

use cirrus;

CREATE TABLE data (
  uid varchar,
  t timestamp,
  foo varchar,
  bar varchar,
  PRIMARY KEY (uid, t, foo)
);


// 
// Then in CLI
// 
use cirrus;
set data['PI7JC8KRF6']['1349110576']='2014-07-31';
list data;

// Note: I intentionally didn't supply a value for "foo", which is part of the
// primary key definition.
// Listing works.


// 
// Then back in CQLSH
// 
select * from data;

// The result is...
cqlsh:cirrus> select * from data;
TSocket read 0 bytes

On Thu, Oct 4, 2012 at 3:31 PM, Brian O'Neill  wrote:
> I was able to reproduce with CLI.  I'll send over the example as soon
> as I can obfuscate it.
>
> -brian
>
> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis  wrote:
>> Nothing jumps out at me, varchar should be pretty straightforward.
>> Probably going to need a test case.  (Even better if you can repro w/
>> cli instead of needing Astyanax.)
>>
>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill  wrote:
>>> Obfuscated slightly
>>>
>>> The table is something similar to:
>>>
>>> CREATE TABLE data (
>>>   uid varchar,
>>>   t timestamp,
>>>   foo varchar,
>>>   bar varchar,
>>>   PRIMARY KEY (uid, t, foo, bar)
>>> );
>>>
>>> Then I can insert just fine via Astyanax and I can see the row via
>>> cli, but the select statement fails in cqlsh.
>>>
>>> The table is fine, when I only interact with it through CQL. I can
>>> insert and select fine, until I insert a row from Astyanax.
>>>
>>> If needed, I can probably create a small test for this that I can share.
>>>
>>> -brian
>>>
>>>
>>>
>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>>> What kind of data did you insert, and what was expected?  Expected
>>>> behavior would be to reject nonconforming data at insert time.
>>>>
>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  
>>>> wrote:
>>>>> This is probably already on your radar, but we could use a better
>>>>> error message from cqlsh when the column key doesn't conform to the
>>>>> expected schema...
>>>>>
>>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>>> schema.  After that, selects from that table via cqlsh return no
>>>>> useful information.
>>>>> (CLI shows the data just fine)
>>>>>
>>>>>
>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>>> Welcome to Cassandra CLI version 1.1.5
>>>>>
>>>>> Type 'help;' or '?' for help.
>>>>> Type 'quit;' or 'exit;' to quit.
>>>>>
>>>>> [default@unknown] use cirrus;
>>>>> Authenticated to keyspace: cirrus
>>>>> [default@cirrus] list data;
>>>>> Using default limit of 100
>>>>> Using default column limit of 100
>>>>> ---
>>>>> RowKey: PI7JC8
>>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>>>>> ---
>>>>> RowKey: PI1234
>>>>> => (column=*, value=Y, timestamp=1349372660453000)
>>>>>
>>>>> 2 Rows Returned.
>>>>> Elapsed time: 212 msec(s).
>>>>> [default@cirrus] quit;
>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>>>>> Connected to Test Cluster at localhost:9160.
>>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>>>> Use HELP for help.
>>>>> cqlsh> use cirrus;
>>>>> cqlsh:cirrus> select * from data;
>>>>> TSocket read 0 bytes
>>>>> cqlsh:cirrus>
>>>>>
>>>>> --
>>>>> Brian ONeill
>>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>>> mobile:215.588.6024
>>>>> blog: http://brianoneill.blogspot.com/
>>>>> twitter: @boneill42
>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
>>>
>>>
>>>
>>> --
>>> Brian ONeill
>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>
>>> mobile:215.588.6024
>>> blog: http://brianoneill.blogspot.com/
>>> twitter: @boneill42
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
>
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
I was able to reproduce with CLI.  I'll send over the example as soon
as I can obfuscate it.

-brian

On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis  wrote:
> Nothing jumps out at me, varchar should be pretty straightforward.
> Probably going to need a test case.  (Even better if you can repro w/
> cli instead of needing Astyanax.)
>
> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill  wrote:
>> Obfuscated slightly
>>
>> The table is something similar to:
>>
>> CREATE TABLE data (
>>   uid varchar,
>>   t timestamp,
>>   foo varchar,
>>   bar varchar,
>>   PRIMARY KEY (uid, t, foo, bar)
>> );
>>
>> Then I can insert just fine via Astyanax and I can see the row via
>> cli, but the select statement fails in cqlsh.
>>
>> The table is fine, when I only interact with it through CQL. I can
>> insert and select fine, until I insert a row from Astyanax.
>>
>> If needed, I can probably create a small test for this that I can share.
>>
>> -brian
>>
>>
>>
>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>> What kind of data did you insert, and what was expected?  Expected
>>> behavior would be to reject nonconforming data at insert time.
>>>
>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  wrote:
>>>> This is probably already on your radar, but we could use a better
>>>> error message from cqlsh when the column key doesn't conform to the
>>>> expected schema...
>>>>
>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>> schema.  After that, selects from that table via cqlsh return no
>>>> useful information.
>>>> (CLI shows the data just fine)
>>>>
>>>>
>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>> Welcome to Cassandra CLI version 1.1.5
>>>>
>>>> Type 'help;' or '?' for help.
>>>> Type 'quit;' or 'exit;' to quit.
>>>>
>>>> [default@unknown] use cirrus;
>>>> Authenticated to keyspace: cirrus
>>>> [default@cirrus] list data;
>>>> Using default limit of 100
>>>> Using default column limit of 100
>>>> ---
>>>> RowKey: PI7JC8
>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>>>> ---
>>>> RowKey: PI1234
>>>> => (column=*, value=Y, timestamp=1349372660453000)
>>>>
>>>> 2 Rows Returned.
>>>> Elapsed time: 212 msec(s).
>>>> [default@cirrus] quit;
>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>>>> Connected to Test Cluster at localhost:9160.
>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>>> Use HELP for help.
>>>> cqlsh> use cirrus;
>>>> cqlsh:cirrus> select * from data;
>>>> TSocket read 0 bytes
>>>> cqlsh:cirrus>
>>>>
>>>> --
>>>> Brian ONeill
>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>> mobile:215.588.6024
>>>> blog: http://brianoneill.blogspot.com/
>>>> twitter: @boneill42
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
From this, I assume I inserted the wrong number of values into the
compound key from Astyanax.  It would be nice to carry this error
across to the CQL client.
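
One way such an error could be surfaced, as a sketch only: compare the number of components in a column name against what the schema expects, and fail with a readable message rather than an ArrayIndexOutOfBoundsException. The names below are hypothetical, and this is not how Cassandra implements validation.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: reject a column name whose component count does not match the schema.
class CompositeArityCheck {

    static void requireComponents(List<String> actual, int expected) {
        if (actual.size() != expected) {
            throw new IllegalArgumentException(
                "column name has " + actual.size() + " component(s), schema expects "
                    + expected + ": " + actual);
        }
    }

    public static void main(String[] args) {
        // Illustrative case: a name with one component where three are expected.
        List<String> tooShort = Arrays.asList("1349110576");
        try {
            requireComponents(tooShort, 3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```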

-brian

On Thu, Oct 4, 2012 at 3:17 PM, Brian O'Neill  wrote:
> Here you go...
>
> ERROR 14:57:37,270 Error occurred during processing of message.
> java.lang.ArrayIndexOutOfBoundsException: 4
>     at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:773)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:137)
>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
>     at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
>
>
> On Thu, Oct 4, 2012 at 3:15 PM, Brian O'Neill  wrote:
>> Obfuscated slightly
>>
>> The table is something similar to:
>>
>> CREATE TABLE data (
>>   uid varchar,
>>   t timestamp,
>>   foo varchar,
>>   bar varchar,
>>   PRIMARY KEY (uid, t, foo, bar)
>> );
>>
>> Then I can insert just fine via Astyanax and I can see the row via
>> cli, but the select statement fails in cqlsh.
>>
>> The table is fine, when I only interact with it through CQL. I can
>> insert and select fine, until I insert a row from Astyanax.
>>
>> If needed, I can probably create a small test for this that I can share.
>>
>> -brian
>>
>>
>>
>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>>> What kind of data did you insert, and what was expected?  Expected
>>> behavior would be to reject nonconforming data at insert time.
>>>
>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  wrote:
>>>> This is probably already on your radar, but we could use a better
>>>> error message from cqlsh when the column key doesn't conform to the
>>>> expected schema...
>>>>
>>>> I accidentally inserted data using Astyanax that didn't conform to the
>>>> schema.  After that, selects from that table via cqlsh return no
>>>> useful information.
>>>> (CLI shows the data just fine)
>>>>
>>>>
>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>>> Welcome to Cassandra CLI version 1.1.5
>>>>
>>>> Type 'help;' or '?' for help.
>>>> Type 'quit;' or 'exit;' to quit.
>>>>
>>>> [default@unknown] use cirrus;
>>>> Authenticated to keyspace: cirrus
>>>> [default@cirrus] list data;
>>>> Using default limit of 100
>>>> Using default column limit of 100
>>>> ---
>>>> RowKey: PI7JC8
>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>>>> ---
>>>> RowKey: PI1234
>>>> => (column=*, value=Y, timestamp=1349372660453000)
>>>>
>>>> 2 Rows Returned.
>>>> Elapsed time: 212 msec(s).
>>>> [default@cirrus] quit;
>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>>>> Connected to Test Cluster at localhost:9160.
>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>>> Use HELP for help.
>>>> cqlsh> use cirrus;
>>>> cqlsh:cirrus> select * from data;
>>>> TSocket read 0 bytes
>>>> cqlsh:cirrus>
>>>>
>>>> --
>>>> Brian ONeill
>>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>>> mobile:215.588.6024
>>>> blog: http://brianoneill.blogspot.com/
>>>> twitter: @boneill42
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
>
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Here you go...

ERROR 14:57:37,270 Error occurred during processing of message.
java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:773)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:137)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
    at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)


On Thu, Oct 4, 2012 at 3:15 PM, Brian O'Neill  wrote:
> Obfuscated slightly
>
> The table is something similar to:
>
> CREATE TABLE data (
>   uid varchar,
>   t timestamp,
>   foo varchar,
>   bar varchar,
>   PRIMARY KEY (uid, t, foo, bar)
> );
>
> Then I can insert just fine via Astyanax and I can see the row via
> cli, but the select statement fails in cqlsh.
>
> The table is fine, when I only interact with it through CQL. I can
> insert and select fine, until I insert a row from Astyanax.
>
> If needed, I can probably create a small test for this that I can share.
>
> -brian
>
>
>
> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
>> What kind of data did you insert, and what was expected?  Expected
>> behavior would be to reject nonconforming data at insert time.
>>
>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  wrote:
>>> This is probably already on your radar, but we could use a better
>>> error message from cqlsh when the column key doesn't conform to the
>>> expected schema...
>>>
>>> I accidentally inserted data using Astyanax that didn't conform to the
>>> schema.  After that, selects from that table via cqlsh return no
>>> useful information.
>>> (CLI shows the data just fine)
>>>
>>>
>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>>> Connected to: "Test Cluster" on 127.0.0.1/9160
>>> Welcome to Cassandra CLI version 1.1.5
>>>
>>> Type 'help;' or '?' for help.
>>> Type 'quit;' or 'exit;' to quit.
>>>
>>> [default@unknown] use cirrus;
>>> Authenticated to keyspace: cirrus
>>> [default@cirrus] list data;
>>> Using default limit of 100
>>> Using default column limit of 100
>>> ---
>>> RowKey: PI7JC8
>>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>>> ---
>>> RowKey: PI1234
>>> => (column=*, value=Y, timestamp=1349372660453000)
>>>
>>> 2 Rows Returned.
>>> Elapsed time: 212 msec(s).
>>> [default@cirrus] quit;
>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>>> Connected to Test Cluster at localhost:9160.
>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>>> Use HELP for help.
>>> cqlsh> use cirrus;
>>> cqlsh:cirrus> select * from data;
>>> TSocket read 0 bytes
>>> cqlsh:cirrus>
>>>
>>> --
>>> Brian ONeill
>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>> mobile:215.588.6024
>>> blog: http://brianoneill.blogspot.com/
>>> twitter: @boneill42
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
>
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
Obfuscated slightly

The table is something similar to:

CREATE TABLE data (
  uid varchar,
  t timestamp,
  foo varchar,
  bar varchar,
  PRIMARY KEY (uid, t, foo, bar)
);

Then I can insert just fine via Astyanax and I can see the row via
cli, but the select statement fails in cqlsh.

The table is fine when I only interact with it through CQL. I can
insert and select fine, until I insert a row from Astyanax.

If needed, I can probably create a small test for this that I can share.
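
A minimal sketch of why a row written outside CQL can be unreadable to a CQL3 reader, and of the kind of descriptive error the thread asks for. It assumes the clustering columns are packed into the column name using the usual composite layout (2-byte big-endian length, the bytes, then one end-of-component byte). The function names are hypothetical and this is not cqlsh's actual code.

```python
import struct


def encode_composite(components):
    """Pack byte-string components as <2-byte length><bytes><end-of-component>."""
    out = bytearray()
    for part in components:
        out += struct.pack(">H", len(part)) + part + b"\x00"
    return bytes(out)


def decode_composite(raw, expected_parts):
    """Unpack a composite name, raising a descriptive error if it does not fit."""
    parts, pos = [], 0
    while pos < len(raw):
        if pos + 2 > len(raw):
            raise ValueError("truncated length header at byte %d" % pos)
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        if pos + length + 1 > len(raw):
            raise ValueError(
                "component %d claims %d bytes but only %d remain"
                % (len(parts), length, len(raw) - pos))
        parts.append(raw[pos:pos + length])
        pos += length + 1  # skip the value and its end-of-component byte
    if len(parts) != expected_parts:
        raise ValueError(
            "expected %d components, found %d" % (expected_parts, len(parts)))
    return parts
```

A name built with encode_composite round-trips, while a plain single-component name such as b"expiration" fails with a message saying which component did not fit, rather than an empty socket read.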

-brian



On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis  wrote:
> What kind of data did you insert, and what was expected?  Expected
> behavior would be to reject nonconforming data at insert time.
>
> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill  wrote:
>> This is probably already on your radar, but we could use a better
>> error message from cqlsh when the column key doesn't conform to the
>> expected schema...
>>
>> I accidentally inserted data using Astyanax that didn't conform to the
>> schema.  After that, selects from that table via cqlsh return no
>> useful information.
>> (CLI shows the data just fine)
>>
>>
>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
>> Connected to: "Test Cluster" on 127.0.0.1/9160
>> Welcome to Cassandra CLI version 1.1.5
>>
>> Type 'help;' or '?' for help.
>> Type 'quit;' or 'exit;' to quit.
>>
>> [default@unknown] use cirrus;
>> Authenticated to keyspace: cirrus
>> [default@cirrus] list data;
>> Using default limit of 100
>> Using default column limit of 100
>> ---
>> RowKey: PI7JC8
>> => (column=*, value=2014-07-31, timestamp=1349376866686000)
>> ---
>> RowKey: PI1234
>> => (column=*, value=Y, timestamp=1349372660453000)
>>
>> 2 Rows Returned.
>> Elapsed time: 212 msec(s).
>> [default@cirrus] quit;
>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
>> Connected to Test Cluster at localhost:9160.
>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>> Use HELP for help.
>> cqlsh> use cirrus;
>> cqlsh:cirrus> select * from data;
>> TSocket read 0 bytes
>> cqlsh:cirrus>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


TSocket read 0 bytes from cqlsh

2012-10-04 Thread Brian O'Neill
This is probably already on your radar, but we could use a better
error message from cqlsh when the column key doesn't conform to the
expected schema...

I accidentally inserted data using Astyanax that didn't conform to the
schema.  After that, selects from that table via cqlsh return no
useful information.
(CLI shows the data just fine)


bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.1.5

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use cirrus;
Authenticated to keyspace: cirrus
[default@cirrus] list data;
Using default limit of 100
Using default column limit of 100
---
RowKey: PI7JC8
=> (column=*, value=2014-07-31, timestamp=1349376866686000)
---
RowKey: PI1234
=> (column=*, value=Y, timestamp=1349372660453000)

2 Rows Returned.
Elapsed time: 212 msec(s).
[default@cirrus] quit;
bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh> use cirrus;
cqlsh:cirrus> select * from data;
TSocket read 0 bytes
cqlsh:cirrus>

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Document storage

2012-05-28 Thread Brian O'Neill
Just following up on this age-old thread because we've recently done some
development.

Ben, we recently had the exact need you outline.  We are storing JSON
documents in Cassandra. We needed to index based on a field in the JSON.
We ended up extending our cassandra-indexing code to accommodate this.
https://github.com/hmsonline/cassandra-indexing

You can now configure the indexing to accommodate a field within the JSON
document.
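
A rough sketch of the general idea of indexing on a field inside a stored JSON document. It is not the cassandra-indexing implementation: the function names and the "field:value" index row key format are assumptions made for illustration, and a plain dict stands in for the index column family.

```python
import json


def index_entry(row_key, document_json, field):
    """Return (index_row_key, column_name) for one document, or None if the field is absent."""
    doc = json.loads(document_json)
    if field not in doc:
        return None
    # One index row per field value; the column name is the indexed row's key.
    return ("%s:%s" % (field, doc[field]), row_key)


def build_index(rows, field):
    """Build {index_row_key: set(row_keys)} from {row_key: json_text}."""
    index = {}
    for row_key, document_json in rows.items():
        entry = index_entry(row_key, document_json, field)
        if entry is not None:
            index.setdefault(entry[0], set()).add(entry[1])
    return index
```

Looking up every document whose "state" field is "PA" then becomes a read of the single index row "state:PA".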

We're going to update the wiki to make this more usable, but it triggered
the same kind of debate/thought process on this thread.  In the coming
weeks/months, we'll probably consider a switch to protobuf with an update
to our indexing code to understand the internal structure of documents
stored in Cassandra.

just an update for now,
brian

On Fri, Mar 30, 2012 at 1:33 PM, Ben McCann  wrote:

> >
> > If you don't need selected updates and having something as compact as
> > possible on disk makes an important difference for you, sure, do use blobs.
> > The only argument is that you can already do that without any change to
> > the core.
>
>
> The thing that we can't do today without changes to the core is index on
> subparts of some document format like Protobuf/JSON/etc.  If cassandra were
> to understand one of these formats, it could remove the need for manual
> management of an index.
>
>
> On Fri, Mar 30, 2012 at 10:23 AM, Sylvain Lebresne wrote:
>
> > On Fri, Mar 30, 2012 at 6:01 PM, Daniel Doubleday
> >  wrote:
> > > But decomposing into columns will lead to more of that:
> > >
> > > - Total amount of serialized data is (in most cases a lot) larger than
> > protobuffed / compressed version
> >
> > At least with sstable compression, I would expect the difference to
> > not be too big in practice.
> >
> > > - If you do selective updates the document will be scattered over
> > multiple ssts plus if you do sliced reads you can't optimize reads as
> > opposed to the single column version that when updated is automatically
> > superseding older versions so most reads will hit only one sst
> >
> > But if you need to do selective updates, then a blob just doesn't work
> > so that comparison is moot.
> >
> > Now I don't think anyone pretended that you should never use blobs
> > (whether that's protobuffed, jsoned, ...). If you don't need selected
> > updates and having something as compact as possible on disk makes an
> > important difference for you, sure, do use blobs. The only argument is
> > that you can already do that without any change to the core. What we
> > are saying is that for the case where you care more about schema
> > flexibility (being able to do selective updates, to index on some
> > subpart, etc...) then we think that something like the map and list
> > idea of CASSANDRA-3647 will probably be a more natural fit to the
> > current CQL API.
> >
> > --
> > Sylvain
> >
> > >
> > > All these reads make the hot dataset. If it fits the page cache you're
> > fine. If it doesn't you need to buy more iron.
> > >
> > > Really could not resist because your statement seems to be contrary to
> > all our tests / learnings.
> > >
> > > Cheers,
> > > Daniel
> > >
> > > From dev list:
> > >
> > > Re: Document storage
> > > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian 
> > wrote:
> > >>> I think this is a much better approach because that gives you the
> > >>> ability to update or retrieve just parts of objects efficiently,
> > >>> rather than making column values just blobs with a bunch of special
> > >>> case logic to introspect them.  Which feels like a big step backwards
> > >>> to me.
> > >>
> > >> Unless your access pattern involves reading/writing the whole document
> > each time. In
> > > that case you're better off serializing the whole document and storing
> > it in a column as a
> > > byte[] without incurring the overhead of column indexes. Right?
> > >
> > > Hmm, not sure what you're thinking of there.
> > >
> > > If you mean the "index" that's part of the row header for random
> > > access within a row, then no, serializing to byte[] doesn't save you
> > > anything.
> > >
> > > If you mean secondary indexes, don't declare any if you don't want any.
> > :)
> > >
> > > Just telling C* to store a byte[] *will* be slightly lighter-weight
> > > than giving it named columns, but we're talking negligible compared to
> > > the overhead of actually moving the data on or off disk in the first
> > > place.  Not even close to being worth giving up being able to deal
> > > with your data from standard tools like cqlsh, IMO.
> > >
> > > --
> > > Jonathan Ellis
> > > Project Chair, Apache Cassandra
> > > co-founder of DataStax, the source for professional Cassandra support
> > > http://www.datastax.com
> > >
> >
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-22 Thread Brian O'Neill
Praveen,

We are certainly interested. To get things moving we implemented an add-on for 
Cassandra to demonstrate the viability (using AOP):
https://github.com/hmsonline/cassandra-triggers

Right now the implementation executes triggers asynchronously, allowing you to 
implement a Java interface and plug in your own Java class that will get called 
for every insert.

Per the discussion on 1311, we intend to extend our proof of concept to be able 
to invoke scripts as well.  (minimally we'll enable javascript, but we'll 
probably allow for ruby and groovy as well)
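
The asynchronous callback shape described above can be sketched roughly as follows. This is an illustration in Python, not the cassandra-triggers code (which is Java and AOP based). The Trigger and TriggerDispatcher names and the process() signature are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


class Trigger:
    """Hypothetical callback interface: subclasses override process()."""

    def process(self, row_key, columns):
        raise NotImplementedError


class TriggerDispatcher:
    """Runs every registered trigger on a worker pool for each insert."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._triggers = []

    def register(self, trigger):
        self._triggers.append(trigger)

    def on_insert(self, row_key, columns):
        # Submit and return immediately so the write path is not blocked;
        # each trigger gets its own copy of the column data.
        return [self._pool.submit(t.process, row_key, dict(columns))
                for t in self._triggers]

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

Because on_insert returns futures rather than results, a slow or failing trigger does not delay the insert that fired it. The trade-off is that the caller has no guarantee the trigger has run by the time the write returns.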

-brian

On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:

> I found that Triggers are coming in Cassandra 1.2 
> (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of any 
> StoreProc like pattern.
> 
> I know this has been discussed so many times but never met with any 
> initiative. Even Groovy was staged out of the trunk.
> 
> Cassandra is great for logging and as such will be infinitely more useful if 
> some logic can be pushed into the Cassandra cluster nearer to the location of 
> Data to generate a materialized view useful for applications.
> 
> Server Side Scripts/Routines in Distributed Databases could soon prove to be 
> the differentiating factor.
> 
> Let me reiterate things with a use case.
> 
> In our application we store time series data in wide rows with TTL set on 
> each point to prevent data from growing beyond acceptable limits. Still the 
> data size can be a limiting factor to move all of it from the cluster node to 
> the querying node and then to the application via thrift for processing and 
> presentation.
> 
> Ideally we should process the data on the residing node and pass only the 
> materialized view of the data upstream. This should be trivial if Cassandra 
> implements some sort of server side scripting and CQL semantics to call it.
> 
> Is anybody else interested in a similar feature? Is it being worked on? Are 
> there any alternative strategies to this problem?
> 
> Praveen
> 
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



kudos...

2012-04-02 Thread Brian O'Neill
I just wanted to let you guys know that I gave you a shout out...
http://brianoneill.blogspot.com/2012/04/cassandra-vs-couchdb-mongodb-riak-hbase.html

thanks for all the support,
brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Document storage

2012-03-30 Thread Brian O'Neill

Do we also need to consider the client API?
If we don't adjust thrift, the client just gets bytes, right?
The client is on their own to marshal back into a structure.  In this
case, it seems like we would want to choose a standard that is efficient
and for which there are common libraries.  Protobuf seems to fit the bill
here.  

Or do we pass back some other structure?  (Native lists/maps? JSON
strings?)

Do we ignore sorting/comparators?
(similar to SOLR, I'm not sure people have defined a good sort for
multi-valued items)
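
To make the "client just gets bytes" case concrete, here is a small sketch of client-side marshalling. It uses JSON plus zlib from the Python standard library purely as stand-ins for whichever format is chosen (Protobuf, SMILE, Snappy, ...). The function names are hypothetical.

```python
import json
import zlib


def to_column_value(document):
    """Serialize a dict/list structure to compact, compressed bytes."""
    text = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def from_column_value(raw):
    """Reverse of to_column_value: bytes back into native lists/maps."""
    return json.loads(zlib.decompress(raw).decode("utf-8"))
```

The server only ever sees an opaque byte string here, so every client library has to agree on both steps, which is the interoperability concern raised in this thread.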

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



On 3/30/12 12:01 PM, "Daniel Doubleday"  wrote:

>> Just telling C* to store a byte[] *will* be slightly lighter-weight
>> than giving it named columns, but we're talking negligible compared to
>> the overhead of actually moving the data on or off disk in the first
>> place. 
>Hm - but isn't this exactly the point? You don't want to move data off
>disk.
>But decomposing into columns will lead to more of that:
>
>- Total amount of serialized data is (in most cases a lot) larger than
>protobuffed / compressed version
>- If you do selective updates the document will be scattered over
>multiple ssts plus if you do sliced reads you can't optimize reads as
>opposed to the single column version that when updated is automatically
>superseding older versions so most reads will hit only one sst
>
>All these reads make the hot dataset. If it fits the page cache you're
>fine. If it doesn't you need to buy more iron.
>
>Really could not resist because your statement seems to be contrary to
>all our tests / learnings.
>
>Cheers,
>Daniel
>
>From dev list:
>
>Re: Document storage
>On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian  wrote:
>>> I think this is a much better approach because that gives you the
>>> ability to update or retrieve just parts of objects efficiently,
>>> rather than making column values just blobs with a bunch of special
>>> case logic to introspect them.  Which feels like a big step backwards
>>> to me.
>>
>> Unless your access pattern involves reading/writing the whole document
>>each time. In
>that case you're better off serializing the whole document and storing it
>in a column as a
>byte[] without incurring the overhead of column indexes. Right?
>
>Hmm, not sure what you're thinking of there.
>
>If you mean the "index" that's part of the row header for random
>access within a row, then no, serializing to byte[] doesn't save you
>anything.
>
>If you mean secondary indexes, don't declare any if you don't want any. :)
>
>Just telling C* to store a byte[] *will* be slightly lighter-weight
>than giving it named columns, but we're talking negligible compared to
>the overhead of actually moving the data on or off disk in the first
>place.  Not even close to being worth giving up being able to deal
>with your data from standard tools like cqlsh, IMO.
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder of DataStax, the source for professional Cassandra support
>http://www.datastax.com
>




Re: Document storage

2012-03-29 Thread Brian O'Neill

Jonathan,

We store JSON as our column values.  I'd love to see support for maps and
lists.  If I get some time this weekend, I'll take a look to see what is
required.  It doesn't seem like it would be that hard.

-brian

 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/







On 3/29/12 3:18 PM, "Jonathan Ellis"  wrote:

>On Thu, Mar 29, 2012 at 2:06 PM, Ben McCann  wrote:
>> As far as I can tell, Cassandra
>> doesn't support maps and lists in a standardized way today, which is the
>> root of my problem.
>
>I'm pretty serious about adding those for 1.2, for what that's worth.
>(If you want to jump in and help code that up, so much the better.)
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder of DataStax, the source for professional Cassandra support
>http://www.datastax.com




Re: Document storage

2012-03-29 Thread Brian O'Neill
Jonathan, 

I was actually going to take this up with Nate McCall a few weeks back.  I
think it might make sense to get the client development community together
(Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.)

I agree whole-heartedly that it shouldn't go into the database for all the
reasons you point out.

If we can all decide on some standards for data storage (e.g. composite
types), indexing strategies, etc., we can provide higher-level functions
through the client libraries and also provide interoperability between
them (without bloating Cassandra).

CCing Nate.  Nate, thoughts?
I wouldn't mind coordinating/facilitating the conversation, if we know
who should be involved.

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/







On 3/29/12 3:06 PM, "Ben McCann"  wrote:

>Jonathan, I asked Brian about his REST API
><https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us>
>and he said he does not take the json objects and split them because the
>client libraries do not agree on implementations.  This was exactly my
>concern as well with this solution.  I would be perfectly happy to do it
>this way instead of using JSON if it were standardized.  The reason I
>suggested JSON is that it is standardized.  As far as I can tell,
>Cassandra
>doesn't support maps and lists in a standardized way today, which is the
>root of my problem.
>
>-Ben
>
>
>On Thu, Mar 29, 2012 at 11:30 AM, Drew Kutcharian  wrote:
>
>> Yes, I meant the "row header index". What I have done is that I'm
>>storing
>> an object (i.e. UserProfile) where you read or write it as a whole (a
>>user
>> updates their user details in a single page in the UI). So I serialize
>>that
>> object into a binary JSON using SMILE format. I then compress it using
>> Snappy on the client side. So as far as Cassandra cares it's storing a
>> byte[].
>>
>> Now on the client side, I'm using cassandra-cli with a custom type that
>> knows how to turn a byte[] into a JSON text and back. The only issue was
>> CASSANDRA-4081 where "assume" doesn't work with custom types. If
>> CASSANDRA-4081 gets fixed, I'll get the best of both worlds.
>>
>> Also advantages of this vs. the thrift based Super Column families are:
>>
>> 1. Saving extra CPU usage on the Cassandra nodes. Since
>> serialize/deserialize and compression/decompression happens on the
>>client
>> nodes where there is plenty idle CPU time
>>
>> 2. Saving network bandwidth since I'm sending over a compressed byte[]
>>
>>
>> -- Drew
>>
>>
>>
>> On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote:
>>
>> > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian 
>> wrote:
>> >>> I think this is a much better approach because that gives you the
>> >>> ability to update or retrieve just parts of objects efficiently,
>> >>> rather than making column values just blobs with a bunch of special
>> >>> case logic to introspect them.  Which feels like a big step
>>backwards
>> >>> to me.
>> >>
>> >> Unless your access pattern involves reading/writing the whole
>>document
>> each time. In that case you're better off serializing the whole document
>> and storing it in a column as a byte[] without incurring the overhead of
>> column indexes. Right?
>> >
>> > Hmm, not sure what you're thinking of there.
>> >
>> > If you mean the "index" that's part of the row header for random
>> > access within a row, then no, serializing to byte[] doesn't save you
>> > anything.
>> >
>> > If you mean secondary indexes, don't declare any if you don't want
>>any.
>> :)
>> >
>> > Just telling C* to store a byte[] *will* be slightly lighter-weight
>> > than giving it named columns, but we're talking negligible compared to
>> > the overhead of actually moving the data on or off disk in the first
>> > place.  Not even close to being worth giving up being able to deal
>> > with your data from standard tools like cqlsh, IMO.
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>>
>>




Re: OoM querying very wide-row in CLI

2012-03-28 Thread Brian O'Neill
Sorry, I didn't realize we weren't hip to pulls yet.

I created a JIRA and attached the patch.
https://issues.apache.org/jira/browse/CASSANDRA-4098

-brian

On Tue, Mar 27, 2012 at 10:42 PM, Brian O'Neill wrote:

> Here she is:
> https://github.com/apache/cassandra/pull/8
>
> Verified functionally with the attached data script.
>
> -brian
>
>
>
> On Tue, Mar 27, 2012 at 9:49 PM, Brian O'Neill wrote:
>
>> 10-4.  I'll see if I can track it down and submit a pull request that
>> specifies a default if one does not exist.
>>
>> -brian
>>
>> 
>> Brian O'Neill
>> Lead Architect, Software Development
>> Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
>> p: 215.588.6024
>> blog: http://weblogs.java.net/blog/boneill42/
>> blog: http://brianoneill.blogspot.com/
>>
>>
>>
>>
>>
>>
>>
>> On 3/27/12 9:45 PM, "Jonathan Ellis"  wrote:
>>
>> >I believe we added support for specifying a column range to the cli
>> >recently.  I don't know if there is a default limit.
>> >
>> >On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill 
>> >wrote:
>> >> Today, running 1.0.7, we saw a node crash with an OutOfMemory.
>> >> We have a single row with ~10million columns in it. (using it as an
>> >>index)
>> >> Accidentally, we attempted to list the CF in CLI that had the wide-row.
>> >>  This caused the CLI to hang and then eventually crashed Cassandra with
>> >>an
>> >> OoM.
>> >>
>> >> I know this is a case of "If it hurts when you do that, don't do that",
>> >>but
>> >> we may want to better protect against it in the CLI and/or the DB.  I
>> >>know
>> >> we limit row counts on lists in CLI.  Do we also limit column counts?
>> >>If
>> >> not, I don't mind submitting a patch for this.
>> >>
>> >> let me know,
>> >> brian
>> >>
>> >> --
>> >> Brian ONeill
>> >> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> >> mobile:215.588.6024
>> >> blog: http://weblogs.java.net/blog/boneill42/
>> >> blog: http://brianoneill.blogspot.com/
>> >
>> >
>> >
>> >--
>> >Jonathan Ellis
>> >Project Chair, Apache Cassandra
>> >co-founder of DataStax, the source for professional Cassandra support
>> >http://www.datastax.com
>>
>>
>>
>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: OoM querying very wide-row in CLI

2012-03-27 Thread Brian O'Neill
Here she is:
https://github.com/apache/cassandra/pull/8

Verified functionally with the attached data script.

-brian



On Tue, Mar 27, 2012 at 9:49 PM, Brian O'Neill wrote:

> 10-4.  I'll see if I can track it down and submit a pull request that
> specifies a default if one does not exist.
>
> -brian
>
> 
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
> p: 215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>
>
>
>
>
>
>
> On 3/27/12 9:45 PM, "Jonathan Ellis"  wrote:
>
> >I believe we added support for specifying a column range to the cli
> >recently.  I don't know if there is a default limit.
> >
> >On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill 
> >wrote:
> >> Today, running 1.0.7, we saw a node crash with an OutOfMemory.
> >> We have a single row with ~10million columns in it. (using it as an
> >>index)
> >> Accidentally, we attempted to list the CF in CLI that had the wide-row.
> >>  This caused the CLI to hang and then eventually crashed Cassandra with
> >>an
> >> OoM.
> >>
> >> I know this is a case of "If it hurts when you do that, don't do that",
> >>but
> >> we may want to better protect against it in the CLI and/or the DB.  I
> >>know
> >> we limit row counts on lists in CLI.  Do we also limit column counts?
> >>If
> >> not, I don't mind submitting a patch for this.
> >>
> >> let me know,
> >> brian
> >>
> >> --
> >> Brian ONeill
> >> Lead Architect, Health Market Science (http://healthmarketscience.com)
> >> mobile:215.588.6024
> >> blog: http://weblogs.java.net/blog/boneill42/
> >> blog: http://brianoneill.blogspot.com/
> >
> >
> >
> >--
> >Jonathan Ellis
> >Project Chair, Apache Cassandra
> >co-founder of DataStax, the source for professional Cassandra support
> >http://www.datastax.com
>
>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: OoM querying very wide-row in CLI

2012-03-27 Thread Brian O'Neill
10-4.  I'll see if I can track it down and submit a pull request that
specifies a default if one does not exist.

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/







On 3/27/12 9:45 PM, "Jonathan Ellis"  wrote:

>I believe we added support for specifying a column range to the cli
>recently.  I don't know if there is a default limit.
>
>On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill 
>wrote:
>> Today, running 1.0.7, we saw a node crash with an OutOfMemory.
>> We have a single row with ~10million columns in it. (using it as an
>>index)
>> Accidentally, we attempted to list the CF in CLI that had the wide-row.
>>  This caused the CLI to hang and then eventually crashed Cassandra with
>>an
>> OoM.
>>
>> I know this is a case of "If it hurts when you do that, don't do that",
>>but
>> we may want to better protect against it in the CLI and/or the DB.  I
>>know
>> we limit row counts on lists in CLI.  Do we also limit column counts?
>>If
>> not, I don't mind submitting a patch for this.
>>
>> let me know,
>> brian
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://weblogs.java.net/blog/boneill42/
>> blog: http://brianoneill.blogspot.com/
>
>
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder of DataStax, the source for professional Cassandra support
>http://www.datastax.com




OoM querying very wide-row in CLI

2012-03-27 Thread Brian O'Neill
Today, running 1.0.7, we saw a node crash with an OutOfMemory error.
We have a single row with ~10 million columns in it (using it as an index).
Accidentally, we attempted to list the CF in the CLI that had the wide row.
This caused the CLI to hang and then eventually crashed Cassandra with an
OoM.

I know this is a case of "If it hurts when you do that, don't do that", but
we may want to better protect against it in the CLI and/or the DB.  I know
we limit row counts on lists in CLI.  Do we also limit column counts?  If
not, I don't mind submitting a patch for this.
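
A minimal sketch of what a column-count limit buys: reading a wide row in bounded slices instead of materializing every column at once. A sorted Python list stands in for the row, and slice_columns is a hypothetical stand-in for a slice query with a start column and a count. It is not the CLI's code.

```python
import bisect


def slice_columns(sorted_names, start="", limit=100):
    """Return up to `limit` column names >= start (inclusive start)."""
    begin = bisect.bisect_left(sorted_names, start)
    return sorted_names[begin:begin + limit]


def iter_columns(sorted_names, page_size=100):
    """Walk the whole row while holding at most page_size + 1 names at a time."""
    last = None
    while True:
        if last is None:
            page = slice_columns(sorted_names, "", page_size)
        else:
            # The slice start is inclusive, so fetch one extra and drop the repeat.
            page = slice_columns(sorted_names, last, page_size + 1)[1:]
        if not page:
            return
        for name in page:
            yield name
        last = page[-1]
```

With a default limit, an accidental list of the column family returns the first page only. A caller who really wants the whole row has to page through it explicitly.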

let me know,
brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Triggers?

2012-01-20 Thread Brian O'Neill
I just posted to the user list, but figured I would post here as well.

We had a big session today designing application-level triggers using a new
column family as a distributed commit log.
When I got back to my desk, I re-googled Cassandra triggers, and re-read:
https://issues.apache.org/jira/browse/CASSANDRA-1311

We had planned to implement something similar to the "crack smoking"
concept...
Keeping a separate column family that logged the mutation, which a trigger
could then act on and write-back upon success.
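
A rough sketch of that log-then-trigger flow, with an in-memory dict standing in for the separate column family. The class, method, and status names are all hypothetical, and nothing here reflects the design discussed on CASSANDRA-1311.

```python
import itertools


class MutationLog:
    """Stand-in for a column family that records each mutation before triggers run."""

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def append(self, row_key, columns):
        entry_id = next(self._ids)
        self._entries[entry_id] = {
            "row_key": row_key, "columns": dict(columns), "status": "PENDING"}
        return entry_id

    def pending(self):
        return [(i, e) for i, e in sorted(self._entries.items())
                if e["status"] == "PENDING"]

    def mark_done(self, entry_id):
        # The "write-back upon success" step.
        self._entries[entry_id]["status"] = "DONE"


def run_triggers(log, trigger):
    """Apply trigger to each pending entry; failed entries stay pending for retry."""
    completed = 0
    for entry_id, entry in log.pending():
        try:
            trigger(entry["row_key"], entry["columns"])
        except Exception:
            continue  # leave PENDING so a later pass can retry
        log.mark_done(entry_id)
        completed += 1
    return completed
```

Since an entry is only marked done after the trigger succeeds, a crash between the two steps can cause a trigger to run twice. Triggers under this scheme would need to be idempotent.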

Conceptually, this doesn't seem too difficult to implement.  Is anyone
working on this already?
If not, is it worth working on it and contributing it as a patch?
Or should we just keep it to our app layer?

-brian


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Cassandra has moved to Git

2012-01-05 Thread Brian O'Neill
I'm by no means a git guru, but just happened to attend a meeting last
night where the presenter addressed this exact issue.  He has a pretty
slick process that kept the master/trunk clean without rebasing by
squashing a set of commits into a single commit when merged to trunk.
(using git squash?)

I'm CCing the guru, Nicholas Hance.

Nicholas, can you share that process handout from last night?

-brian


On Thu, Jan 5, 2012 at 11:58 AM, Sylvain Lebresne wrote:

> > This discourages collaboration because anyone that might fork
> > github.com/author/666 is sitting on a powder keg.
>
> Alright, but then what is it you're proposing?
>
> > At best it's yak shaving.  At worst it's going to result in some very
> > frustrated contributors.  This is one of the major reasons why rebase
> > is so contentious, and it's exactly why you hear so many people saying
> > "don't rebase branches that have been published".
>
> Again, I was more talking about the only reasonable solution I saw.
> Because to be clear, if the history for some issue 666 in say trunk looks
> like:
>
> commit : last nits from reviewer
> commit : oops, typo that prevented commit
> commit : some more fix found during review
> commit : refactor half of preceding patch following reviewer comments
> commit : Do something awesome - patch for #666
>
> then imho that's a big regression from current patch based development.
>
> So basically my question is how do we meld all those commits that will
> necessarily happen due to the nature of distributed reviews so that our
> main history don't look like shit? And if the answer is "we don't" then
> I'm not too fond of that solution.
>
> --
> Sylvain
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


FYI -- BufferOverflowException out of CommitLog on trunk

2011-12-12 Thread Brian O'Neill
I haven't had time to look into it yet, but just wanted to let you guys
know that I hit this in case someone was in that code.

ERROR 14:07:31,215 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.nio.BufferOverflowException
    at java.nio.Buffer.nextPutIndex(Buffer.java:501)
    at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:654)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:259)
    at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:568)
    at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:662)
 INFO 14:07:31,504 flushing high-traffic column family CFS(Keyspace='***', ColumnFamily='***') (estimated 103394287 bytes)

It happened during a fairly standard load process using M/R.

After that, the server refused to come down with a standard kill.
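
For readers less familiar with java.nio: the top two frames of the trace only say that a relative putInt ran out of room in a direct buffer. The sketch below reproduces that exception in isolation. It is not the CommitLogSegment code and says nothing about the root cause; the class and method names are hypothetical.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

final class PutIntOverflowDemo {
    /** Returns true if writing a 4-byte int at the given position overflows the buffer. */
    static boolean putIntOverflows(int capacity, int bytesAlreadyWritten) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(capacity);
        buffer.position(bytesAlreadyWritten);
        try {
            buffer.putInt(42); // relative put: needs 4 bytes between position and limit
            return false;
        } catch (BufferOverflowException e) {
            return true; // same exception type as in the trace
        }
    }

    public static void main(String[] args) {
        System.out.println(putIntOverflows(16, 12)); // exactly 4 bytes left
        System.out.println(putIntOverflows(16, 13)); // only 3 bytes left
    }
}
```

A writer that checks buffer.remaining() before each put can turn this condition into a normal "segment full" path instead of a fatal exception.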

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: How is Cassandra being used?

2011-11-16 Thread Brian O'Neill
Lively thread...

+1 opt-in
+1 in separate module

I'll just substantiate Rick Shaw's comments.  If this is on by default, I
can see it making its way into production at a large corporation, at which
time the traffic would sound an alarm as suspicious activity, which would
immediately get the server's plug pulled and trigger an investigation.
 That would land the architect responsible for deploying that server in the
proverbial principal's office.  In the extreme case, that might
"black-list" the technology and add fuel to any debate that the corporation
should just stick with the 'proven enterprise' solutions.  That is not my
perspective; just be aware that in some large corporations it is an uphill
battle to deploy Cassandra in the first place given incumbent systems.

In every situation I've been in, even outside of large corporations, we
would need to disable this feature given the sensitivity of the data.

All that said... I would love to see this data. ;)
I'd love to know where our deployment lies on the spectrum of use.

Maybe a good old fashioned web form that allows companies to submit their
usage scenarios might accomplish the same goal? (and you could get
additional context information about the industry, etc.)  It wouldn't be
comprehensive, but it may be sufficiently representative.  Maybe you could
just output a couple lines at server start that said something like "Go
here http://... to see how your usage compares to others."

I personally wouldn't throw too big a hissy if it was incorporated into the
actual server and on by default, but I certainly know others that would.

-brian


On Wed, Nov 16, 2011 at 7:17 AM, Eric Evans  wrote:

> On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis  wrote:
> > On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans  wrote:
> >> I think this is potentially quite dangerous; There are a lot of people
> >> who get very twitchy at the idea of software that Phones Home.  I've
> >> seen this so many times, and in all cases it was for software a lot
> >> less sensitive than a database.
> >
> > True, but unlike most Home Phoners, ours will be out there in the open
> > and you can see exactly what it's sending (or not, if you disable it).
> >  I'm sure there's other examples in the wild of this, but the only one
> > I can think of is popcon [1].
>
> I don't think the transparency of the implementation changes things
> much.  It's still going to be opaque to a lot of folks, and more
> importantly is the precedent it sets and the way it changes the
> project/user trust relationship.
>
> Even if you're satisfied with the implementation, and trust that it
> won't be extended to transmit additional data later (unintentionally
> or otherwise), there are still very valid privacy concerns.  For
> example, seeing as how this must be transmitted over an IP network,
> there are only so many guarantees you can make with respect to
> anonymity.  There will always be *someone* that can tie the data to a
> unique IP, and an IP can almost always be tied to an individual or
> organization.  Imagine an organization that doesn't want *anyone* to
> know it uses Cassandra, and isn't willing to accept the risk that one
> of their admins might accidentally enable this reporting.
>
> It's also interesting that you mention popcon because it has always
> been contentious.  It's taken years for it to transition from the
> point where it required users to install it themselves, to a prompt at
> install-time that defaulted to "No", to the current state of an
> install-time prompt that defaults to "Yes".  And, the installer asks
> *very* few questions; Whether or not popcon is enabled is on par with
> partitioning and the assignment of a root password.
>
> Also, there should be no shame in the admission that we haven't earned
> anywhere near the level of trust and respect that the Debian project
> has.
>
> > More broadly, my sense is that people are getting used to the idea
> > that it's okay to give away anonymous statistics as part of the price
> > of "free," although YMMclearlyV. I am, after all, a Windows user. :)
>
> As privacy becomes more threatened people are either capitulating, or
> becoming even more defensive; Whether that makes it better or worse
> for us if we do this is debatable.
>
> >> I'm sure you've already considered this though, you're already talking
> >> about anonymity, and transparency, and what I assume is neutrality of
> >> the collection endpoint (can apache actually provide a VM; is that a
> >> thing?).
> >
> > Yes, they provide Ubuntu or FreeBSD VMs.
> >
> >> I'm just afraid that we'll scare people off before they can
> >> be properly convinced that it's all on the up-and-up.
> >
> > How would you propose addressing this?
>
> Honestly?  The best way to convince people that we take the privacy of
> their data seriously is to not transmit any of it to a machine outside
> their control.
>
> >> I'm curious to see what others think, but at the moment I'm hovering
> >> somewhere around

Re: AOP for SOLR Integration with Cassandra

2011-11-05 Thread Brian O'Neill
Understandable.  I'll leave it as is then in the REST layer.

-brian

On Fri, Nov 4, 2011 at 11:24 PM, Jonathan Ellis  wrote:

> On Fri, Nov 4, 2011 at 3:57 PM, Brian O'Neill 
> wrote:
> > Doing it with AOP will also allow us to move it into
> > the main codebase if/when we want to.
>
> I'm not sure I understand.  I'm definitely -1 about adding an AspectJ
> dependency or similar to core C*.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


AOP for SOLR Integration with Cassandra

2011-11-04 Thread Brian O'Neill
I just sent an email out over the users list.

Over a couple nights this week, I added SOLR integration into Virgil.
(Virgil is that REST layer that we've been building out over in Apache
Extras)

I just wanted to throw an idea out to the dev list...
I plan to migrate the current implementation in Virgil to use AOP.  That
will provide a good separation of concerns between Cassandra Storage and
the SOLR indexing.  Doing it with AOP will also allow us to move it into
the main codebase if/when we want to.  We would simply move the AOP to
surround CassandraServer. (or lower... even down into Storage)
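
To make the separation of concerns concrete, here is a minimal sketch that uses a JDK dynamic proxy as a stand-in for AspectJ-style "after" advice. It is not Virgil's code: the Storage interface, InMemoryStorage, and the index log are hypothetical names, and a real aspect would issue a SOLR update where the sketch appends to a list.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IndexingAdviceSketch {
    /** Hypothetical storage interface; stands in for whatever layer gets wrapped. */
    interface Storage {
        void insert(String rowKey, String column, String value);
    }

    /** Plain storage with no knowledge of indexing. */
    static final class InMemoryStorage implements Storage {
        final Map<String, String> cells = new HashMap<>();

        @Override
        public void insert(String rowKey, String column, String value) {
            cells.put(rowKey + ":" + column, value);
        }
    }

    /** Wraps a Storage so every completed insert is also reported to the index log. */
    static Storage withIndexing(Storage target, List<String> indexLog) {
        InvocationHandler advice = (proxy, method, args) -> {
            Object result = method.invoke(target, args); // proceed to the real storage call
            if (method.getName().equals("insert")) {
                // "after" advice: stand-in for sending the update to the indexer
                indexLog.add("index " + args[0] + "/" + args[1]);
            }
            return result;
        };
        return (Storage) Proxy.newProxyInstance(
                Storage.class.getClassLoader(), new Class<?>[] {Storage.class}, advice);
    }

    /** Runs one insert through the wrapped storage and returns what the indexer saw. */
    static List<String> demo() {
        List<String> indexLog = new ArrayList<>();
        Storage storage = withIndexing(new InMemoryStorage(), indexLog);
        storage.insert("row1", "name", "virgil");
        return indexLog;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The storage class never references the indexer, which is the property that would let the interception point move to a different layer later without touching either side.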

Let me know if you think that is worth exploring further.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Contribution: Native REST Layer for Cassandra

2011-10-18 Thread Brian O'Neill
Jeremy/Jonathan,

For when you finish celebrating the 1.0 release: I just submitted a native REST
layer for Cassandra.
https://issues.apache.org/jira/browse/CASSANDRA-3380

It uses JAX-RS and Apache CXF supporting the following operations (JSON over
HTTP):


   - Create keyspace
   - Drop keyspace
   - Create column family
   - Drop column family
   - Insert row
   - Fetch row
   - Delete row
   - Insert column
   - Fetch column
   - Delete column
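
This message does not spell out the URL layout, so the following is only a sketch of how the ten operations above could map onto HTTP methods and paths. The layout /{keyspace}/{columnFamily}/{rowKey}/{column} and the choice of PUT for create/insert are assumptions made for illustration, not the layout used by the submitted module; the class name is hypothetical.

```java
final class RestRouteSketch {
    /** Maps an HTTP method and path onto one of the ten listed operations. */
    static String route(String httpMethod, String path) {
        // Assumed layout: /{keyspace}/{columnFamily}/{rowKey}/{column}
        String trimmed = path.replaceAll("^/+|/+$", "");
        int depth = trimmed.isEmpty() ? 0 : trimmed.split("/").length;
        String[] resources = {"keyspace", "column family", "row", "column"};
        if (depth < 1 || depth > resources.length) {
            return "unsupported";
        }
        String resource = resources[depth - 1];
        boolean isSchema = depth <= 2; // keyspaces and column families are created/dropped
        switch (httpMethod) {
            case "PUT":
                return (isSchema ? "create " : "insert ") + resource;
            case "DELETE":
                return (isSchema ? "drop " : "delete ") + resource;
            case "GET":
                // the list has no fetch operation for keyspaces or column families
                return isSchema ? "unsupported" : "fetch " + resource;
            default:
                return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("PUT", "/demo"));
        System.out.println(route("GET", "/demo/users/row1"));
        System.out.println(route("DELETE", "/demo/users/row1/name"));
    }
}
```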

This is a new module under contrib/rest. It builds using ant and ivy.  I
also included a maven pom.xml file that makes it easier to get setup in
Eclipse for those that use m2eclipse.  You start the server with
bin/rest_cassandra.  After that, you can issue all commands over HTTP on
port 8080.  I included example curl commands in the README.txt.  There are
junit tests that provide good code coverage of the JSON marshalling, the
system and data operations as well as the REST layer.

Let me know if you have any trouble building / using it.  In the meantime,
I'll start work on some additional todo's. Specifically we should add:
- Better exception handling
- Host/Port configuration
- Security
- XML support
- Binary object / byte support (assumes Strings right now)

(kudos to Gary Dusbabek for the initial thought to implement this as a
native layer)

all the best,
brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Eclipse style/formatting file?

2011-10-13 Thread Brian O'Neill
Perfect. Thanks.

-brian

On Thu, Oct 13, 2011 at 11:13 AM, Feiyi Wang  wrote:

> How about this?
> https://github.com/tjake/cassandra-style-eclipse
>
> Feiyi
>
>
> On Thu, Oct 13, 2011 at 10:58 AM, Brian O'Neill wrote:
>
> > All,
> >
> > Anyone have an eclipse style/formatting file compatible with the Cassandra
> > code?
> >
> > I don't see one here:
> > http://wiki.apache.org/cassandra/RunningCassandraInEclipse
> >
> > (I'm trying to get the REST API in a good state for contribution)
> >
> > thanks,
> > brian
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
> >
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Eclipse style/formatting file?

2011-10-13 Thread Brian O'Neill
All,

Anyone have an eclipse style/formatting file compatible with the Cassandra
code?

I don't see one here:
http://wiki.apache.org/cassandra/RunningCassandraInEclipse

(I'm trying to get the REST API in a good state for contribution)

thanks,
brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: REST API?

2011-10-11 Thread Brian O'Neill
To give everyone an update...

I was able to take what Gary had and update it to run on trunk.
I like the native integration, as opposed to layering it on top of Hector.
 It's working out well.
I layered in JAX-RS to replace the hand parsing of the url, and the
handlers.
I have reads and writes working through the StorageProxy, but I think I'm
going to raise it up one layer to take advantage of ThriftValidations.
(but still using direct method invocation instead of the Thrift client)
I added unit tests for the read/write of columns.

I'm going to add a few other operations (add/drop keyspace, add/drop CF).
Then it should be in a state where I can share it.

-brian



On Mon, Oct 10, 2011 at 10:06 PM, Jeremy Hanna
wrote:

> Brian,
>
> If you end up doing something with the rest api and making it
> available/open source, please post again either here or on the user list.  I
> think others would be interested and may contribute to it.
>
> Cheers,
>
> Jeremy
>
> On Oct 10, 2011, at 8:42 PM, Brian O'Neill wrote:
>
> > Thanks Gary. Perfect.  Checking it out now.
> >
> > Performance isn't much of a concern for us through the REST interface.  We
> > are using the Hadoop/PIG integration to do the heavy lifting.  This will be
> > mostly for reads and a small number of writes.
> >
> > I'll definitely give this a try.  Thanks again.  I'll let you know how it
> > turns out.
> >
> > -brian
> >
> > On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek 
> wrote:
> >
> >> It turns out that it is pretty easy (or it was a year ago) to replace
> >> the native Cassandra transport with your own.  I wrote about it on my
> >> blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html),
> >> using REST as an example.
> >>
> >>
> >> On Mon, Oct 10, 2011 at 20:12, Brian O'Neill 
> >> wrote:
> >>> My team desperately needs a REST API for Cassandra.
> >>>
> >>> I saw the following:
> >>> http://code.google.com/p/restish/
> >>> from
> >>>
> >>
> http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/
> >>>
> >>> But it appears to have little activity and documentation.
> >>>
> >>> That led me to start work on a contrib/rest module, but before I get too far
> >>> I wanted to ask if there was any effort underway for a REST Server/API.
> >>> If not, I'll continue developing the REST server.  Any preference for a REST
> >>> stack?  (JAX-RS on Apache-CXF?  Raw Servlets? Netty? etc.)
> >>>
> >>> Until I hear back, I'll continue with the JAX-RS / Apache CXF implementation
> >>> I have cooking.
> >>>
> >>> -brian
> >>>
> >>> --
> >>> Brian ONeill
> >>> Lead Architect, Health Market Science (http://healthmarketscience.com)
> >>> mobile:215.588.6024
> >>> blog: http://weblogs.java.net/blog/boneill42/
> >>> blog: http://brianoneill.blogspot.com/
> >>>
> >>
> >
> >
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: REST API?

2011-10-10 Thread Brian O'Neill
Will do.   I've picked up where Gary left off.  It is a good starting point,
with a good mapping between REST and get/set/mutations. (kudos to Gary)
I'll update it to accommodate any changes and see if I can add some tests on
top of it.

I may look to add in JAX-RS (on either Jersey or Apache CXF).  We use it for
all of our REST services, and it may provide a good abstraction layer that
we can build on.

Give me a couple days.
I have to get back into the "ant mentality". I've been doing maven too long.
BTW -- Does anyone know if there are plans to move to maven?
(Not trying to start a religious war, just curious. ;)

-brian

On Mon, Oct 10, 2011 at 10:06 PM, Jeremy Hanna
wrote:

> Brian,
>
> If you end up doing something with the rest api and making it
> available/open source, please post again either here or on the user list.  I
> think others would be interested and may contribute to it.
>
> Cheers,
>
> Jeremy
>
> On Oct 10, 2011, at 8:42 PM, Brian O'Neill wrote:
>
> > Thanks Gary. Perfect.  Checking it out now.
> >
> > Performance isn't much of a concern for us through the REST interface.  We
> > are using the Hadoop/PIG integration to do the heavy lifting.  This will be
> > mostly for reads and a small number of writes.
> >
> > I'll definitely give this a try.  Thanks again.  I'll let you know how it
> > turns out.
> >
> > -brian
> >
> > On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek 
> wrote:
> >
> >> It turns out that it is pretty easy (or it was a year ago) to replace
> >> the native Cassandra transport with your own.  I wrote about it on my
> >> blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html),
> >> using REST as an example.
> >>
> >>
> >> On Mon, Oct 10, 2011 at 20:12, Brian O'Neill 
> >> wrote:
> >>> My team desperately needs a REST API for Cassandra.
> >>>
> >>> I saw the following:
> >>> http://code.google.com/p/restish/
> >>> from
> >>>
> >>
> http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/
> >>>
> >>> But it appears to have little activity and documentation.
> >>>
> >>> That led me to start work on a contrib/rest module, but before I get too far
> >>> I wanted to ask if there was any effort underway for a REST Server/API.
> >>> If not, I'll continue developing the REST server.  Any preference for a REST
> >>> stack?  (JAX-RS on Apache-CXF?  Raw Servlets? Netty? etc.)
> >>>
> >>> Until I hear back, I'll continue with the JAX-RS / Apache CXF implementation
> >>> I have cooking.
> >>>
> >>> -brian
> >>>
> >>> --
> >>> Brian ONeill
> >>> Lead Architect, Health Market Science (http://healthmarketscience.com)
> >>> mobile:215.588.6024
> >>> blog: http://weblogs.java.net/blog/boneill42/
> >>> blog: http://brianoneill.blogspot.com/
> >>>
> >>
> >
> >
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: REST API?

2011-10-10 Thread Brian O'Neill
Thanks Gary. Perfect.  Checking it out now.

Performance isn't much of a concern for us through the REST interface.  We
are using the Hadoop/PIG integration to do the heavy lifting.  This will be
mostly for reads and a small number of writes.

I'll definitely give this a try.  Thanks again.  I'll let you know how it
turns out.

-brian

On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek  wrote:

> It turns out that it is pretty easy (or it was a year ago) to replace
> the native Cassandra transport with your own.  I wrote about it on my
> blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html),
> using REST as an example.
>
>
> On Mon, Oct 10, 2011 at 20:12, Brian O'Neill 
> wrote:
> > My team desperately needs a REST API for Cassandra.
> >
> > I saw the following:
> > http://code.google.com/p/restish/
> > from
> >
> http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/
> >
> > But it appears to have little activity and documentation.
> >
> > That led me to start work on a contrib/rest module, but before I get too far
> > I wanted to ask if there was any effort underway for a REST Server/API.
> > If not, I'll continue developing the REST server.  Any preference for a REST
> > stack?  (JAX-RS on Apache-CXF?  Raw Servlets? Netty? etc.)
> >
> > Until I hear back, I'll continue with the JAX-RS / Apache CXF implementation
> > I have cooking.
> >
> > -brian
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
> >
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


REST API?

2011-10-10 Thread Brian O'Neill
My team desperately needs a REST API for Cassandra.

I saw the following:
http://code.google.com/p/restish/
from
http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/

But it appears to have little activity and documentation.

That led me to start work on a contrib/rest module, but before I get too far
I wanted to ask if there was any effort underway for a REST Server/API.
If not, I'll continue developing the REST server.  Any preference for a REST
stack?  (JAX-RS on Apache-CXF?  Raw Servlets? Netty? etc.)

Until I hear back, I'll continue with the JAX-RS / Apache CXF implementation
I have cooking.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Patch for Contrib/Pig to accommodate refactoring of hexToBytes

2011-10-10 Thread Brian O'Neill
Jonathan,

We need a small update to contrib/pig to accommodate pulling hexToBytes out
of FBUtilities into Hex.
I raised an issue, and attached is the patch for trunk.

https://issues.apache.org/jira/browse/CASSANDRA-3341
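
For context on what the two relocated helpers do: the patched code serializes a CfDef to bytes and carries it around as a hex string, then reverses the process. The sketch below shows that round trip with a stand-in class. It is not Cassandra's Hex implementation, only an illustration that happens to reuse the method names bytesToHex and hexToBytes.

```java
final class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encodes each byte as two lowercase hex digits. */
    static String bytesToHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            out[2 * i] = DIGITS[(bytes[i] >> 4) & 0x0f]; // high nibble
            out[2 * i + 1] = DIGITS[bytes[i] & 0x0f];    // low nibble
        }
        return new String(out);
    }

    /** Decodes a hex string back into bytes; rejects odd lengths and non-hex characters. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new NumberFormatException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new NumberFormatException("non-hex character");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String encoded = bytesToHex(new byte[] {0x0a, (byte) 0xff});
        System.out.println(encoded);
        System.out.println(bytesToHex(hexToBytes(encoded)).equals(encoded));
    }
}
```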

-brian


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
Index: src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
===================================================================
--- src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java  (revision 1181048)
+++ src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java  (working copy)
@@ -26,7 +26,7 @@
 import org.apache.cassandra.db.marshal.IntegerType;
 import org.apache.cassandra.db.marshal.TypeParser;
 import org.apache.cassandra.thrift.*;
-import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Hex;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 
@@ -601,7 +601,7 @@
 TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
 try
 {
-return FBUtilities.bytesToHex(serializer.serialize(cfDef));
+return Hex.bytesToHex(serializer.serialize(cfDef));
 }
 catch (TException e)
 {
@@ -616,7 +616,7 @@
 CfDef cfDef = new CfDef();
 try
 {
-deserializer.deserialize(cfDef, FBUtilities.hexToBytes(st));
+deserializer.deserialize(cfDef, Hex.hexToBytes(st));
 }
 catch (TException e)
 {