Re: Thanks for all the fish.
+1, props to the giant on whose shoulders we stand. -- Brian O'Neill Principal Architect @ Monetate m: 215.588.6024 bone...@monetate.com <mailto:bone...@monetate.com> Is desktop dead? Find out in Monetate's Ecommerce Quarterly Report (Q1 2016) <http://info.monetate.com/EQ1_2016.html?utm_source=ibm&utm_medium=email-footer&utm_campaign=organic> > On Aug 19, 2016, at 4:29 PM, Brandon Williams wrote: > > If there is one thing I am damn sure of, it's that I wouldn't be here > without Jonathan's leadership and friendship. Thank you for all you've > done, old buddy. > > Kind Regards, > Brandon > > On Fri, Aug 19, 2016 at 2:20 PM, Michael Kjellman < > mkjell...@internalcircle.com> wrote: > >> Just wanted to say thank you publicly to Jonathan Ellis for his tireless >> work making this community and software what it is. He's always been level >> headed and I certainly wouldn't be where I am without his leadership. >> >> So, Jonathan, thanks for all the fish. >> >> best, >> kjellman >> >> Sent from my iPhone >>
Re: Wrap around CQL queries for token ranges?
Looks like the java-driver supplies the hack I need. (TokenRange.unwrap) I'll leave it to you guys to decide if it is more elegant to support wrapping natively in CQL.

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile @boneill42 <http://www.twitter.com/boneill42>

From: Brian O'Neill Date: Monday, May 11, 2015 at 12:32 PM To: "dev@cassandra.apache.org" Subject: Wrap around CQL queries for token ranges?

I was doing some testing around data locality today (and adding it to our distributed processing layer). I retrieved all of the TokenRanges back using:

tokenRanges = metadata.getTokenRanges(keyspace, localhost)

And when I spun through the token ranges returned, I ended up missing records. The root cause was the "edge case" where the ring wraps around. It generated the following CQL query: (using the last token range)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE token(id)>8743874685407455894 AND token(id)<=-8851282698028303387;

(0 rows)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE token(id)<=-8851282698028303387 AND token(id)>-9223372036854775808;

 token(id)            | id | name
----------------------+----+--------
 -9157060164899361011 | 23 | name23
 -9108684050423740263 | 53 | name53
 -9084883821289052775 | 91 | name91

(3 rows)

NOTE: If I use Long.MAX_VALUE instead, I get the records.

I can hack this at the app layer, to issue separate queries for the wrap around case, but I wonder if CQL should support wrap around queries???

-brian
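For anyone who hits the same thing, here is a minimal sketch of the unwrap() approach with the DataStax java-driver. It is illustrative only: it assumes a 2.1.5+ driver (where Metadata.getTokenRanges(), TokenRange.unwrap() and setToken() are available), reuses the test_keyspace.test_table table from the test above, and omits error handling and paging.

import com.datastax.driver.core.*;

public class TokenRangeScan
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        PreparedStatement ps = session.prepare(
            "SELECT token(id), id, name FROM test_keyspace.test_table " +
            "WHERE token(id) > ? AND token(id) <= ?");

        for (TokenRange range : cluster.getMetadata().getTokenRanges())
        {
            // unwrap() splits a range that wraps around the ring into one or two
            // non-wrapping ranges, so every query below has sane bounds.
            for (TokenRange subRange : range.unwrap())
            {
                ResultSet rs = session.execute(ps.bind()
                        .setToken(0, subRange.getStart())
                        .setToken(1, subRange.getEnd()));
                for (Row row : rs)
                    System.out.println(row.getObject("id") + " -> " + row.getObject("name"));
            }
        }
        cluster.close();
    }
}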
Wrap around CQL queries for token ranges?
I was doing some testing around data locality today (and adding it to our distributed processing layer). I retrieved all of the TokenRanges back using:

tokenRanges = metadata.getTokenRanges(keyspace, localhost)

And when I spun through the token ranges returned, I ended up missing records. The root cause was the "edge case" where the ring wraps around. It generated the following CQL query: (using the last token range)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE token(id)>8743874685407455894 AND token(id)<=-8851282698028303387;

(0 rows)

cqlsh> SELECT token(id),id,name FROM test_keyspace.test_table WHERE token(id)<=-8851282698028303387 AND token(id)>-9223372036854775808;

 token(id)            | id | name
----------------------+----+--------
 -9157060164899361011 | 23 | name23
 -9108684050423740263 | 53 | name53
 -9084883821289052775 | 91 | name91

(3 rows)

NOTE: If I use Long.MAX_VALUE instead, I get the records.

I can hack this at the app layer, to issue separate queries for the wrap around case, but I wonder if CQL should support wrap around queries???

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile @boneill42 <http://www.twitter.com/boneill42>
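For reference, a sketch of the app-layer workaround: the single wrapping range above, (8743874685407455894, -8851282698028303387], can be issued as two non-wrapping CQL queries, one for each end of the ring (Murmur3 tokens run from -2^63 to 2^63 - 1).

-- top of the ring
SELECT token(id), id, name FROM test_keyspace.test_table
 WHERE token(id) > 8743874685407455894;

-- bottom of the ring
SELECT token(id), id, name FROM test_keyspace.test_table
 WHERE token(id) <= -8851282698028303387;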
Re: Conditional Update Code?
Interesting, I just saw the function definition stuff in AggregationTest. I'll dig in there. It seems like we could re-use those functions for conditional updates?

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

On 3/4/15, 12:50 PM, "Brian O'Neill" wrote:

>Finally getting to this...
>
>For the UDF, javascript?
>
>-brian
>
>On 2/6/15, 9:50 AM, "Benedict Elliott Smith" wrote:
>
>>It's quite possible support could be added to evaluate a UDF as part of the
>>condition check. The code you're looking for are implementors of
>>CASRequest.appliesTo(), in CQL3CasRequest and CassandraServer.ThriftCASRequest
>>
>>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>>offer the basic functionality, except that it is expected to require ALLOW
>>FILTERING, which is unlikely to be permitted for a CAS operation, since the
>>implication is that the work is too expensive for normal use. Such a
>>constraint is probably not necessary if a clustering prefix is provided,
>>though (i.e. a full CQL row key).
>>
>>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill wrote:
>>
>>> All,
>>>
>>> I'm just looking for a little direction…
>>>
>>> Anyone know where I can find the code that checks the condition in a
>>> conditional update?
>>> We'd love to have more expressive conditions, beyond just equality.
>>> (e.g. column contains? value)
>>>
>>> I wanted to see how hard this would be to contribute.
>>> Is such a JIRA already open?
>>>
>>> -brian
Re: Conditional Update Code?
Finally getting to this...

For the UDF, javascript?

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

On 2/6/15, 9:50 AM, "Benedict Elliott Smith" wrote:

>It's quite possible support could be added to evaluate a UDF as part of the
>condition check. The code you're looking for are implementors of
>CASRequest.appliesTo(), in CQL3CasRequest and CassandraServer.ThriftCASRequest
>
>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>offer the basic functionality, except that it is expected to require ALLOW
>FILTERING, which is unlikely to be permitted for a CAS operation, since the
>implication is that the work is too expensive for normal use. Such a
>constraint is probably not necessary if a clustering prefix is provided,
>though (i.e. a full CQL row key).
>
>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill wrote:
>
>> All,
>>
>> I'm just looking for a little direction…
>>
>> Anyone know where I can find the code that checks the condition in a
>> conditional update?
>> We'd love to have more expressive conditions, beyond just equality.
>> (e.g. column contains? value)
>>
>> I wanted to see how hard this would be to contribute.
>> Is such a JIRA already open?
>>
>> -brian
Re: Conditional Update Code?
Perfect. Thanks. Let me see what I can cook up as a PoC.

The specific use case we are looking to address is for real-time aggregations, done in memory, then periodically flushed to C*. (e.g. every 30 seconds) (similar to what Druid does, but native on top of C*)

In this scenario, we aggregate app-side for a specific time slice/partition of data. We want to update the aggregate value only if that time slice/partition has not already been incorporated into the value. If we have a native way to check to see if the partition was already incorporated as part of the conditional update, it will simplify the app layer.

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile • @boneill42 <http://www.twitter.com/boneill42>

On 2/6/15, 9:50 AM, "Benedict Elliott Smith" wrote:

>It's quite possible support could be added to evaluate a UDF as part of the
>condition check. The code you're looking for are implementors of
>CASRequest.appliesTo(), in CQL3CasRequest and CassandraServer.ThriftCASRequest
>
>It seems like https://issues.apache.org/jira/browse/CASSANDRA-8488 would
>offer the basic functionality, except that it is expected to require ALLOW
>FILTERING, which is unlikely to be permitted for a CAS operation, since the
>implication is that the work is too expensive for normal use. Such a
>constraint is probably not necessary if a clustering prefix is provided,
>though (i.e. a full CQL row key).
>
>On Fri, Feb 6, 2015 at 2:38 PM, Brian O'Neill wrote:
>
>> All,
>>
>> I'm just looking for a little direction…
>>
>> Anyone know where I can find the code that checks the condition in a
>> conditional update?
>> We'd love to have more expressive conditions, beyond just equality.
>> (e.g. column contains? value)
>>
>> I wanted to see how hard this would be to contribute.
>> Is such a JIRA already open?
>>
>> -brian
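To make the ask concrete, a rough sketch in CQL. The schema below is hypothetical (invented for this example); the first UPDATE is the equality-only condition we can express today, and the commented-out statement is the kind of "contains"-style condition this thread is asking about (not valid CQL today).

-- hypothetical rollup table: a running total plus bookkeeping about which
-- time slices have already been folded in
CREATE TABLE aggregates (
    metric text PRIMARY KEY,
    total int,
    last_slice text,     -- workaround: only the most recent slice
    slices set<text>     -- ideal: the full set of incorporated slices
);

-- expressible today: flush the in-memory aggregate only if our previous
-- flush is still the last one incorporated (retry when [applied] is false)
UPDATE aggregates SET total = 1742, last_slice = '2015-02-06T09:30'
 WHERE metric = 'impressions'
 IF last_slice = '2015-02-06T09:00';

-- what we would like (pseudo-CQL, NOT supported): condition on membership
-- UPDATE aggregates SET total = 1742, slices = slices + {'2015-02-06T09:30'}
--  WHERE metric = 'impressions'
--  IF slices NOT CONTAINS '2015-02-06T09:30';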
Conditional Update Code?
All,

I'm just looking for a little direction…

Anyone know where I can find the code that checks the condition in a conditional update? We'd love to have more expressive conditions, beyond just equality. (e.g. column contains? value)

I wanted to see how hard this would be to contribute. Is such a JIRA already open?

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science, a LexisNexis Company 215.588.6024 Mobile @boneill42 <http://www.twitter.com/boneill42>
Re: Refactoring cassandra service package
Interesting proposition. We've embedded Cassandra a few times, so I'd be interested in an approach that makes that easier. Is there a way to do it incrementally? Introduce the injection framework, and convert a few classes (those required for startup), then slowly convert the remainder? peanut gallery, -brian --- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com
On 6/3/14, 1:59 PM, "Gary Dusbabek" wrote: >On Tue, Jun 3, 2014 at 3:52 AM, Simon Chemouil >wrote: > >> Hi, >> >> I'm new to Cassandra and felt like exploring and hacking on the code. I >> was surprised to see the usage of so many mutable/global state statics >> all over the service package (basically global variables/singletons). >> >> While I understand it can be practical to work with singletons, and that >> in any case I'm not sure multi-tenant Cassandra (as in two different >> Cassandra instances within the same process) would make sense at all (or >> even work considering there is some native access going on with JNA), I >> find static state can easily lead to tangled 'spaghetti' code (accessing >> the singletons from anywhere, even where one shouldn't), and in general >> it ties the code to the VM instance, rather than to the class. >> >> I tried to find if it was an actual design choice, but from my >> understanding this is more something inherited from the early Cassandra >> times at Facebook. I just found this thread[1] pointing to issue >> CASSANDRA-741 (slightly more limited scope) that was marked as WONTFIX >> because no one took it (but still marked as open for contribution). The >> current code conventions also don't mention the usage of singletons >> except by stating: "Do not extract interfaces (or abstract classes) >> unless you actually need multiple implementations of it" (switching to a >> "service"-style design doesn't require passing interfaces but it's >> highly encouraged to help testability). >> >> So, I'd like to try to make this refactoring happen and remove all (or >> most) mutable static state. It would be an easy way in for me in >> Cassandra's internals (maybe to contribute further). I think it would >> help testing (ability to unit test components without going to the >> storage for instance) and in general modernize the code. It would also >> make hacking on Cassandra easier because people could pick different >> pieces without pulling the whole thing. >> >> It would definitely break backwards compatibility with current Java code >> that directly embeds Cassandra / uses it as a library, but I would keep >> the same abstraction so the refactoring would be easy.
In any case, >> backwards compatibility can be broken by many more changes than just >> refactoring, and once this is done it will be easier to deal with >> backwards compatibility. >> >> Obviously all ".instance" fields would be gone, and I'd try to fix >> potential cyclic class dependencies and generally make sure classes >> dependencies form a direct acyclic graph with CassandraDaemon as its >> root. The basic idea is to have each 'service' component require all its >> service dependencies in their constructor (and keeping them as a final >> field), rather than getting them via the global namespace (singleton >> instances). >> >> If I had it my way, I'd probably use a dependency injection framework, >> namely Dagger which is as far as I knpw the lightest Java DI framework >> actively developed (jointly developed by Square and Google's Java team >> responsible for Guice & Guava), which has a neat compile-time annotation >> processor that detects missing dependencies early on. It works with both >> Android and J2SE and is very fast, simple and light (65kB vs 710kB for >> Guice). >> >> So, the
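For what it's worth, a minimal sketch of the constructor-injection shape being discussed. The class and method names below are invented for illustration (they are not real Cassandra types); the point is simply that each service receives its collaborators in its constructor instead of reading global ".instance" singletons, so the dependency graph is explicit and tests can pass in stubs.

// Illustrative only: made-up types, not actual Cassandra classes.
interface MessagingClient { void send(String target, String payload); }
interface SchemaProvider  { boolean knowsTable(String keyspace, String table); }

final class HintDeliveryService
{
    private final MessagingClient messaging;  // instead of e.g. MessagingService.instance
    private final SchemaProvider schema;      // instead of e.g. Schema.instance

    HintDeliveryService(MessagingClient messaging, SchemaProvider schema)
    {
        this.messaging = messaging;
        this.schema = schema;
    }

    void deliver(String target, String keyspace, String table, String mutation)
    {
        if (schema.knowsTable(keyspace, table))
            messaging.send(target, mutation);
    }
}

public class ConstructorInjectionSketch
{
    public static void main(String[] args)
    {
        // The composition root (CassandraDaemon, or a DI framework such as Dagger)
        // wires the graph once at startup; unit tests can substitute stubs freely.
        MessagingClient messaging = (target, payload) -> System.out.println(target + " <- " + payload);
        SchemaProvider schema = (keyspace, table) -> true;
        new HintDeliveryService(messaging, schema).deliver("10.0.0.1", "ks", "tbl", "mutation-blob");
    }
}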
NPE in conditional updates w/ collections in 2.0.7
OK we've got some hyper data modeling going on, taking advantage of all the latest toys in CQL 2. And we ran into some trouble using maps within conditional updates. Specifically, when testing to see if a key exists in a map (with =null?), we encounter an NPE server-side. We believe this worked in 2.0.4.

With this schema:

CREATE TABLE progress (
    key text,
    count int,
    partitions map,
    primary key (key)
);

When executing the following:

cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;

 [applied]
-----------
     False

cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA';
cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;
TSocket read 0 bytes

We see the following NPE server-side:

ERROR [Native-Transport-Requests:13353] 2014-05-15 15:10:00,154 QueryMessage.java (line 131) Unexpected error during query
java.lang.NullPointerException
    at org.apache.cassandra.cql3.ColumnCondition$WithVariables.collectionAppliesTo(ColumnCondition.java:168)
    at org.apache.cassandra.cql3.ColumnCondition$WithVariables.appliesTo(ColumnCondition.java:142)
    at org.apache.cassandra.cql3.statements.CQL3CasConditions$ColumnsConditions.appliesTo(CQL3CasConditions.java:197)
    at org.apache.cassandra.cql3.statements.CQL3CasConditions.appliesTo(CQL3CasConditions.java:108)

Is there a better way to test for existence of a key? Or is this a bug? (Regardless, we may want to protect against the NPE) Or am I missing something entirely?

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com
Re: NPE in conditional updates w/ collections in 2.0.7
Perfect. Thanks Tyler. Great to hear you guys are already on top of it. I'll watch for the resolution.

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> • healthmarketscience.com

On 5/16/14, 12:25 PM, "Tyler Hobbs" wrote:

>Hi Brian,
>
>Thanks for the report. This looks like
>https://issues.apache.org/jira/browse/CASSANDRA-7155, which should be fixed
>shortly.
>
>On Thu, May 15, 2014 at 3:23 PM, Brian O'Neill wrote:
>
>> OK -- we've got some hyper data modeling going on, taking advantage of all
>> the latest toys in CQL 2. And we ran into some trouble using maps within
>> conditional updates. Specifically, when testing to see if a key exists in a
>> map (with =null?), we encounter an NPE server-side. We believe this worked
>> in 2.0.4.
>>
>> With this schema:
>> CREATE TABLE progress (
>>     key text,
>>     count int,
>>     partitions map,
>>     primary key (key)
>> );
>>
>> When executing the following:
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;
>>
>>  [applied]
>> -----------
>>      False
>>
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA';
>> cqlsh:hms> UPDATE foo SET count=4962 WHERE key='PA' IF partitions['a']=null;
>> TSocket read 0 bytes
>>
>> We see the following NPE server-side:
>> ERROR [Native-Transport-Requests:13353] 2014-05-15 15:10:00,154
>> QueryMessage.java (line 131) Unexpected error during query
>> java.lang.NullPointerException
>> at org.apache.cassandra.cql3.ColumnCondition$WithVariables.collectionAppliesTo(ColumnCondition.java:168)
>> at org.apache.cassandra.cql3.ColumnCondition$WithVariables.appliesTo(ColumnCondition.java:142)
>> at org.apache.cassandra.cql3.statements.CQL3CasConditions$ColumnsConditions.appliesTo(CQL3CasConditions.java:197)
>> at org.apache.cassandra.cql3.statements.CQL3CasConditions.appliesTo(CQL3CasConditions.java:108)
>>
>> Is there a better way to test for existence of a key?
>> Or is this a bug? (Regardless, we may want to protect against the NPE)
>> Or am I missing something entirely?
>>
>> -brian

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: Proposal: freeze Thrift starting with 2.1.0
I'm +1. We've had one foot out the door for a while now.

We are throwing resources at CQL. (e.g. storm-cassandra-cql) And we are slowing support for the thrift-based implementation (e.g. storm-cassandra).

Alas poor Thrift, I knew him (well).

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com

On 3/11/14, 12:27 PM, "sankalp kohli" wrote: >RIP Thrift :) >+1 with "We will retain it for backwards compatibility". Hopefully most >people will move out of thrift by 2.1 > > >On Tue, Mar 11, 2014 at 10:18 AM, Brandon Williams >wrote: > >> As someone who has written a thrift wrapper, +1 >> >> >> On Tue, Mar 11, 2014 at 12:00 PM, Jonathan Ellis >> wrote: >> >> > CQL3 is almost two years old now and has proved to be the better API >> > that Cassandra needed. CQL drivers have caught up with and passed the >> > Thrift ones in terms of features, performance, and usability. CQL is >> > easier to learn and more productive than Thrift. >> > >> > With static columns and LWT batch support [1] landing in 2.0.6, and >> > UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be >> > done in CQL. Contrawise, CQL makes many things easy that are >> > difficult to impossible in Thrift. New development is overwhelmingly >> > done using CQL. >> > >> > To date we have had an unofficial and poorly defined policy of "add >> > support for new features to Thrift when that is 'easy.'" However, >> > even relatively simple Thrift changes can create subtle complications >> > for the rest of the server; for instance, allowing Thrift range >> > tombtones would make filter conversion for CASSANDRA-6506 more >> > difficult. >> > >> > Thus, I think it's time to officially close the book on Thrift. We >> > will retain it for backwards compatibility, but we will commit to >> > adding no new features or changes to the Thrift API after 2.1.0. This >> > will help send an unambiguous message to users and eliminate any >> > remaining confusion from supporting two APIs. If any new use cases >> > come to light that can be done with Thrift but not CQL, we will commit >> > to supporting those in CQL. >> > >> > (To a large degree, this merely formalizes what is already de facto >> > reality. Most thrift clients have not even added support for >> > atomic_batch_mutate and cas from 2.0, and popular clients like >> > Astyanax are migrating to the native protocol.) >> > >> > Reasonable? >> > >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-6561 >> > [2] https://issues.apache.org/jira/browse/CASSANDRA-5590 >> > >> > -- >> > Jonathan Ellis >> > Project Chair, Apache Cassandra >> > co-founder, http://www.datastax.com >> > @spyced
Re: "[applied]" column in ModificationStatement?
Thanks Jonathan. It feels a little weird, but that will work. Not a big deal, but maybe we could include a wasApplied() method on the ResultSet in the future that would insulate clients from the ResultSet schema/column name.

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> • healthmarketscience.com

On 2/6/14, 2:58 PM, "Jonathan Ellis" wrote:

>In Cassandra, it's ModificationStatement.CAS_RESULT_COLUMN.text
>
>On Thu, Feb 6, 2014 at 10:22 AM, Brian O'Neill wrote:
>> Silly question…
>>
>> Using the CQL driver for conditional updates, I'm looking into the ResultSet
>> that comes back:
>> for (ColumnDefinitions.Definition definition :
>>     results.getColumnDefinitions().asList()) {
>>     for (Row row : results.all()) {
>>         LOG.debug("UPDATE APPLIED = [{}]=[{}]",
>>             definition.getName(), row.getBool(definition.getName()));
>>     }
>> }
>>
>> I noticed that the ResultSet of a conditional update contains a column
>> "[applied]", with a boolean indicating whether or not the update was applied.
>>
>> I assume this column name comes from:
>> src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java:50
>> private static final ColumnIdentifier CAS_RESULT_COLUMN = new ColumnIdentifier("[applied]", false);
>>
>> Does it make sense to expose this column name as a String constant somewhere?
>> Either in the CQL java-driver, or Cassandra itself?
>>
>> -brian

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
"[applied]" column in ModificationStatement?
Silly question…

Using the CQL driver for conditional updates, I'm looking into the ResultSet that comes back:

for (ColumnDefinitions.Definition definition : results.getColumnDefinitions().asList()) {
    for (Row row : results.all()) {
        LOG.debug("UPDATE APPLIED = [{}]=[{}]", definition.getName(), row.getBool(definition.getName()));
    }
}

I noticed that the ResultSet of a conditional update contains a column "[applied]", with a boolean indicating whether or not the update was applied.

I assume this column name comes from:

src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java:50
private static final ColumnIdentifier CAS_RESULT_COLUMN = new ColumnIdentifier("[applied]", false);

Does it make sense to expose this column name as a String constant somewhere? Either in the CQL java-driver, or Cassandra itself?

-brian

--- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com
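For reference, a minimal sketch of checking the CAS result by column name with the DataStax Java driver (2.x-era API). The keyspace, table, and condition below are made up for illustration; note that later driver releases added a wasApplied() helper on ResultSet along the lines suggested above.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CasAppliedCheck
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("hms");

        // Hypothetical conditional update; the first row of a CAS result
        // always carries the "[applied]" boolean.
        ResultSet results = session.execute(
            "UPDATE progress SET count = 4962 WHERE key = 'PA' IF count = 4961");
        Row row = results.one();
        boolean applied = row.getBool("[applied]");
        System.out.println("UPDATE APPLIED = " + applied);

        cluster.close();
    }
}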
Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)
Thanks for the pointer Alain.

At a quick glance, it looks like people are looking for query time filtering/aggregation, which will suffice for small data sets. Hopefully we might be able to extend that to perform pre-computations as well. (which would support much larger data sets / volumes)

I'll continue the discussion on the issue.

thanks again,
brian

--- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com

From: Alain RODRIGUEZ Date: Wednesday, December 18, 2013 at 5:13 AM To: Cc: "dev@cassandra.apache.org" Subject: Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Hi, this would indeed be much appreciated by a lot of people. There is this issue, existing about this subject https://issues.apache.org/jira/browse/CASSANDRA-4914 Maybe could you help commiters there. Hope this will be usefull to you. Please let us know when you find a way to do these operations. Cheers.

2013/12/18 Brian O'Neill > We are seeking to replace Acunu in our technology stack / platform. It is the > only component in our stack that is not open source. > > In preparation, over the last few weeks I've migrated Virgil to CQL. The > vision is that Virgil could receive a REST request to upsert/delete data > (hierarchical JSON to support collections). Virgil would lookup the > dimensions/aggregations for that table, add the key to the pertinent > dimensional tables (e.g. DISTINCT), incorporate values into aggregations (e.g. > SUMs) and increment/decrement relevant counters (COUNT). (using additional > CF's) > > This seems straight-forward, but appears to require a read-before-write. > (e.g. read the current value of a SUM, incorporate the new value, then use the > lightweight transactions of C* 2.0 to conditionally update the value.) > > Before I go down this path, I was wondering if anyone is designing/working on > the same, perhaps at a lower level? (CQL?) > > Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT, > etc) at the CQL level? If so, is there a preliminary design? > > I can see a lower-level approach, which would leverage the commit logs (and > mem/sstables) and perform the aggregation during read-operations (and > flush/compaction). > > thoughts? i'm open to all ideas. > > -brian > -- > Brian ONeill > Chief Architect, Health Market Science (http://healthmarketscience.com) > mobile:215.588.6024 > blog: http://brianoneill.blogspot.com/ > twitter: @boneill42
Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)
We are seeking to replace Acunu in our technology stack / platform. It is the only component in our stack that is not open source. In preparation, over the last few weeks I’ve migrated Virgil to CQL. The vision is that Virgil could receive a REST request to upsert/delete data (hierarchical JSON to support collections). Virgil would lookup the dimensions/aggregations for that table, add the key to the pertinent dimensional tables (e.g. DISTINCT), incorporate values into aggregations (e.g. SUMs) and increment/decrement relevant counters (COUNT). (using additional CF’s) This seems straight-forward, but appears to require a read-before-write. (e.g. read the current value of a SUM, incorporate the new value, then use the lightweight transactions of C* 2.0 to conditionally update the value.) Before I go down this path, I was wondering if anyone is designing/working on the same, perhaps at a lower level? (CQL?) Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT, etc) at the CQL level? If so, is there a preliminary design? I can see a lower-level approach, which would leverage the commit logs (and mem/sstables) and perform the aggregation during read-operations (and flush/compaction). thoughts? i'm open to all ideas. -brian -- Brian ONeill Chief Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
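To make the read-before-write flow concrete, a small sketch in CQL. The table is hypothetical (invented for this example); the idea is that the app reads the current aggregate, folds the new value in, and writes it back with a lightweight transaction so a concurrent flush cannot be silently overwritten (retry when [applied] comes back false).

-- hypothetical rollup table
CREATE TABLE daily_sums (
    metric text,
    day text,
    total int,
    version int,
    PRIMARY KEY (metric, day)
);

-- 1. read the current aggregate
SELECT total, version FROM daily_sums
 WHERE metric = 'impressions' AND day = '2013-12-18';

-- 2. incorporate the new value app-side, then conditionally write it back
UPDATE daily_sums SET total = 1742, version = 8
 WHERE metric = 'impressions' AND day = '2013-12-18'
 IF version = 7;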
Re: Submit enhancements via pull requests?
Thanks Jeremiah. Done. https://issues.apache.org/jira/browse/CASSANDRA-6453

-brian

--- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> • healthmarketscience.com

On 12/5/13, 10:47 AM, "Jeremiah D Jordan" wrote:

>JIRA + patch or link to git branch
>
>-Jeremiah
>
>On Dec 5, 2013, at 9:44 AM, Brian O'Neill wrote:
>
>> Sorry guys, it's been a while since I submitted a patch.
>>
>> I see there are a number of outstanding pull requests:
>> https://github.com/apache/cassandra/pulls
>>
>> Are we able to submit enhancements via pull requests on github now?
>> Or are we still using JIRA + patches?
>>
>> (I have a very minor change to an error message that I'd like to get in there)
>>
>> thanks,
>> brian
Submit enhancements via pull requests?
Sorry guys, it's been a while since I submitted a patch.

I see there are a number of outstanding pull requests:
https://github.com/apache/cassandra/pulls

Are we able to submit enhancements via pull requests on github now? Or are we still using JIRA + patches?

(I have a very minor change to an error message that I'd like to get in there)

thanks,
brian

--- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com
Re: Bitmap indexes - reviving CASSANDRA-1472
@Jason, I have a lot of experience with SOLR + ES, but mainly for search. (i.e. Finding the most relevant records given a query) That's been working well, but now we have requirements to support dashboards. Those dashboards have aggregations in them (sum, average, count(s), etc). I have limited experience using filter functions and facets to achieve similar things w/ Lucene, but they never seemed to perform well when the sets were large. If Lucene/SOLR/ES can support this kind of functionality, we'd gladly use it instead. (Let me know!)

When we looked around, Druid seemed to fit the bill exactly: (and it was open source)
http://metamarkets.com/2011/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/

BTW, here is more information on the compression that Druid uses:
http://metamarkets.com/2012/druid-bitmap-compression/

To echo Matt's sentiment, we'd love to leverage a C* native capability for this. (Acunu provides most of the capability, but it isn't open source) I think once we have the "conditional write" semantics that are coming, we could layer this on top of C*. (extending the secondary indexes functionality)

-brian

--- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com

On 4/12/13 12:46 AM, "Matt Stump" wrote: >You could embed Lucene, but then you pretty much have DSE search, and >there >are people on this list in a better position than I to describe >the difficulty in making that scale. By rolling your own you get >simplicity >and control. If you use a uniform index size you can just assign chunks of >it to the cassandra ring making it easy to distribute queries. I think >that >using Lucene in this way would cause most of the benefit of the library to >be lost, and add unnecessary complexity. If Lucene were easy, then I think >given the team's experience with both Lucene and C* it would have been >done >already. > >Sorry if it's a fuzzy answer, but I haven't run down every technical angle >on the integration with C* yet. The idea was still very much in the >wouldn't it be very cool if this thing lived in Cassandra. It would be the >nail in the coffin for impala, redshift, et al. > > >On Thu, Apr 11, 2013 at 3:15 PM, Jason Rutherglen < >jason.rutherg...@gmail.com> wrote: >> What's the advantage over Lucene? >> >> >> On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump >> wrote: >> >> > Druid was our inspiration to layer bitmap indexes on top of Cassandra. >> > Druid doesn't work for us because or data set is too large. We would >>need >> > many hundreds of nodes just for the pre-processed data. What I >>envisioned >> > was the ability to perform druid style queries (no aggregation) >>without >> the >> > limitations imposed by having the entire dataset in memory. 
I >>primarily >> > need to query whether a user performed some event, but I also intend >>to >> add >> > trigram indexes for LIKE, ILIKE or possibly regex style matching. >> > >> > I wasn't aware of CONCISE, thanks for the pointer. We are currently >> > evaluating fastbit, which is a very similar project: >> > https://sdm.lbl.gov/fastbit/ >> > >> > >> > On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill > > >wrote: >> > >> > > >> > > How does this compare with Druid? >> > > https://github.com/metamx/druid >> > > >> > > We're currently evaluating Acunu, Vertica and Druid... >> > > >> > > >> > >> >>http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra. >>html >> > > >> > > With its bitmapped indexes, Druid appears to have the most >>potential. >> > > They boast some pretty impressive stats, especially WRT handling >> > > "real-time" updates and adding
Re: Bitmap indexes - reviving CASSANDRA-1472
How does this compare with Druid? https://github.com/metamx/druid We're currently evaluating Acunu, Vertica and Druid... http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html With its bitmapped indexes, Druid appears to have the most potential. They boast some pretty impressive stats, especially WRT handling "real-time" updates and adding new dimensions. They also use a compression algorithm, CONCISE, to cut down on the space requirements. http://ricerca.mat.uniroma3.it/users/colanton/concise.html I haven't looked too deep into the Druid code, but I've been meaning to see if it could be backed by C*. We'd be game to join the hunt if you pursue such a beast. (with your code, or with portions of Druid) -brian On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote: > What do you think about set manipulation via indexes in Cassandra? I'm > interested in answering queries such as give me all users that performed > event 1, 2, and 3, but not 4. If the answer is yes than I can make a case > for spending my time on C*. The only downside for us would be our current > prototype is in C++ so we would loose some performance and the ability to > dedicate an entire machine to caching/performing queries. > > > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis wrote: > >> If you mean, "Can someone help me figure out how to get started updating >> these old patches to trunk and cleaning out the Avro?" then yes, I've been >> knee-deep in indexing code recently. >> >> >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome >> wrote: >> >>> I'm currently building a distributed cluster on top of cassandra to >> perform >>> fast set manipulation via bitmap indexes. This gives me the ability to >>> perform unions, intersections, and set subtraction across sub-queries. >>> Currently I'm storing index information for thousands of dimensions as >>> cassandra rows, and my cluster keeps this information cached, distributed >>> and replicated in order to answer queries. >>> >>> Every couple of days I think to myself this should really exist in C*. >>> Given all the benifits would there be any interest in >>> reviving CASSANDRA-1472? >>> >>> Some downsides are that this is very memory intensive, even for sparse >>> bitmaps. >>> >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder, http://www.datastax.com >> @spyced >> -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: Compund/Composite column names
Sorry, just got time to submit it. Here you go: https://issues.apache.org/jira/browse/CASSANDRA-5138 -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. From: Sylvain Lebresne Date: Monday, December 17, 2012 10:35 AM To: Cc: Vivek Mishra , Brian O'Neill Subject: Re: Compund/Composite column names Feel free to open a ticket with steps to reproduce. We can certainly throw a more meaningful exception. On Mon, Dec 17, 2012 at 4:11 PM, Edward Capriolo wrote: > This was discussed in one of the tickets. The problem is that CQL3's sparse > tables is it has different metadata that has NOT been added to thrift's > CFMetaData. Thus thrift is unaware of exactly how to verify the insert. > > Originally it was made impossible for thrift to see a sparse table (but > that restriction has been lifted) it seems. It is probably a bad idea to > thrift insert into a sparse table until Cassandra does not have two > distinct sources of meta information. > > > > > > On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra wrote: > >> > Looks like Thrift API is not working as expected? >> > >> > -Vivek >> > >> > >> > >> > >> > >> > From: Brian O'Neill >> > To: dev@cassandra.apache.org >> > Cc: Vivek Mishra >> > Sent: Monday, December 17, 2012 8:12 PM >> > Subject: Re: Compund/Composite column names >> > >> > FYI -- I'm still seeing this on 1.2-beta1. >> > >> > If you create a table via CQL, then insert into it (via Java API) with >> > an incorrect number of components. The insert works, but select * >> > from CQL results in a TSocket read error. >> > >> > I showed this in the webinar last week, just in case people ran into >> > it. It would be great to translate the ArrayIndexOutofBoundsException >> > from the server side into something meaningful in cqlsh to help people >> > diagnose the problem. (a regular user probably doesn't have access to >> > the server-side logs) >> > >> > You can see it at minute 41 in the video from the webinar: >> > http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be >> > >> > -brian >> > >> > >> > On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis wrote: >>> > > Sounds like you're running into the keyspace drop bug. It's "mostly" >> > fixed >>> > > in 1.1.5 but you might need the latest from 1.1 branch. 1.1.6 will be >>> > > released soon with the final fix. >>> > > On Oct 9, 2012 1:58 AM, "Vivek Mishra" wrote: >>> > > >>>> > >> >>>> > >> >>>> > >> Ok. I am able to understand the problem now. 
Issue is: >>>> > >> >>>> > >> If i create a column family altercations as: >>>> > >> >>>> > >> >>>> > >> >> > >> * >> *8 >>>> > >> CREATE TABLE altercations ( >>>> > >>instigator text, >>>> > >>started_at timestamp, >>>> > >>ships_destroyed int, >>>> > >>energy_used float, >>>> > >>alliance_involvement boolean, >>>> > >>PRIMARY KEY (instigator,started_at,ships_destroyed) >>>> > >>); >>>> > >> / >>>> > >>INSERT INTO altercations (instigator, started_at, ships_destroyed, >>&g
Re: Compund/Composite column names
Will do. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. From: Sylvain Lebresne Date: Monday, December 17, 2012 10:35 AM To: Cc: Vivek Mishra , Brian O'Neill Subject: Re: Compund/Composite column names Feel free to open a ticket with steps to reproduce. We can certainly throw a more meaningful exception. On Mon, Dec 17, 2012 at 4:11 PM, Edward Capriolo wrote: > This was discussed in one of the tickets. The problem is that CQL3's sparse > tables is it has different metadata that has NOT been added to thrift's > CFMetaData. Thus thrift is unaware of exactly how to verify the insert. > > Originally it was made impossible for thrift to see a sparse table (but > that restriction has been lifted) it seems. It is probably a bad idea to > thrift insert into a sparse table until Cassandra does not have two > distinct sources of meta information. > > > > > > On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra wrote: > >> > Looks like Thrift API is not working as expected? >> > >> > -Vivek >> > >> > >> > >> > >> > >> > From: Brian O'Neill >> > To: dev@cassandra.apache.org >> > Cc: Vivek Mishra >> > Sent: Monday, December 17, 2012 8:12 PM >> > Subject: Re: Compund/Composite column names >> > >> > FYI -- I'm still seeing this on 1.2-beta1. >> > >> > If you create a table via CQL, then insert into it (via Java API) with >> > an incorrect number of components. The insert works, but select * >> > from CQL results in a TSocket read error. >> > >> > I showed this in the webinar last week, just in case people ran into >> > it. It would be great to translate the ArrayIndexOutofBoundsException >> > from the server side into something meaningful in cqlsh to help people >> > diagnose the problem. (a regular user probably doesn't have access to >> > the server-side logs) >> > >> > You can see it at minute 41 in the video from the webinar: >> > http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be >> > >> > -brian >> > >> > >> > On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis wrote: >>> > > Sounds like you're running into the keyspace drop bug. It's "mostly" >> > fixed >>> > > in 1.1.5 but you might need the latest from 1.1 branch. 1.1.6 will be >>> > > released soon with the final fix. >>> > > On Oct 9, 2012 1:58 AM, "Vivek Mishra" wrote: >>> > > >>>> > >> >>>> > >> >>>> > >> Ok. I am able to understand the problem now. 
Re: Compound/Composite column names
FYI -- I'm still seeing this on 1.2-beta1. If you create a table via CQL, then insert into it (via Java API) with an incorrect number of components. The insert works, but select * from CQL results in a TSocket read error. I showed this in the webinar last week, just in case people ran into it. It would be great to translate the ArrayIndexOutofBoundsException from the server side into something meaningful in cqlsh to help people diagnose the problem. (a regular user probably doesn't have access to the server-side logs) You can see it at minute 41 in the video from the webinar: http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be -brian On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis wrote: > Sounds like you're running into the keyspace drop bug. It's "mostly" fixed > in 1.1.5 but you might need the latest from 1.1 branch. 1.1.6 will be > released soon with the final fix. > On Oct 9, 2012 1:58 AM, "Vivek Mishra" wrote: > >> >> >> Ok. I am able to understand the problem now. Issue is: >> >> If i create a column family altercations as: >> >> >> **8 >> CREATE TABLE altercations ( >>instigator text, >>started_at timestamp, >>ships_destroyed int, >>energy_used float, >>alliance_involvement boolean, >>PRIMARY KEY (instigator,started_at,ships_destroyed) >>); >> / >>INSERT INTO altercations (instigator, started_at, ships_destroyed, >> energy_used, alliance_involvement) >> VALUES ('Jayne Cobb', '2012-07-23', 2, 4.6, 'false'); >> >> * >> >> It works! >> >> But if i create a column family with compound primary key with 2 composite >> column as: >> >> >> * >> CREATE TABLE altercations ( >>instigator text, >>started_at timestamp, >>ships_destroyed int, >>energy_used float, >>alliance_involvement boolean, >>PRIMARY KEY (instigator,started_at) >>); >> >> >> * >> and Then drop this column family: >> >> >> * >> drop columnfamily altercations; >> >> * >> >> and then try to create same one with primary compound key with 3 composite >> column: >> >> >> * >> >> CREATE TABLE altercations ( >>instigator text, >>started_at timestamp, >>ships_destroyed int, >>energy_used float, >>alliance_involvement boolean, >>PRIMARY KEY (instigator,started_at,ships_destroyed) >>); >> >> * >> >> it gives me error: "TSocket read 0 bytes" >> >> Rest, as no column family is created, so nothing onwards will work. >> >> Is this an issue? >> >> -Vivek >> >> >> >> From: Jonathan Ellis >> To: dev@cassandra.apache.org; Vivek Mishra >> Sent: Tuesday, October 9, 2012 9:08 AM >> Subject: Re: Compund/Composite column names >> >> Works for me on latest 1.1 in cql3 mode. cql2 mode gives a parse error. >> >> On Mon, Oct 8, 2012 at 9:18 PM, Vivek Mishra >> wrote: >> > Hi All, >> > >> > I am trying to use compound primary key column name and i am referring >> to: >> > http://www.datastax.com/dev/blog/whats-new-in-cql-3-0 >> > >> > >> > As mentioned on this example, i tried to create a column family >> containing compound primary key (one or more) as: >> > >> > CREATE TABLE altercations ( >> >instigator text, >> >started_at timestamp, >> >ships_destroyed int, >> >energy_used float, >> >alliance_involvement boolean, >> >PRIMARY KEY (instigator,started_at,ships_destroyed) >> >); >> > >> > And i am getting: >> > >> > >> > ** >> > TSocket read 0 bytes >> > cqlsh:testcomp> >> > ** >> > >> > >> > Then followed by insert and select statements giving me following errors: >> > >> > >> >> > >> > cqlsh:testcomp>INSERT INTO altercations (instigator, started_at, >> ships_destroyed, >> >
Re: CQL/CLI Experiments w/ 1.2
Thanks for the explanation(s). I'm going to give a "Create your first java app for Cassandra" webinar on Wednesday, and I was trying to embrace schema creation in CQL, but didn't want to have to use CompositeType's right off the bat. (I'll go with compact storage) I think I can explain away the empty row/column, but we should probably publicize that. I can see that question coming up on every client/api user list. (hector, astyanax, etc.) -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive King of Prussia, PA 19406 M: 215.588.6024 @boneill42 <http://www.twitter.com/boneill42> healthmarketscience.com This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited. On 12/10/12 3:16 AM, "Sylvain Lebresne" wrote: >There is some more details in >http://www.datastax.com/dev/blog/thrift-to-cql3 but to answer your >questions: > > >> Question 1: >> What is the empty column/value? > > >The technical reasons are here: >https://issues.apache.org/jira/browse/CASSANDRA-4361. But basically, it's >a >CQL3 implementation detail. > >Question 2: >> It also appears as though the column names are CompositeType even >> though there is only one component: > > >Yes, it is the case, and the reason is that this is required to accept >collections (even if you don't use collection initially, not using a >composite means you wouldn't be able to add some later). If you explicitly >don't want a compositeType underneath, you'll need to use the 'WITH >COMPACT >STORAGE' option (in which case you will not be able to use collections >obviously). > >-- >Sylvain
CQL/CLI Experiments w/ 1.2
I'm using the following schema and data: CREATE TABLE children ( childId varchar, firstName varchar, lastName varchar, timezone varchar, PRIMARY KEY (childId ) ); insert into children (childId, firstName, lastName, timezone) values ('bart.simpson', 'Bart', 'Simpson', 'PST'); insert into children (childId, firstName, lastName, timezone) values ('dennis.menace', 'Dennis', 'Menace', 'PST'); All is well on the CQL side of things, but when I go over into CLI, I see the following: [default@northpole] list children; Using default limit of 100 Using default column limit of 100 --- RowKey: bart.simpson => (column=, value=, timestamp=1355116106465000) => (column=firstname, value=42617274, timestamp=1355116106465000) => (column=lastname, value=53696d70736f6e, timestamp=1355116106465000) => (column=timezone, value=505354, timestamp=1355116106465000) --- RowKey: dennis.menace => (column=, value=, timestamp=1355116106466000) => (column=firstname, value=44656e6e6973, timestamp=1355116106466000) => (column=lastname, value=4d656e616365, timestamp=1355116106466000) => (column=timezone, value=505354, timestamp=1355116106466000) Question 1: What is the empty column/value? I ask because it causes confusion/issues when accessing it from a Java API. (like Astyanax) That column and value are in the result set. Should clients start ignoring empty column names/values? Question 2: It also appears as though the column names are CompositeType even though there is only one component: (below is from CLI) Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type) Because of that, I would need to use CompositeTypes in my java app to insert into the table. Is there any way to create a table via CQL3 that doesn't force me to use Composite types in my Java app? (In CQL2, we could specify comparators, but I don't see that in CQL3) -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
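A side note on the compact-storage route mentioned in the reply above: below is a minimal sketch, assuming a connected Astyanax Keyspace, of what the Java side could look like if the children table were recreated WITH COMPACT STORAGE. With compact storage the thrift-side comparator should be a plain UTF8Type rather than the CompositeType(UTF8Type) shown above, so columns can be addressed by simple string names. The DAO class and method names are invented for the example, and the trade-off Sylvain describes still applies: a compact table cannot pick up collections later.

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class ChildrenDao {

    // Assumes the table was created as:
    //   CREATE TABLE children (childId varchar PRIMARY KEY, firstName varchar,
    //     lastName varchar, timezone varchar) WITH COMPACT STORAGE;
    // No CompositeType serializer is needed for the column names.
    private static final ColumnFamily<String, String> CF_CHILDREN =
            new ColumnFamily<String, String>("children",
                    StringSerializer.get(),   // row key (childId)
                    StringSerializer.get());  // plain column names

    private final Keyspace keyspace;  // assumed already connected

    public ChildrenDao(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    public void save(String childId, String firstName, String lastName,
                     String timezone) throws Exception {
        MutationBatch batch = keyspace.prepareMutationBatch();
        batch.withRow(CF_CHILDREN, childId)
             .putColumn("firstname", firstName, null)   // null = no TTL
             .putColumn("lastname", lastName, null)
             .putColumn("timezone", timezone, null);
        batch.execute();
    }
}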
Re: TSocket read 0 bytes from cqlsh
Scratch that it can change on a per column basis. Strange world this Java API vs. CQL. -brian On Thu, Oct 4, 2012 at 3:57 PM, Brian O'Neill wrote: > Actually, I found the underlying issue... > > CQL appends the *name* of the "value" column into the compound key. > > Using the previous schema: > insert into data (uid, t, foo, bar) values ('PI7JC8KRF6', > '1349110576', 'foovalue', 'barvalue') > > list data; > RowKey: PI7JC8KRF6 > => (column=1970-01-16 09:45:10-0500:foovalue:bar, value=barvalue, > timestamp=1349380029082000) > > Notice "bar" is on the end of the column name. > > If you don't have that element represented from the Java API (in this > case, w/ Astyanax), you end up with misaligned interpretation of the > compound key. I'll add an extra element to the composite type in > Astyanax, which should fix things. I'll also add this to my blog so > other people don't get tripped up. > > Any insight into why CQL puts that in column name? > Where does it store the metadata related to compound key > interpretation? Wouldn't that be a better place for that since it > shouldn't change within a table? > > -brian > > > On Thu, Oct 4, 2012 at 3:39 PM, Brian O'Neill wrote: >> Perfect. Tnx. >> >> On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis wrote: >>> Oh, I see. I misunderstood at first. Yes, the thrift side in 1.1 >>> doesn't validate cql3 composites. This should be fixed in 1.2 beta1; >>> see >>> https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817 >>> >>> On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill wrote: >>>> I was able to reproduce with CLI. I'll send over the example as soon >>>> as I can obfuscate it. >>>> >>>> -brian >>>> >>>> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis wrote: >>>>> Nothing jumps out at me, varchar should be pretty straightforward. >>>>> Probably going to need a test case. (Even better if you can repro w/ >>>>> cli instead of needing Astyanax.) >>>>> >>>>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill >>>>> wrote: >>>>>> Obfuscated slightly >>>>>> >>>>>> The table is something simliar to: >>>>>> >>>>>> CREATE TABLE data ( >>>>>> uid varchar, >>>>>> t timestamp, >>>>>> foo varchar, >>>>>> bar varchar, >>>>>> PRIMARY KEY (uid, t, foo, bar) >>>>>> ); >>>>>> >>>>>> Then I can insert just fine via Astyanax and I can see the row via >>>>>> cli, but the select statement fails in cqlsh. >>>>>> >>>>>> The table is fine, when I only interact with it through CQL. I can >>>>>> insert and select fine, until I insert a row from Asytanax. >>>>>> >>>>>> If needed, I can probably create a small test for this that I can share. >>>>>> >>>>>> -brian >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>>>>>> What kind of data did you insert, and what was expected? Expected >>>>>>> behavior would be to reject nonconforming data at insert time. >>>>>>> >>>>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill >>>>>>> wrote: >>>>>>>> This is probably already on your radar, but we could use a better >>>>>>>> error message from cqlsh when the column key doesn't conform to the >>>>>>>> expected schema... >>>>>>>> >>>>>>>> I accidentally inserted data using Astyanax that didn't conform to the >>>>>>>> schema. After that, selects from that table via cqlsh return no >>>>>>>> useful information. 
>>>>>>>> (CLI shows the data just fine) >>>>>>>> >>>>>>>> >>>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>>>>>> Welcome to Cassandra
Re: TSocket read 0 bytes from cqlsh
Actually, I found the underlying issue... CQL appends the *name* of the "value" column into the compound key. Using the previous schema: insert into data (uid, t, foo, bar) values ('PI7JC8KRF6', '1349110576', 'foovalue', 'barvalue') list data; RowKey: PI7JC8KRF6 => (column=1970-01-16 09:45:10-0500:foovalue:bar, value=barvalue, timestamp=1349380029082000) Notice "bar" is on the end of the column name. If you don't have that element represented from the Java API (in this case, w/ Astyanax), you end up with misaligned interpretation of the compound key. I'll add an extra element to the composite type in Astyanax, which should fix things. I'll also add this to my blog so other people don't get tripped up. Any insight into why CQL puts that in column name? Where does it store the metadata related to compound key interpretation? Wouldn't that be a better place for that since it shouldn't change within a table? -brian On Thu, Oct 4, 2012 at 3:39 PM, Brian O'Neill wrote: > Perfect. Tnx. > > On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis wrote: >> Oh, I see. I misunderstood at first. Yes, the thrift side in 1.1 >> doesn't validate cql3 composites. This should be fixed in 1.2 beta1; >> see >> https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817 >> >> On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill wrote: >>> I was able to reproduce with CLI. I'll send over the example as soon >>> as I can obfuscate it. >>> >>> -brian >>> >>> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis wrote: >>>> Nothing jumps out at me, varchar should be pretty straightforward. >>>> Probably going to need a test case. (Even better if you can repro w/ >>>> cli instead of needing Astyanax.) >>>> >>>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill >>>> wrote: >>>>> Obfuscated slightly >>>>> >>>>> The table is something simliar to: >>>>> >>>>> CREATE TABLE data ( >>>>> uid varchar, >>>>> t timestamp, >>>>> foo varchar, >>>>> bar varchar, >>>>> PRIMARY KEY (uid, t, foo, bar) >>>>> ); >>>>> >>>>> Then I can insert just fine via Astyanax and I can see the row via >>>>> cli, but the select statement fails in cqlsh. >>>>> >>>>> The table is fine, when I only interact with it through CQL. I can >>>>> insert and select fine, until I insert a row from Asytanax. >>>>> >>>>> If needed, I can probably create a small test for this that I can share. >>>>> >>>>> -brian >>>>> >>>>> >>>>> >>>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>>>>> What kind of data did you insert, and what was expected? Expected >>>>>> behavior would be to reject nonconforming data at insert time. >>>>>> >>>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill >>>>>> wrote: >>>>>>> This is probably already on your radar, but we could use a better >>>>>>> error message from cqlsh when the column key doesn't conform to the >>>>>>> expected schema... >>>>>>> >>>>>>> I accidentally inserted data using Astyanax that didn't conform to the >>>>>>> schema. After that, selects from that table via cqlsh return no >>>>>>> useful information. >>>>>>> (CLI shows the data just fine) >>>>>>> >>>>>>> >>>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>>>>> Welcome to Cassandra CLI version 1.1.5 >>>>>>> >>>>>>> Type 'help;' or '?' for help. >>>>>>> Type 'quit;' or 'exit;' to quit. 
>>>>>>> >>>>>>> [default@unknown] use cirrus; >>>>>>> Authenticated to keyspace: cirrus >>>>>>> [default@cirrus] list data; >>>>>>> Using default limit of 100 >>>>>>> Using default column limit of 100 &
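To make the layout described above concrete: for the schema with PRIMARY KEY (uid, t, foo) and the single regular column bar, the thrift-side column name is the composite (t, foo, <CQL3 column name>). A hypothetical Astyanax composite class for that shape is sketched below; the class and field names are invented, the types are illustrative, and, as the "scratch that" follow-up above notes, the trailing component varies per non-key column.

import java.util.Date;

import com.netflix.astyanax.annotations.Component;
import com.netflix.astyanax.serializers.AnnotatedCompositeSerializer;

// Illustrative composite for the thrift-side column names produced by:
//   CREATE TABLE data (uid varchar, t timestamp, foo varchar, bar varchar,
//     PRIMARY KEY (uid, t, foo));
// e.g. 1970-01-16 09:45:10-0500:foovalue:bar  ->  (t, foo, "bar")
public class DataColumnName {
    @Component(ordinal = 0) public Date t;
    @Component(ordinal = 1) public String foo;
    @Component(ordinal = 2) public String cql3Name;  // "bar" for the bar cell

    // Serializer a client would register for this column family's column names.
    public static final AnnotatedCompositeSerializer<DataColumnName> SERIALIZER =
            new AnnotatedCompositeSerializer<DataColumnName>(DataColumnName.class);
}

Leaving that trailing component out is exactly the misalignment that surfaces further down the thread as the ArrayIndexOutOfBoundsException in SelectStatement.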
Re: TSocket read 0 bytes from cqlsh
Perfect. Tnx. On Thu, Oct 4, 2012 at 3:37 PM, Jonathan Ellis wrote: > Oh, I see. I misunderstood at first. Yes, the thrift side in 1.1 > doesn't validate cql3 composites. This should be fixed in 1.2 beta1; > see > https://issues.apache.org/jira/browse/CASSANDRA-4377?focusedCommentId=13436817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13436817 > > On Thu, Oct 4, 2012 at 2:31 PM, Brian O'Neill wrote: >> I was able to reproduce with CLI. I'll send over the example as soon >> as I can obfuscate it. >> >> -brian >> >> On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis wrote: >>> Nothing jumps out at me, varchar should be pretty straightforward. >>> Probably going to need a test case. (Even better if you can repro w/ >>> cli instead of needing Astyanax.) >>> >>> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill wrote: >>>> Obfuscated slightly >>>> >>>> The table is something simliar to: >>>> >>>> CREATE TABLE data ( >>>> uid varchar, >>>> t timestamp, >>>> foo varchar, >>>> bar varchar, >>>> PRIMARY KEY (uid, t, foo, bar) >>>> ); >>>> >>>> Then I can insert just fine via Astyanax and I can see the row via >>>> cli, but the select statement fails in cqlsh. >>>> >>>> The table is fine, when I only interact with it through CQL. I can >>>> insert and select fine, until I insert a row from Asytanax. >>>> >>>> If needed, I can probably create a small test for this that I can share. >>>> >>>> -brian >>>> >>>> >>>> >>>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>>>> What kind of data did you insert, and what was expected? Expected >>>>> behavior would be to reject nonconforming data at insert time. >>>>> >>>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill >>>>> wrote: >>>>>> This is probably already on your radar, but we could use a better >>>>>> error message from cqlsh when the column key doesn't conform to the >>>>>> expected schema... >>>>>> >>>>>> I accidentally inserted data using Astyanax that didn't conform to the >>>>>> schema. After that, selects from that table via cqlsh return no >>>>>> useful information. >>>>>> (CLI shows the data just fine) >>>>>> >>>>>> >>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>>>> Welcome to Cassandra CLI version 1.1.5 >>>>>> >>>>>> Type 'help;' or '?' for help. >>>>>> Type 'quit;' or 'exit;' to quit. >>>>>> >>>>>> [default@unknown] use cirrus; >>>>>> Authenticated to keyspace: cirrus >>>>>> [default@cirrus] list data; >>>>>> Using default limit of 100 >>>>>> Using default column limit of 100 >>>>>> --- >>>>>> RowKey: PI7JC8 >>>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000) >>>>>> --- >>>>>> RowKey: PI1234 >>>>>> => (column=*, value=Y, timestamp=1349372660453000) >>>>>> >>>>>> 2 Rows Returned. >>>>>> Elapsed time: 212 msec(s). >>>>>> [default@cirrus] quit; >>>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >>>>>> Connected to Test Cluster at localhost:9160. >>>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol >>>>>> 19.32.0] >>>>>> Use HELP for help. 
>>>>>> cqlsh> use cirrus; >>>>>> cqlsh:cirrus> select * from data; >>>>>> TSocket read 0 bytes >>>>>> cqlsh:cirrus> >>>>>> >>>>>> -- >>>>>> Brian ONeill >>>>>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>>>>> mobile:215.588.6024 >>>>>> blog: http://brianoneill.blogspot.com/ >>>>>> twitter: @boneill42 >>>>> >>>>> >>>>> >>>>> -- >>>>> Jonathan Ellis >>>>> Project Chair, Apache Cassandra >>>>> co-founder of DataStax, the source for professional Cassandra support >>>>> http://www.datastax.com >>>> >>>> >>>> >>>> -- >>>> Brian ONeill >>>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>>> >>>> mobile:215.588.6024 >>>> blog: http://brianoneill.blogspot.com/ >>>> twitter: @boneill42 >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >> >> >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> >> mobile:215.588.6024 >> blog: http://brianoneill.blogspot.com/ >> twitter: @boneill42 > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: TSocket read 0 bytes from cqlsh
Here you go... // // IN CQLSH // CREATE KEYSPACE cirrus WITH strategy_class = 'NetworkTopologyStrategy' AND strategy_options:datacenter1 = '1'; use cirrus; CREATE TABLE data ( uid varchar, t timestamp, foo varchar, bar varchar, PRIMARY KEY (uid, t, foo) ); // // Then in CLI // use cirrus; set data['PI7JC8KRF6']['1349110576']='2014-07-31'; list data; // Note, I intentionally didn't supply a value for "foo" in the primary key definition. // Listing works. // // Then in cqlsh // select * from data; // The result is... cqlsh:cirrus> select * from data; TSocket read 0 bytes On Thu, Oct 4, 2012 at 3:31 PM, Brian O'Neill wrote: > I was able to reproduce with CLI. I'll send over the example as soon > as I can obfuscate it. > > -brian > > On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis wrote: >> Nothing jumps out at me, varchar should be pretty straightforward. >> Probably going to need a test case. (Even better if you can repro w/ >> cli instead of needing Astyanax.) >> >> On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill wrote: >>> Obfuscated slightly >>> >>> The table is something simliar to: >>> >>> CREATE TABLE data ( >>> uid varchar, >>> t timestamp, >>> foo varchar, >>> bar varchar, >>> PRIMARY KEY (uid, t, foo, bar) >>> ); >>> >>> Then I can insert just fine via Astyanax and I can see the row via >>> cli, but the select statement fails in cqlsh. >>> >>> The table is fine, when I only interact with it through CQL. I can >>> insert and select fine, until I insert a row from Asytanax. >>> >>> If needed, I can probably create a small test for this that I can share. >>> >>> -brian >>> >>> >>> >>> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>>> What kind of data did you insert, and what was expected? Expected >>>> behavior would be to reject nonconforming data at insert time. >>>> >>>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill >>>> wrote: >>>>> This is probably already on your radar, but we could use a better >>>>> error message from cqlsh when the column key doesn't conform to the >>>>> expected schema... >>>>> >>>>> I accidentally inserted data using Astyanax that didn't conform to the >>>>> schema. After that, selects from that table via cqlsh return no >>>>> useful information. >>>>> (CLI shows the data just fine) >>>>> >>>>> >>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>>> Welcome to Cassandra CLI version 1.1.5 >>>>> >>>>> Type 'help;' or '?' for help. >>>>> Type 'quit;' or 'exit;' to quit. >>>>> >>>>> [default@unknown] use cirrus; >>>>> Authenticated to keyspace: cirrus >>>>> [default@cirrus] list data; >>>>> Using default limit of 100 >>>>> Using default column limit of 100 >>>>> --- >>>>> RowKey: PI7JC8 >>>>> => (column=*, value=2014-07-31, timestamp=1349376866686000) >>>>> --- >>>>> RowKey: PI1234 >>>>> => (column=*, value=Y, timestamp=1349372660453000) >>>>> >>>>> 2 Rows Returned. >>>>> Elapsed time: 212 msec(s). >>>>> [default@cirrus] quit; >>>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >>>>> Connected to Test Cluster at localhost:9160. >>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] >>>>> Use HELP for help. 
>>>>> cqlsh> use cirrus; >>>>> cqlsh:cirrus> select * from data; >>>>> TSocket read 0 bytes >>>>> cqlsh:cirrus> >>>>> >>>>> -- >>>>> Brian ONeill >>>>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>>>> mobile:215.588.6024 >>>>> blog: http://brianoneill.blogspot.com/ >>>>> twitter: @boneill42 >>>> >>>> >>>> >>>> -- >>>> Jonathan Ellis >>>> Project Chair, Apache Cassandra >>>> co-founder of DataStax, the source for professional Cassandra support >>>> http://www.datastax.com >>> >>> >>> >>> -- >>> Brian ONeill >>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>> >>> mobile:215.588.6024 >>> blog: http://brianoneill.blogspot.com/ >>> twitter: @boneill42 >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com > > > > -- > Brian ONeill > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > blog: http://brianoneill.blogspot.com/ > twitter: @boneill42 -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: TSocket read 0 bytes from cqlsh
I was able to reproduce with CLI. I'll send over the example as soon as I can obfuscate it. -brian On Thu, Oct 4, 2012 at 3:19 PM, Jonathan Ellis wrote: > Nothing jumps out at me, varchar should be pretty straightforward. > Probably going to need a test case. (Even better if you can repro w/ > cli instead of needing Astyanax.) > > On Thu, Oct 4, 2012 at 2:15 PM, Brian O'Neill wrote: >> Obfuscated slightly >> >> The table is something simliar to: >> >> CREATE TABLE data ( >> uid varchar, >> t timestamp, >> foo varchar, >> bar varchar, >> PRIMARY KEY (uid, t, foo, bar) >> ); >> >> Then I can insert just fine via Astyanax and I can see the row via >> cli, but the select statement fails in cqlsh. >> >> The table is fine, when I only interact with it through CQL. I can >> insert and select fine, until I insert a row from Asytanax. >> >> If needed, I can probably create a small test for this that I can share. >> >> -brian >> >> >> >> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>> What kind of data did you insert, and what was expected? Expected >>> behavior would be to reject nonconforming data at insert time. >>> >>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill wrote: >>>> This is probably already on your radar, but we could use a better >>>> error message from cqlsh when the column key doesn't conform to the >>>> expected schema... >>>> >>>> I accidentally inserted data using Astyanax that didn't conform to the >>>> schema. After that, selects from that table via cqlsh return no >>>> useful information. >>>> (CLI shows the data just fine) >>>> >>>> >>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>> Welcome to Cassandra CLI version 1.1.5 >>>> >>>> Type 'help;' or '?' for help. >>>> Type 'quit;' or 'exit;' to quit. >>>> >>>> [default@unknown] use cirrus; >>>> Authenticated to keyspace: cirrus >>>> [default@cirrus] list data; >>>> Using default limit of 100 >>>> Using default column limit of 100 >>>> --- >>>> RowKey: PI7JC8 >>>> => (column=*, value=2014-07-31, timestamp=1349376866686000) >>>> --- >>>> RowKey: PI1234 >>>> => (column=*, value=Y, timestamp=1349372660453000) >>>> >>>> 2 Rows Returned. >>>> Elapsed time: 212 msec(s). >>>> [default@cirrus] quit; >>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >>>> Connected to Test Cluster at localhost:9160. >>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] >>>> Use HELP for help. >>>> cqlsh> use cirrus; >>>> cqlsh:cirrus> select * from data; >>>> TSocket read 0 bytes >>>> cqlsh:cirrus> >>>> >>>> -- >>>> Brian ONeill >>>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>>> mobile:215.588.6024 >>>> blog: http://brianoneill.blogspot.com/ >>>> twitter: @boneill42 >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >> >> >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> >> mobile:215.588.6024 >> blog: http://brianoneill.blogspot.com/ >> twitter: @boneill42 > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: TSocket read 0 bytes from cqlsh
>From this, I assume I inserted the wrong number of values into the compound key from Astyanax. It would be nice to carry this error across to the CQL client. -brian On Thu, Oct 4, 2012 at 3:17 PM, Brian O'Neill wrote: > Here you go... > > ERROR 14:57:37,270 Error occurred during processing of message. > java.lang.ArrayIndexOutOfBoundsException: 4 > at > org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:773) > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:137) > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108) > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121) > at > org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237) > at > org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542) > at > org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) > at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > > On Thu, Oct 4, 2012 at 3:15 PM, Brian O'Neill wrote: >> Obfuscated slightly >> >> The table is something simliar to: >> >> CREATE TABLE data ( >> uid varchar, >> t timestamp, >> foo varchar, >> bar varchar, >> PRIMARY KEY (uid, t, foo, bar) >> ); >> >> Then I can insert just fine via Astyanax and I can see the row via >> cli, but the select statement fails in cqlsh. >> >> The table is fine, when I only interact with it through CQL. I can >> insert and select fine, until I insert a row from Asytanax. >> >> If needed, I can probably create a small test for this that I can share. >> >> -brian >> >> >> >> On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >>> What kind of data did you insert, and what was expected? Expected >>> behavior would be to reject nonconforming data at insert time. >>> >>> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill wrote: >>>> This is probably already on your radar, but we could use a better >>>> error message from cqlsh when the column key doesn't conform to the >>>> expected schema... >>>> >>>> I accidentally inserted data using Astyanax that didn't conform to the >>>> schema. After that, selects from that table via cqlsh return no >>>> useful information. >>>> (CLI shows the data just fine) >>>> >>>> >>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>>> Welcome to Cassandra CLI version 1.1.5 >>>> >>>> Type 'help;' or '?' for help. >>>> Type 'quit;' or 'exit;' to quit. >>>> >>>> [default@unknown] use cirrus; >>>> Authenticated to keyspace: cirrus >>>> [default@cirrus] list data; >>>> Using default limit of 100 >>>> Using default column limit of 100 >>>> --- >>>> RowKey: PI7JC8 >>>> => (column=*, value=2014-07-31, timestamp=1349376866686000) >>>> --- >>>> RowKey: PI1234 >>>> => (column=*, value=Y, timestamp=1349372660453000) >>>> >>>> 2 Rows Returned. >>>> Elapsed time: 212 msec(s). >>>> [default@cirrus] quit; >>>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >>>> Connected to Test Cluster at localhost:9160. 
>>>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] >>>> Use HELP for help. >>>> cqlsh> use cirrus; >>>> cqlsh:cirrus> select * from data; >>>> TSocket read 0 bytes >>>> cqlsh:cirrus> >>>> >>>> -- >>>> Brian ONeill >>>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>>> mobile:215.588.6024 >>>> blog: http://brianoneill.blogspot.com/ >>>> twitter: @boneill42 >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >> >> >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> >> mobile:215.588.6024 >> blog: http://brianoneill.blogspot.com/ >> twitter: @boneill42 > > > > -- > Brian ONeill > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > blog: http://brianoneill.blogspot.com/ > twitter: @boneill42 -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: TSocket read 0 bytes from cqlsh
Here you go... ERROR 14:57:37,270 Error occurred during processing of message. java.lang.ArrayIndexOutOfBoundsException: 4 at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:773) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:137) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121) at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) On Thu, Oct 4, 2012 at 3:15 PM, Brian O'Neill wrote: > Obfuscated slightly > > The table is something simliar to: > > CREATE TABLE data ( > uid varchar, > t timestamp, > foo varchar, > bar varchar, > PRIMARY KEY (uid, t, foo, bar) > ); > > Then I can insert just fine via Astyanax and I can see the row via > cli, but the select statement fails in cqlsh. > > The table is fine, when I only interact with it through CQL. I can > insert and select fine, until I insert a row from Asytanax. > > If needed, I can probably create a small test for this that I can share. > > -brian > > > > On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: >> What kind of data did you insert, and what was expected? Expected >> behavior would be to reject nonconforming data at insert time. >> >> On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill wrote: >>> This is probably already on your radar, but we could use a better >>> error message from cqlsh when the column key doesn't conform to the >>> expected schema... >>> >>> I accidentally inserted data using Astyanax that didn't conform to the >>> schema. After that, selects from that table via cqlsh return no >>> useful information. >>> (CLI shows the data just fine) >>> >>> >>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >>> Connected to: "Test Cluster" on 127.0.0.1/9160 >>> Welcome to Cassandra CLI version 1.1.5 >>> >>> Type 'help;' or '?' for help. >>> Type 'quit;' or 'exit;' to quit. >>> >>> [default@unknown] use cirrus; >>> Authenticated to keyspace: cirrus >>> [default@cirrus] list data; >>> Using default limit of 100 >>> Using default column limit of 100 >>> --- >>> RowKey: PI7JC8 >>> => (column=*, value=2014-07-31, timestamp=1349376866686000) >>> --- >>> RowKey: PI1234 >>> => (column=*, value=Y, timestamp=1349372660453000) >>> >>> 2 Rows Returned. >>> Elapsed time: 212 msec(s). >>> [default@cirrus] quit; >>> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >>> Connected to Test Cluster at localhost:9160. >>> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] >>> Use HELP for help. 
>>> cqlsh> use cirrus; >>> cqlsh:cirrus> select * from data; >>> TSocket read 0 bytes >>> cqlsh:cirrus> >>> >>> -- >>> Brian ONeill >>> Lead Architect, Health Market Science (http://healthmarketscience.com) >>> mobile:215.588.6024 >>> blog: http://brianoneill.blogspot.com/ >>> twitter: @boneill42 >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com > > > > -- > Brian ONeill > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > blog: http://brianoneill.blogspot.com/ > twitter: @boneill42 -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: TSocket read 0 bytes from cqlsh
Obfuscated slightly The table is something simliar to: CREATE TABLE data ( uid varchar, t timestamp, foo varchar, bar varchar, PRIMARY KEY (uid, t, foo, bar) ); Then I can insert just fine via Astyanax and I can see the row via cli, but the select statement fails in cqlsh. The table is fine, when I only interact with it through CQL. I can insert and select fine, until I insert a row from Asytanax. If needed, I can probably create a small test for this that I can share. -brian On Thu, Oct 4, 2012 at 3:08 PM, Jonathan Ellis wrote: > What kind of data did you insert, and what was expected? Expected > behavior would be to reject nonconforming data at insert time. > > On Thu, Oct 4, 2012 at 2:04 PM, Brian O'Neill wrote: >> This is probably already on your radar, but we could use a better >> error message from cqlsh when the column key doesn't conform to the >> expected schema... >> >> I accidentally inserted data using Astyanax that didn't conform to the >> schema. After that, selects from that table via cqlsh return no >> useful information. >> (CLI shows the data just fine) >> >> >> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli >> Connected to: "Test Cluster" on 127.0.0.1/9160 >> Welcome to Cassandra CLI version 1.1.5 >> >> Type 'help;' or '?' for help. >> Type 'quit;' or 'exit;' to quit. >> >> [default@unknown] use cirrus; >> Authenticated to keyspace: cirrus >> [default@cirrus] list data; >> Using default limit of 100 >> Using default column limit of 100 >> --- >> RowKey: PI7JC8 >> => (column=*, value=2014-07-31, timestamp=1349376866686000) >> --- >> RowKey: PI1234 >> => (column=*, value=Y, timestamp=1349372660453000) >> >> 2 Rows Returned. >> Elapsed time: 212 msec(s). >> [default@cirrus] quit; >> bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 >> Connected to Test Cluster at localhost:9160. >> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] >> Use HELP for help. >> cqlsh> use cirrus; >> cqlsh:cirrus> select * from data; >> TSocket read 0 bytes >> cqlsh:cirrus> >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> mobile:215.588.6024 >> blog: http://brianoneill.blogspot.com/ >> twitter: @boneill42 > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
TSocket read 0 bytes from cqlsh
This is probably already on your radar, but we could use a better error message from cqlsh when the column key doesn't conform to the expected schema... I accidentally inserted data using Astyanax that didn't conform to the schema. After that, selects from that table via cqlsh return no useful information. (CLI shows the data just fine) bone@boneill-macbook-wired:~/tools/cassandra-> bin/cassandra-cli Connected to: "Test Cluster" on 127.0.0.1/9160 Welcome to Cassandra CLI version 1.1.5 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] use cirrus; Authenticated to keyspace: cirrus [default@cirrus] list data; Using default limit of 100 Using default column limit of 100 --- RowKey: PI7JC8 => (column=*, value=2014-07-31, timestamp=1349376866686000) --- RowKey: PI1234 => (column=*, value=Y, timestamp=1349372660453000) 2 Rows Returned. Elapsed time: 212 msec(s). [default@cirrus] quit; bone@boneill-macbook-wired:~/tools/cassandra-> bin/cqlsh -3 Connected to Test Cluster at localhost:9160. [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0] Use HELP for help. cqlsh> use cirrus; cqlsh:cirrus> select * from data; TSocket read 0 bytes cqlsh:cirrus> -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: Document storage
Just following up on this age-old thread because we've recently done some development Ben, we recently had the exact need you outline. We are storing JSON documents in Cassandra. We needed to index based on a field in the JSON. We ended up extending our cassandra-indexing code to accomodate this. https://github.com/hmsonline/cassandra-indexing You can now configure the indexing to accomodate a field within the JSON document. We're going to update the wiki to make this more usable, but it triggered the same kind of debate/thought process on this thread. In the coming weeks/months, we'll probably consider a switch to protobuf with an update to our indexing code to understand the internal structure of documents stored in Cassandra. just an update for now, brian On Fri, Mar 30, 2012 at 1:33 PM, Ben McCann wrote: > > > > If you don't need selected updates and having something as compact as > > possible on disk make a important difference for you, sure, do use blobs. > > The only argument is that you can already do that without any change to > > the core. > > > The thing that we can't do today without changes to the core is index on > subparts of some document format like Protobuf/JSON/etc. If cassandra were > to understand one of these formats, it could remove the need for manual > management of an index. > > > On Fri, Mar 30, 2012 at 10:23 AM, Sylvain Lebresne >wrote: > > > On Fri, Mar 30, 2012 at 6:01 PM, Daniel Doubleday > > wrote: > > > But decomposing into columns will lead to more of that: > > > > > > - Total amount of serialized data is (in most cases a lot) larger than > > protobuffed / compressed version > > > > At least with sstable compression, I would expect the difference to > > not be too big in practice. > > > > > - If you do selective updates the document will be scattered over > > multiple ssts plus if you do sliced reads you can't optimize reads as > > opposed to the single column version that when updated is automatically > > superseding older versions so most reads will hit only one sst > > > > But if you need to do selective updates, then a blob just doesn't work > > so that comparison is moot. > > > > Now I don't think anyone pretended that you should never use blobs > > (whether that's protobuffed, jsoned, ...). If you don't need selected > > updates and having something as compact as possible on disk make a > > important difference for you, sure, do use blobs. The only argument is > > that you can already do that without any change to the core. What we > > are saying is that for the case where you care more about schema > > flexibility (being able to do selective updates, to index on some > > subpart, etc...) then we think that something like the map and list > > idea of CASSANDRA-3647 will probably be a more natural fit to the > > current CQL API. > > > > -- > > Sylvain > > > > > > > > All these reads make the hot dataset. If it fits the page cache your > > fine. If it doesn't you need to buy more iron. > > > > > > Really could not resist because your statement seems to be contrary to > > all our tests / learnings. > > > > > > Cheers, > > > Daniel > > > > > > From dev list: > > > > > > Re: Document storage > > > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian > > wrote: > > >>> I think this is a much better approach because that gives you the > > >>> ability to update or retrieve just parts of objects efficiently, > > >>> rather than making column values just blobs with a bunch of special > > >>> case logic to introspect them. 
Which feels like a big step backwards > > >>> to me. > > >> > > >> Unless your access pattern involves reading/writing the whole document > > each time. In > > > that case you're better off serializing the whole document and storing > > it in a column as a > > > byte[] without incurring the overhead of column indexes. Right? > > > > > > Hmm, not sure what you're thinking of there. > > > > > > If you mean the "index" that's part of the row header for random > > > access within a row, then no, serializing to byte[] doesn't save you > > > anything. > > > > > > If you mean secondary indexes, don't declare any if you don't want any. > > :) > > > > > > Just telling C* to store a byte[] *will* be slightly lighter-weight > > > than giving it named columns, but we're talking negligible compared to > > > the overhead of actually moving the data on or off disk in the first > > > place. Not even close to being worth giving up being able to deal > > > with your data from standard tools like cqlsh, IMO. > > > > > > -- > > > Jonathan Ellis > > > Project Chair, Apache Cassandra > > > co-founder of DataStax, the source for professional Cassandra support > > > http://www.datastax.com > > > > > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
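The kernel of that indexing idea is: pull the configured field out of the stored JSON document and use it as the key of an inverted-index entry (index row = field value, column name = original row key). A sketch of that transformation follows, using Jackson purely for illustration; this is not the cassandra-indexing implementation, which hooks mutations server-side via AOP, and the field and row-key names are made up.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonFieldIndexer {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns the value of the configured field inside the JSON document,
    // or null if the document does not contain it. That value becomes the
    // row key of the inverted-index row; the original row key becomes a
    // column name in that row (the wide-row index pattern).
    public static String indexedValue(String jsonDocument, String indexedField) throws Exception {
        JsonNode root = MAPPER.readTree(jsonDocument);
        JsonNode field = root.path(indexedField);
        return field.isMissingNode() ? null : field.asText();
    }

    public static void main(String[] args) throws Exception {
        String doc = "{\"childId\":\"bart.simpson\",\"timezone\":\"PST\"}";
        // Prints "PST": the index row under which the column "bart.simpson" would be written.
        System.out.println(indexedValue(doc, "timezone"));
    }
}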
Re: Server Side Logic/Script - Triggers / StoreProc
Praveen, We are certainly interested. To get things moving we implemented an add-on for Cassandra to demonstrate the viability (using AOP): https://github.com/hmsonline/cassandra-triggers Right now the implementation executes triggers asynchronously, allowing you to implement a java interface and plugin your own java class that will get called for every insert. Per the discussion on 1311, we intend to extend our proof of concept to be able to invoke scripts as well. (minimally we'll enable javascript, but we'll probably allow for ruby and groovy as well) -brian On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote: > I found that Triggers are coming in Cassandra 1.2 > (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of any > StoreProc like pattern. > > I know this has been discussed so many times but never met with any > initiative. Even Groovy was staged out of the trunk. > > Cassandra is great for logging and as such will be infinitely more useful if > some logic can be pushed into the Cassandra cluster nearer to the location of > Data to generate a materialized view useful for applications. > > Server Side Scripts/Routines in Distributed Databases could soon prove to be > the differentiating factor. > > Let me reiterate things with a use case. > > In our application we store time series data in wide rows with TTL set on > each point to prevent data from growing beyond acceptable limits. Still the > data size can be a limiting factor to move all of it from the cluster node to > the querying node and then to the application via thrift for processing and > presentation. > > Ideally we should process the data on the residing node and pass only the > materialized view of the data upstream. This should be trivial if Cassandra > implements some sort of server side scripting and CQL semantics to call it. > > Is anybody else interested in a similar feature? Is it being worked on? Are > there any alternative strategies to this problem? > > Praveen > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
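A rough sketch of the plug-in contract described above, i.e. implement a Java interface and have it called for every insert. The interface and class names below are invented; the actual cassandra-triggers API may differ, so treat this as the shape of the idea rather than its implementation.

import java.util.Map;

// Hypothetical callback contract; the real cassandra-triggers interface may differ.
interface MutationTrigger {
    void onMutation(String keyspace, String columnFamily, String rowKey,
                    Map<String, byte[]> columns);
}

// Example plug-in: summarize time-series points into a materialized-view row,
// so only the rollup has to travel back to the application over thrift.
class RollupTrigger implements MutationTrigger {
    @Override
    public void onMutation(String keyspace, String columnFamily, String rowKey,
                           Map<String, byte[]> columns) {
        // Invoked asynchronously after the write, on the cluster side,
        // close to where the data lives.
        System.out.println("mutation on " + keyspace + "." + columnFamily
                + " row " + rowKey + " -> " + columns.size() + " column(s)");
    }
}

A scripted variant (JavaScript, Groovy, Ruby) would presumably wrap the same callback around a script engine.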
kudos...
I just wanted to let you guys know that I gave you a shout out... http://brianoneill.blogspot.com/2012/04/cassandra-vs-couchdb-mongodb-riak-hbase.html thanks for all the support, brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: Document storage
Do we also need to consider the client API? If we don't adjust thrift, the client just gets bytes right? The client is on their own to marshal back into a structure. In this case, it seems like we would want to chose a standard that is efficient and for which there are common libraries. Protobuf seems to fit the bill here. Or do we pass back some other structure? (Native lists/maps? JSON strings?) Do we ignore sorting/comparators? (similar to SOLR, I'm not sure people have defined a good sort for multi-valued items) -brian ---- Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ On 3/30/12 12:01 PM, "Daniel Doubleday" wrote: >> Just telling C* to store a byte[] *will* be slightly lighter-weight >> than giving it named columns, but we're talking negligible compared to >> the overhead of actually moving the data on or off disk in the first >> place. >Hm - but isn't this exactly the point? You don't want to move data off >disk. >But decomposing into columns will lead to more of that: > >- Total amount of serialized data is (in most cases a lot) larger than >protobuffed / compressed version >- If you do selective updates the document will be scattered over >multiple ssts plus if you do sliced reads you can't optimize reads as >opposed to the single column version that when updated is automatically >superseding older versions so most reads will hit only one sst > >All these reads make the hot dataset. If it fits the page cache your >fine. If it doesn't you need to buy more iron. > >Really could not resist because your statement seems to be contrary to >all our tests / learnings. > >Cheers, >Daniel > >From dev list: > >Re: Document storage >On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian wrote: >>> I think this is a much better approach because that gives you the >>> ability to update or retrieve just parts of objects efficiently, >>> rather than making column values just blobs with a bunch of special >>> case logic to introspect them. Which feels like a big step backwards >>> to me. >> >> Unless your access pattern involves reading/writing the whole document >>each time. In >that case you're better off serializing the whole document and storing it >in a column as a >byte[] without incurring the overhead of column indexes. Right? > >Hmm, not sure what you're thinking of there. > >If you mean the "index" that's part of the row header for random >access within a row, then no, serializing to byte[] doesn't save you >anything. > >If you mean secondary indexes, don't declare any if you don't want any. :) > >Just telling C* to store a byte[] *will* be slightly lighter-weight >than giving it named columns, but we're talking negligible compared to >the overhead of actually moving the data on or off disk in the first >place. Not even close to being worth giving up being able to deal >with your data from standard tools like cqlsh, IMO. > >-- >Jonathan Ellis >Project Chair, Apache Cassandra >co-founder of DataStax, the source for professional Cassandra support >http://www.datastax.com >
Re: Document storage
Jonathan, We store JSON as our column values. I'd love to see support for maps and lists. If I get some time this weekend, I'll take a look to see what is required. It doesn't seem like it would be that hard. -brian Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ On 3/29/12 3:18 PM, "Jonathan Ellis" wrote: >On Thu, Mar 29, 2012 at 2:06 PM, Ben McCann wrote: >> As far as I can tell, Cassandra >> doesn't support maps and lists in a standardized way today, which is the >> root of my problem. > >I'm pretty serious about adding those for 1.2, for what that's worth. >(If you want to jump in and help code that up, so much the better.) > >-- >Jonathan Ellis >Project Chair, Apache Cassandra >co-founder of DataStax, the source for professional Cassandra support >http://www.datastax.com
Re: Document storage
Jonathan, I was actually going to take this up with Nate McCall a few weeks back. I think it might make sense to get the client development community together (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.) I agree whole-heartedly that it shouldn't go into the database for all the reasons you point out. If we can all decide on some standards for data storage (e.g. composite types), indexing strategies, etc. We can provide higher-level functions through the client libraries and also provide interoperability between them. (without bloating Cassandra) CCing Nate. Nate, thoughts? I wouldn't mind coordinating/facilitating the conversation. If we know who should be involved. -brian ---- Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ On 3/29/12 3:06 PM, "Ben McCann" wrote: >Jonathan, I asked Brian about his REST >API<https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas >9C8Us>and >he said he does not take the json objects and split them because the >client libraries do not agree on implementations. This was exactly my >concern as well with this solution. I would be perfectly happy to do it >this way instead of using JSON if it were standardized. The reason I >suggested JSON is that it is standardized. As far as I can tell, >Cassandra >doesn't support maps and lists in a standardized way today, which is the >root of my problem. > >-Ben > > >On Thu, Mar 29, 2012 at 11:30 AM, Drew Kutcharian wrote: > >> Yes, I meant the "row header index". What I have done is that I'm >>storing >> an object (i.e. UserProfile) where you read or write it as a whole (a >>user >> updates their user details in a single page in the UI). So I serialize >>that >> object into a binary JSON using SMILE format. I then compress it using >> Snappy on the client side. So as far as Cassandra cares it's storing a >> byte[]. >> >> Now on the client side, I'm using cassandra-cli with a custom type that >> knows how to turn a byte[] into a JSON text and back. The only issue was >> CASSANDRA-4081 where "assume" doesn't work with custom types. If >> CASSANDRA-4081 gets fixed, I'll get the best of both worlds. >> >> Also advantages of this vs. the thrift based Super Column families are: >> >> 1. Saving extra CPU usage on the Cassandra nodes. Since >> serialize/deserialize and compression/decompression happens on the >>client >> nodes where there is plenty idle CPU time >> >> 2. Saving network bandwidth since I'm sending over a compressed byte[] >> >> >> -- Drew >> >> >> >> On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote: >> >> > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian >> wrote: >> >>> I think this is a much better approach because that gives you the >> >>> ability to update or retrieve just parts of objects efficiently, >> >>> rather than making column values just blobs with a bunch of special >> >>> case logic to introspect them. Which feels like a big step >>backwards >> >>> to me. >> >> >> >> Unless your access pattern involves reading/writing the whole >>document >> each time. In that case you're better off serializing the whole document >> and storing it in a column as a byte[] without incurring the overhead of >> column indexes. Right? >> > >> > Hmm, not sure what you're thinking of there. 
>> > >> > If you mean the "index" that's part of the row header for random >> > access within a row, then no, serializing to byte[] doesn't save you >> > anything. >> > >> > If you mean secondary indexes, don't declare any if you don't want >>any. >> :) >> > >> > Just telling C* to store a byte[] *will* be slightly lighter-weight >> > than giving it named columns, but we're talking negligible compared to >> > the overhead of actually moving the data on or off disk in the first >> > place. Not even close to being worth giving up being able to deal >> > with your data from standard tools like cqlsh, IMO. >> > >> > -- >> > Jonathan Ellis >> > Project Chair, Apache Cassandra >> > co-founder of DataStax, the source for professional Cassandra support >> > http://www.datastax.com >> >>
Re: OoM querying very wide-row in CLI
Sorry, I didn't realize we weren't hip to pulls yet. I created a JIRA and attached the patch. https://issues.apache.org/jira/browse/CASSANDRA-4098 -brian On Tue, Mar 27, 2012 at 10:42 PM, Brian O'Neill wrote: > Here she is: > https://github.com/apache/cassandra/pull/8 > > Verified functionally with the attached data script. > > -brian > > > > On Tue, Mar 27, 2012 at 9:49 PM, Brian O'Neill wrote: > >> 10-4. I'll see if I can track it down and submit a pull request that >> specifies a default if one does not exist. >> >> -brian >> >> >> Brian O'Neill >> Lead Architect, Software Development >> Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 >> p: 215.588.6024blog: http://weblogs.java.net/blog/boneill42/ >> blog: http://brianoneill.blogspot.com/ >> >> >> >> >> >> >> >> On 3/27/12 9:45 PM, "Jonathan Ellis" wrote: >> >> >I believe we added support for specifying a column range to the cli >> >recently. I don't know if there is a default limit. >> > >> >On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill >> >wrote: >> >> Today, running 1.0.7, we saw a node crash with an OutOfMemory. >> >> We have a single row with ~10million columns in it. (using it as an >> >>index) >> >> Accidentally, we attempted to list the CF in CLI that had the wide-row. >> >> This caused the CLI to hang and then eventually crashed Cassandra with >> >>an >> >> OoM. >> >> >> >> I know this is a case of "If it hurts when you do that, don't do that", >> >>but >> >> we may want to better protect against it in the CLI and/or the DB. I >> >>know >> >> we limit row counts on lists in CLI. Do we also limit column counts? >> >>If >> >> not, I don't mind submitting a patch for this. >> >> >> >> let me know, >> >> brian >> >> >> >> -- >> >> Brian ONeill >> >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> >> mobile:215.588.6024 >> >> blog: http://weblogs.java.net/blog/boneill42/ >> >> blog: http://brianoneill.blogspot.com/ >> > >> > >> > >> >-- >> >Jonathan Ellis >> >Project Chair, Apache Cassandra >> >co-founder of DataStax, the source for professional Cassandra support >> >http://www.datastax.com >> >> >> > > > -- > Brian ONeill > Lead Architect, Health Market Science (http://healthmarketscience.com) > mobile:215.588.6024 > blog: http://weblogs.java.net/blog/boneill42/ > blog: http://brianoneill.blogspot.com/ > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: OoM querying very wide-row in CLI
Here she is: https://github.com/apache/cassandra/pull/8 Verified functionally with the attached data script. -brian On Tue, Mar 27, 2012 at 9:49 PM, Brian O'Neill wrote: > 10-4. I'll see if I can track it down and submit a pull request that > specifies a default if one does not exist. > > -brian > > > Brian O'Neill > Lead Architect, Software Development > Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 > p: 215.588.6024blog: http://weblogs.java.net/blog/boneill42/ > blog: http://brianoneill.blogspot.com/ > > > > > > > > On 3/27/12 9:45 PM, "Jonathan Ellis" wrote: > > >I believe we added support for specifying a column range to the cli > >recently. I don't know if there is a default limit. > > > >On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill > >wrote: > >> Today, running 1.0.7, we saw a node crash with an OutOfMemory. > >> We have a single row with ~10million columns in it. (using it as an > >>index) > >> Accidentally, we attempted to list the CF in CLI that had the wide-row. > >> This caused the CLI to hang and then eventually crashed Cassandra with > >>an > >> OoM. > >> > >> I know this is a case of "If it hurts when you do that, don't do that", > >>but > >> we may want to better protect against it in the CLI and/or the DB. I > >>know > >> we limit row counts on lists in CLI. Do we also limit column counts? > >>If > >> not, I don't mind submitting a patch for this. > >> > >> let me know, > >> brian > >> > >> -- > >> Brian ONeill > >> Lead Architect, Health Market Science (http://healthmarketscience.com) > >> mobile:215.588.6024 > >> blog: http://weblogs.java.net/blog/boneill42/ > >> blog: http://brianoneill.blogspot.com/ > > > > > > > >-- > >Jonathan Ellis > >Project Chair, Apache Cassandra > >co-founder of DataStax, the source for professional Cassandra support > >http://www.datastax.com > > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: OoM querying very wide-row in CLI
10-4. I'll see if I can track it down and submit a pull request that specifies a default if one does not exist. -brian ---- Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/ On 3/27/12 9:45 PM, "Jonathan Ellis" wrote: >I believe we added support for specifying a column range to the cli >recently. I don't know if there is a default limit. > >On Tue, Mar 27, 2012 at 8:40 PM, Brian O'Neill >wrote: >> Today, running 1.0.7, we saw a node crash with an OutOfMemory. >> We have a single row with ~10million columns in it. (using it as an >>index) >> Accidentally, we attempted to list the CF in CLI that had the wide-row. >> This caused the CLI to hang and then eventually crashed Cassandra with >>an >> OoM. >> >> I know this is a case of "If it hurts when you do that, don't do that", >>but >> we may want to better protect against it in the CLI and/or the DB. I >>know >> we limit row counts on lists in CLI. Do we also limit column counts? >>If >> not, I don't mind submitting a patch for this. >> >> let me know, >> brian >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> mobile:215.588.6024 >> blog: http://weblogs.java.net/blog/boneill42/ >> blog: http://brianoneill.blogspot.com/ > > > >-- >Jonathan Ellis >Project Chair, Apache Cassandra >co-founder of DataStax, the source for professional Cassandra support >http://www.datastax.com
OoM querying very wide-row in CLI
Today, running 1.0.7, we saw a node crash with an OutOfMemory. We have a single row with ~10million columns in it. (using it as an index) Accidentally, we attempted to list the CF in CLI that had the wide-row. This caused the CLI to hang and then eventually crashed Cassandra with an OoM. I know this is a case of "If it hurts when you do that, don't do that", but we may want to better protect against it in the CLI and/or the DB. I know we limit row counts on lists in CLI. Do we also limit column counts? If not, I don't mind submitting a patch for this. let me know, brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
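For anyone hitting the same wall from application code (rather than the CLI), the usual workaround is to never ask for an unbounded row: slice the wide row in fixed-size pages. A rough sketch against the Thrift API follows; the keyspace, column family, and row key names are hypothetical, and connection/error handling is pared down to the minimum.

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class WideRowPager
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("my_keyspace");                // hypothetical keyspace

        ByteBuffer rowKey = ByteBuffer.wrap("wide-row".getBytes("UTF-8"));
        ColumnParent parent = new ColumnParent("wide_cf");  // hypothetical column family
        ByteBuffer start = ByteBuffer.wrap(new byte[0]);    // empty = start of the row
        ByteBuffer end = ByteBuffer.wrap(new byte[0]);      // empty = end of the row
        int pageSize = 1000;

        while (true)
        {
            SliceRange range = new SliceRange(start, end, false, pageSize);
            SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
            List<ColumnOrSuperColumn> page =
                client.get_slice(rowKey, parent, predicate, ConsistencyLevel.ONE);

            if (page.isEmpty())
                break;

            // ... process this page of columns ...

            if (page.size() < pageSize)
                break;

            // Start the next slice at the last column seen; the next page will
            // repeat that column as its first result, so skip one when processing.
            start = ByteBuffer.wrap(page.get(page.size() - 1).getColumn().getName());
        }
        transport.close();
    }
}

A sane default limit inside the CLI itself is what the CASSANDRA-4098 patch mentioned above is after.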
Triggers?
I just posted to the user list, but figured I would post here as well. We had a big session today designing application-level triggers using a new column family as a distributed commit log. When I got back to my desk, I re-googled Cassandra triggers, and re-read: https://issues.apache.org/jira/browse/CASSANDRA-1311 We had planned to implement something similar to the "crack smoking" concept... Keeping a separate column family that logged the mutation, which a trigger could then act on and write back upon success. Conceptually, this doesn't seem too difficult to implement. Is anyone working on this already? If not, is it worth working on it and contributing it as a patch? Or should we just keep it to our app layer? -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
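A rough sketch of the write side of that pattern, for anyone who wants to picture it: every application write also appends a "pending trigger" entry to a log column family in the same batch, and a separate worker fires the trigger logic and deletes the entry only after it succeeds. The code below is an illustration using Hector on the client; the keyspace handling and the CF/column names ("data", "trigger_log") are hypothetical, not anything that exists in Cassandra or in our app today.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TriggerLogWriter
{
    private static final StringSerializer SS = StringSerializer.get();

    // Write the real column and the trigger-log entry in one batch, so the
    // worker always has a record of mutations whose triggers may not have run yet.
    public static void writeWithTriggerLog(Keyspace keyspace, String rowKey,
                                           String columnName, String value)
    {
        Mutator<String> mutator = HFactory.createMutator(keyspace, SS);

        // the actual data write (hypothetical CF "data")
        mutator.addInsertion(rowKey, "data",
                             HFactory.createStringColumn(columnName, value));

        // the pending-trigger entry (hypothetical CF "trigger_log"); the worker
        // deletes this column only after the trigger logic completes successfully
        mutator.addInsertion("pending:" + rowKey, "trigger_log",
                             HFactory.createStringColumn(columnName, value));

        mutator.execute();
    }
}

The failure mode to think about is the gap between the batch landing and the trigger running; the worker has to treat replayed log entries as idempotent.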
Re: Cassandra has moved to Git
I'm by no means a git guru, but just happened to attend a meeting last night where the presenter addressed this exact issue. He has a pretty slick process that kept the master/trunk clean without rebasing by squashing a set of commits into a single commit when merged to trunk. (using git squash?) I'm CCing the guru, Nicholas Hance. Nicholas, can you share that process handout from last night? -brian On Thu, Jan 5, 2012 at 11:58 AM, Sylvain Lebresne wrote: > > This discourages collaboration because anyone that might fork > > github.com/author/666 is sitting on a powder keg. > > Alright, but then what is it you're proposing? > > > At best it's yak shaving. At worst it's going to result in some very > > frustrated contributors. This is one of the major reasons why rebase > > is so contentious, and it's exactly why you hear so many people saying > > "don't rebase branches that have been published". > > Again, I was more talking about the only reasonable solution I saw. > Because to be clear, if the history for some issue 666 in say trunk looks > like: > > commit : last nits from reviewer > commit : oops, typo that prevented commit > commit : some more fix found during review > commit : refactor half of preceding patch following reviewer comments > commit : Do something awesome - patch for #666 > > then imho that's a big regression from current patch based development. > > So basically my question is how do we meld all those commits that will > necessarily happen due to the nature of distributed reviews so that our > main history don't look like shit? And if the answer is "we don't" then > I'm not too fond of that solution. > > -- > Sylvain > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
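In the meantime, for anyone who wants to try it: the effect Nicholas described can be had with stock git, either by squash-merging the feature branch into trunk or by interactively rebasing the branch to fold the review-fixup commits into the original before merging. A sketch (the branch name is made up):

# Option 1: collapse the whole branch into a single commit on trunk
git checkout trunk
git merge --squash cassandra-666
git commit -m "Do something awesome - patch for #666"

# Option 2: on the branch itself, fold the fixups into the first commit,
# then merge as usual (mark the later commits as "squash"/"fixup" in the editor)
git checkout cassandra-666
git rebase -i trunk

Option 2 rewrites the branch's published history, which is exactly the contention earlier in the thread; option 1 leaves the branch alone and only trunk sees the single squashed commit.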
FYI -- BufferOverflowException out of CommitLog on trunk
I haven't had time to look into it yet, but just wanted to let you guys know that I hit this in case someone was in that code.

ERROR 14:07:31,215 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.nio.BufferOverflowException
        at java.nio.Buffer.nextPutIndex(Buffer.java:501)
        at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:654)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:259)
        at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:568)
        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Thread.java:662)
 INFO 14:07:31,504 flushing high-traffic column family CFS(Keyspace='***', ColumnFamily='***') (estimated 103394287 bytes)

It happened during a fairly standard load process using M/R. After that, the server refused to come down with a standard kill. -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: How is Cassandra being used?
Lively thread... +1 opt-in +1 in separate module I'll just substantiate Rick Shaw's comments. If this is on by default, I can see it making its way into production at a large corporation, at which time the traffic would sound an alarm as suspicious activity, which would immediately get the server's plug pulled and trigger an investigation. That would land the architect responsible for deploying that server in the proverbial principal's office. In the extreme case, that might "black-list" the technology and add fuel to any debate that the corporation should just stick with the 'proven enterprise' solutions. That is not my perspective, just be aware that in some large corporations it is an uphill battle to deploy Cassandra in the first place given incumbent systems. In every situation I've been in, even outside of large corporations, we would need to disable this feature given the sensitivity of the data. All that said... I would love to see this data. ;) I'd love to know where our deployment lies on the spectrum of use. Maybe a good old fashioned web form that allows companies to submit their usage scenarios might accomplish the same goal? (and you could get additional context information about the industry, etc.) It wouldn't be comprehensive, but it may be sufficiently representative. Maybe you could just output a couple lines at server start that said something like "Go here http://... to see how your usage compares to others." I personally wouldn't throw to big a hissy if it was incorporated into the actual server and on by default, but I certainly know others that would. -brian On Wed, Nov 16, 2011 at 7:17 AM, Eric Evans wrote: > On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis wrote: > > On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans wrote: > >> I think this is potentially quite dangerous; There are a lot people > >> who get very twitchy at the idea of software that Phones Home. I've > >> seen this so many times, and in all cases it was for software a lot > >> less sensitive than a database. > > > > True, but unlike most Home Phoners, ours will be out there in the open > > and you can see exactly what it's sending (or not, if you disable it). > > I'm sure there's other examples in the wild of this, but the only one > > I can think of is popcorn [1]. > > I don't think the transparency of the implementation changes things > much. It's still going to be opaque to a lot of folks, and more > importantly is the precedence it sets and the way it changes the > project/user trust relationship. > > Even if you're satisfied with the implementation, and trust that it > won't be extended to transmit additional data later (unintentionally > or otherwise), there are still very valid privacy concerns. For > example, seeing as how this must be transmitted over an IP network, > there are only so many guarantees you can make with respect to > anonymity. There will always be *someone* that can tie the data to a > unique IP, and an IP can almost always be tied to an individual or > organization. Imagine an organization that doesn't want *anyone* to > know it uses Cassandra, and isn't willing to accept the risk that one > of their admins might accidentally enable this reporting. > > It's also interesting that you mention popcon because it has always > been contentious. It's taken years for it to transition from the > point where it required users to install it themselves, to a prompt at > install-time that defaulted to "No", to the current state of an > install-time prompt that defaults to "Yes". 
And, the installer asks > *very* few questions; Whether or not popcon is enabled is on par with > partitioning and the assignment of a root password. > > Also, there should be no shame in the admission that we haven't earned > anywhere near the level of trust and respect that the Debian project > has. > > > More broadly, my sense is that people are getting used to the idea > > that it's okay to give away anonymous statistics as part of the price > > of "free," although YMMclearlyV. I am, after all, a Windows user. :) > > As privacy becomes more threatened people are either capitulating, or > becoming even more defensive; Whether that makes it better or worse > for us if we do this is debatable. > > >> I'm sure you've already considered this though, you're already talking > >> about anonymity, and transparency, and what I assume is neutrality of > >> the collection endpoint (can apache actually provide a VM; is that a > >> thing?). > > > > Yes, they provide Ubuntu or FreeBSD VMs. > > > >> I'm just afraid that we'll scare people off before they can > >> be properly convinced that it's all on the up-and-up. > > > > How would you propose addressing this? > > Honestly? The best way to convince people that we take the privacy of > their data seriously is to not transmit any of it to a machine outside > their control. > > >> I'm curious to see what others think, but at the moment I'm hovering > >> somewhere around
Re: AOP for SOLR Integration with Cassandra
Understandable. I'll leave it as is then in the REST layer. -brian On Fri, Nov 4, 2011 at 11:24 PM, Jonathan Ellis wrote: > On Fri, Nov 4, 2011 at 3:57 PM, Brian O'Neill > wrote: > > Doing it with AOP will also allow us to move it into > > the main codebase if/when we want to. > > I'm not sure I understand. I'm definitely -1 about adding an AspectJ > dependency or similar to core C*. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
AOP for SOLR Integration with Cassandra
I just sent an email out over the users list. Over a couple nights this week, I added SOLR integration into Virgil. (Virgil is that REST layer that we've been building out over in Apache Extras) I just wanted to throw an idea out to the dev list... I plan to migrate the current implementation in Virgil to use AOP. That will provide a good separation of concerns between Cassandra storage and the SOLR indexing. Doing it with AOP will also allow us to move it into the main codebase if/when we want to. We would simply move the AOP to surround CassandraServer. (or lower... even down into Storage) Let me know if you think that is worth exploring further. -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
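To make the AOP idea a bit more concrete, what I have in mind is an @Around advice wrapping CassandraServer's write path, pushing the mutation to SOLR only after Cassandra accepts it. The sketch below is purely illustrative (AspectJ annotation style; the pointcut and the SolrIndexer helper are hypothetical, not what's in Virgil today).

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class SolrIndexingAspect
{
    // Hypothetical stand-in for whatever pushes documents to SOLR.
    static class SolrIndexer
    {
        void index(Object[] insertArgs)
        {
            // translate (key, column_parent, column, consistency) into a SOLR doc here
        }
    }

    private final SolrIndexer indexer = new SolrIndexer();

    // Wrap the Thrift insert path; index only after the real write succeeds.
    @Around("execution(* org.apache.cassandra.thrift.CassandraServer.insert(..))")
    public Object indexAfterInsert(ProceedingJoinPoint pjp) throws Throwable
    {
        Object result = pjp.proceed();   // let Cassandra do the actual write first
        indexer.index(pjp.getArgs());    // then mirror the mutation into SOLR
        return result;
    }
}

Whether the weaving happens inside Virgil or against the core classes is exactly the open question; the aspect itself doesn't care, which is the separation-of-concerns point.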
Contribution: Native REST Layer for Cassandra
Jeremy/Jonathan, When you finish celebrating the 1.0 release, take a look: I just submitted a native REST layer for Cassandra. https://issues.apache.org/jira/browse/CASSANDRA-3380 It uses JAX-RS and Apache CXF, supporting the following operations (JSON over HTTP):
- Create keyspace
- Drop keyspace
- Create column family
- Drop column family
- Insert row
- Fetch row
- Delete row
- Insert column
- Fetch column
- Delete column
This is a new module under contrib/rest. It builds using ant and ivy. I also included a maven pom.xml file that makes it easier to get set up in Eclipse for those that use m2eclipse. You start the server with bin/rest_cassandra. After that, you can issue all commands over HTTP on port 8080. I included example curl commands in the README.txt. There are JUnit tests that provide good code coverage of the JSON marshalling, the system and data operations, as well as the REST layer. Let me know if you have any trouble building / using it. In the meantime, I'll start work on some additional to-dos. Specifically, we should add:
- Better exception handling
- Host/Port configuration
- Security
- XML support
- Binary object / byte support (assumes Strings right now)
(kudos to Gary Dusbabek for the initial thought to implement this as a native layer) all the best, brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
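To give a flavor of what "JSON over HTTP" means here, the calls end up looking roughly like the lines below. These paths are illustrative only, not copied from the README.txt attached to the ticket; see the README for the real commands.

# create a keyspace, then a column family under it (illustrative paths)
curl -X PUT http://localhost:8080/cassandra/keyspaces/myks
curl -X PUT http://localhost:8080/cassandra/keyspaces/myks/mycf

# insert a row as a JSON map of column name -> value, then read it back
curl -X PUT -H "Content-Type: application/json" \
     -d '{"name":"name23","email":"someone@example.com"}' \
     http://localhost:8080/cassandra/keyspaces/myks/mycf/row23
curl http://localhost:8080/cassandra/keyspaces/myks/mycf/row23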
Re: Eclipse style/formatting file?
Perfect. Thanks. -brian On Thu, Oct 13, 2011 at 11:13 AM, Feiyi Wang wrote: > How about this? > https://github.com/tjake/cassandra-style-eclipse > > Feiyi > > > On Thu, Oct 13, 2011 at 10:58 AM, Brian O'Neill >wrote: > > > All, > > > > Anyone have an eclipse style/formatting file compatible with the > Cassandra > > code? > > > > I don't see one here: > > http://wiki.apache.org/cassandra/RunningCassandraInEclipse > > > > (I'm trying to get the REST API in a good state for contribution) > > > > thanks, > > brian > > > > -- > > Brian ONeill > > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > > blog: http://weblogs.java.net/blog/boneill42/ > > blog: http://brianoneill.blogspot.com/ > > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Eclipse style/formatting file?
All, Anyone have an eclipse style/formatting file compatible with the Cassandra code? I don't see one here: http://wiki.apache.org/cassandra/RunningCassandraInEclipse (I'm trying to get the REST API in a good state for contribution) thanks, brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: REST API?
To give everyone an update... I was able to take what Gary had and update it to run on trunk. I like the native integration, as opposed to layering it on top of Hector. It's working out well. I layered in JAX-RS to replace the hand parsing of the url, and the handlers. I have reads and writes working through the StorageProxy, but I think I'm going to raise it up one layer to take advantage of ThriftValidations. (but still using direct method invocation instead of the thift client) I added unit tests for the read/write of columns. I'm going to add a few other operations (add/drop keyspace, add/drop CF). Then it should be in a state where I can share it. -brian On Mon, Oct 10, 2011 at 10:06 PM, Jeremy Hanna wrote: > Brian, > > If you end up doing something with the rest api and making it > available/open source, please post again either here or on the user list. I > think others would be interested and may contribute to it. > > Cheers, > > Jeremy > > On Oct 10, 2011, at 8:42 PM, Brian O'Neill wrote: > > > Thanks Gary. Perfect. Checking it out now. > > > > Performance isn't much of a concern for us through the REST interface. > We > > are using the Hadoop/PIG integration to do the heavy lifting. This will > be > > mostly for reads and small number of writes. > > > > I'll definitely give this a try. Thanks again. I'll let you know how it > > turns out. > > > > -brian > > > > On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek > wrote: > > > >> It turns out that it is pretty easy (or it was a year ago) to replace > >> the native Cassandra transport with your own. I wrote about it on my > >> blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html), > >> using REST as an example. > >> > >> > >> On Mon, Oct 10, 2011 at 20:12, Brian O'Neill > >> wrote: > >>> My team desperately needs a REST API for Cassandra. > >>> > >>> I saw the following: > >>> http://code.google.com/p/restish/ > >>> from > >>> > >> > http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/ > >>> > >>> But it appears to have little activity and documentation. > >>> > >>> That lead me to start work on a contrib/rest module, but before I get > to > >> far > >>> I wanted to ask if there was any effort underway for a REST Server/API. > >>> If not, I'll continue developing the REST server. Any preference for a > >> REST > >>> stack? (JAX-RS on Apache-CXF? Raw Servlets? Netty? etc.) > >>> > >>> Until I hear back, I'll continue with the JAX-RS / Apache CXF > >> implementation > >>> I have cooking. > >>> > >>> -brian > >>> > >>> -- > >>> Brian ONeill > >>> Lead Architect, Health Market Science (http://healthmarketscience.com) > >>> mobile:215.588.6024 > >>> blog: http://weblogs.java.net/blog/boneill42/ > >>> blog: http://brianoneill.blogspot.com/ > >>> > >> > > > > > > > > -- > > Brian ONeill > > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > > blog: http://weblogs.java.net/blog/boneill42/ > > blog: http://brianoneill.blogspot.com/ > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
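For those who haven't used JAX-RS, the "replace the hand parsing of the url" part amounts to resource classes like the one below: URL templates map straight onto keyspace/column family/row operations, and the method bodies call down into whatever does the real work (StorageProxy, or the thrift-validation layer once I raise it up). This is a simplified illustration; the path scheme and the DataService facade are placeholders, not the code in the ticket.

import javax.ws.rs.Consumes;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/data/{keyspace}/{columnFamily}/{rowKey}")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class RowResource
{
    // Placeholder facade; in the real code this is where the direct
    // StorageProxy / validation calls live.
    private final DataService dataService = new DataService();

    @GET
    public String fetchRow(@PathParam("keyspace") String keyspace,
                           @PathParam("columnFamily") String columnFamily,
                           @PathParam("rowKey") String rowKey)
    {
        return dataService.fetchRowAsJson(keyspace, columnFamily, rowKey);
    }

    @PUT
    public void insertRow(@PathParam("keyspace") String keyspace,
                          @PathParam("columnFamily") String columnFamily,
                          @PathParam("rowKey") String rowKey,
                          String jsonBody)
    {
        dataService.insertRowFromJson(keyspace, columnFamily, rowKey, jsonBody);
    }

    @DELETE
    public void deleteRow(@PathParam("keyspace") String keyspace,
                          @PathParam("columnFamily") String columnFamily,
                          @PathParam("rowKey") String rowKey)
    {
        dataService.deleteRow(keyspace, columnFamily, rowKey);
    }

    // Hypothetical stub so the sketch stands alone.
    static class DataService
    {
        String fetchRowAsJson(String ks, String cf, String key) { return "{}"; }
        void insertRowFromJson(String ks, String cf, String key, String json) { }
        void deleteRow(String ks, String cf, String key) { }
    }
}

A nice side effect is that JAX-RS providers handle content negotiation, so the XML-support item on the todo list should mostly fall out of adding @Produces/@Consumes media types.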
Re: REST API?
Will do. I've picked up where Gary left off. It is good starting point, with a good mapping between REST and get/set/mutations. (kudos to Gary) I'll update it to accomodate any changes and see if I can add some tests on top of it. I may look to add in JAX-RS (on either Jersey or Apache CXF). We use it for all of our REST services, and it may provide a good abstraction layer that we can build on. Give me a couple days. I have to get back into the "ant mentality". I've been doing maven too long. BTW -- Does anyone know if there are plans to move to maven? (Not trying to start a religious war, just curious. ;) -brian On Mon, Oct 10, 2011 at 10:06 PM, Jeremy Hanna wrote: > Brian, > > If you end up doing something with the rest api and making it > available/open source, please post again either here or on the user list. I > think others would be interested and may contribute to it. > > Cheers, > > Jeremy > > On Oct 10, 2011, at 8:42 PM, Brian O'Neill wrote: > > > Thanks Gary. Perfect. Checking it out now. > > > > Performance isn't much of a concern for us through the REST interface. > We > > are using the Hadoop/PIG integration to do the heavy lifting. This will > be > > mostly for reads and small number of writes. > > > > I'll definitely give this a try. Thanks again. I'll let you know how it > > turns out. > > > > -brian > > > > On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek > wrote: > > > >> It turns out that it is pretty easy (or it was a year ago) to replace > >> the native Cassandra transport with your own. I wrote about it on my > >> blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html), > >> using REST as an example. > >> > >> > >> On Mon, Oct 10, 2011 at 20:12, Brian O'Neill > >> wrote: > >>> My team desperately needs a REST API for Cassandra. > >>> > >>> I saw the following: > >>> http://code.google.com/p/restish/ > >>> from > >>> > >> > http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/ > >>> > >>> But it appears to have little activity and documentation. > >>> > >>> That lead me to start work on a contrib/rest module, but before I get > to > >> far > >>> I wanted to ask if there was any effort underway for a REST Server/API. > >>> If not, I'll continue developing the REST server. Any preference for a > >> REST > >>> stack? (JAX-RS on Apache-CXF? Raw Servlets? Netty? etc.) > >>> > >>> Until I hear back, I'll continue with the JAX-RS / Apache CXF > >> implementation > >>> I have cooking. > >>> > >>> -brian > >>> > >>> -- > >>> Brian ONeill > >>> Lead Architect, Health Market Science (http://healthmarketscience.com) > >>> mobile:215.588.6024 > >>> blog: http://weblogs.java.net/blog/boneill42/ > >>> blog: http://brianoneill.blogspot.com/ > >>> > >> > > > > > > > > -- > > Brian ONeill > > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > > blog: http://weblogs.java.net/blog/boneill42/ > > blog: http://brianoneill.blogspot.com/ > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Re: REST API?
Thanks Gary. Perfect. Checking it out now. Performance isn't much of a concern for us through the REST interface. We are using the Hadoop/PIG integration to do the heavy lifting. This will be mostly for reads and small number of writes. I'll definitely give this a try. Thanks again. I'll let you know how it turns out. -brian On Mon, Oct 10, 2011 at 9:35 PM, Gary Dusbabek wrote: > It turns out that it is pretty easy (or it was a year ago) to replace > the native Cassandra transport with your own. I wrote about it on my > blog (http://www.onemanclapping.org/2010/09/restful-cassandra.html), > using REST as an example. > > > On Mon, Oct 10, 2011 at 20:12, Brian O'Neill > wrote: > > My team desperately needs a REST API for Cassandra. > > > > I saw the following: > > http://code.google.com/p/restish/ > > from > > > http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/ > > > > But it appears to have little activity and documentation. > > > > That lead me to start work on a contrib/rest module, but before I get to > far > > I wanted to ask if there was any effort underway for a REST Server/API. > > If not, I'll continue developing the REST server. Any preference for a > REST > > stack? (JAX-RS on Apache-CXF? Raw Servlets? Netty? etc.) > > > > Until I hear back, I'll continue with the JAX-RS / Apache CXF > implementation > > I have cooking. > > > > -brian > > > > -- > > Brian ONeill > > Lead Architect, Health Market Science (http://healthmarketscience.com) > > mobile:215.588.6024 > > blog: http://weblogs.java.net/blog/boneill42/ > > blog: http://brianoneill.blogspot.com/ > > > -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
REST API?
My team desperately needs a REST API for Cassandra. I saw the following: http://code.google.com/p/restish/ from http://crlog.info/2011/01/29/restish-wrapper-for-hectorcassandra-data-manipulation/ But it appears to have little activity and documentation. That lead me to start work on a contrib/rest module, but before I get to far I wanted to ask if there was any effort underway for a REST Server/API. If not, I'll continue developing the REST server. Any preference for a REST stack? (JAX-RS on Apache-CXF? Raw Servlets? Netty? etc.) Until I hear back, I'll continue with the JAX-RS / Apache CXF implementation I have cooking. -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
Patch for Contrib/Pig to accommodate refactoring of hexToBytes
Jonathan, We need a small update to contrib/pig to accommodate pulling hexToBytes out of FBUtilities into Hex. I raised an issue, and attached is the patch for trunk. https://issues.apache.org/jira/browse/CASSANDRA-3341 -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/

Index: src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
===================================================================
--- src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java (revision 1181048)
+++ src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java (working copy)
@@ -26,7 +26,7 @@
 import org.apache.cassandra.db.marshal.IntegerType;
 import org.apache.cassandra.db.marshal.TypeParser;
 import org.apache.cassandra.thrift.*;
-import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Hex;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
@@ -601,7 +601,7 @@
         TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
         try
         {
-            return FBUtilities.bytesToHex(serializer.serialize(cfDef));
+            return Hex.bytesToHex(serializer.serialize(cfDef));
         }
         catch (TException e)
         {
@@ -616,7 +616,7 @@
         CfDef cfDef = new CfDef();
         try
         {
-            deserializer.deserialize(cfDef, FBUtilities.hexToBytes(st));
+            deserializer.deserialize(cfDef, Hex.hexToBytes(st));
         }
         catch (TException e)
         {