Re: Unit Testing Cassandra

2013-06-19 Thread Stephen Connolly
Unit testing means testing in isolation the smallest part.

Unit tests should not take more than a few milliseconds to set up and
verify their assertions.

As such, if your code is not factored well for testing, you would typically
use mocking (either by hand, or with mocking libraries) to mock out the
bits not under test.

Extensive use of mocks is usually a smell of code that is not well designed
*for testing*.

If you intend to test components integrated together... That is integration
testing.

If you intend to test performance of the whole or significant parts of the
whole... That is performance testing.

When searching for the above, you will not have much luck if you look
for them in the context of "unit testing", as those things are *outside the
scope of unit testing*.
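
To make the mocking point concrete, here is a minimal sketch in Python using the standard unittest.mock module. UserService and its get_user store method are hypothetical names invented for this illustration; they are not part of Cassandra or of any client library. The store is replaced by a mock, so the test runs in isolation and never touches a database.

```python
import unittest
from unittest import mock


class UserService:
    """Hypothetical code under test; it depends on a store it does not own."""

    def __init__(self, store):
        self.store = store

    def display_name(self, user_id):
        # The store call is the part we mock out in the unit test.
        row = self.store.get_user(user_id)
        if row is None:
            return "unknown"
        return row["name"].title()


class UserServiceTest(unittest.TestCase):
    def test_formats_name_from_store(self):
        store = mock.Mock()
        store.get_user.return_value = {"name": "ada lovelace"}
        service = UserService(store)
        self.assertEqual(service.display_name(42), "Ada Lovelace")
        store.get_user.assert_called_once_with(42)

    def test_missing_user(self):
        store = mock.Mock()
        store.get_user.return_value = None
        self.assertEqual(UserService(store).display_name(7), "unknown")
```

Because the store is injected through the constructor, one small mock is enough. If a test like this needed many mocks, that would be the design smell mentioned above.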

On Wednesday, 19 June 2013, Shahab Yunus wrote:

> Hello,
>
> Can anyone suggest a good/popular Unit Test tools/frameworks/utilities out
> there for unit testing Cassandra stores? I am looking for testing from
> performance/load and monitoring perspective. I am using 1.2.
>
> Thanks a lot.
>
> Regards,
> Shahab
>


-- 
Sent from my phone


[RESULT] [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.1-1

2013-02-25 Thread Stephen Connolly
Result

+1: Stephen Connolly, Mikhail Mazursky
0: Fred Cooke
-1:

-Stephen


On 14 February 2013 09:28, Stephen Connolly  wrote:

> Hi,
>
> I'd like to release version 1.2.1-1 of Mojo's Cassandra Maven Plugin
> to sync up with the 1.2.1 release of Apache Cassandra.
>
> We solved 1 issue:
>
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=19089
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-015/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.1-1@17931
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72h and will succeed by lazy consensus.
>
> Guide to testing staged releases:
> http://maven.apache.org/guides/development/guide-testing-releases.html
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, and as is now
> tradition for Mojo's Cassandra Maven Plugin, this vote is
> also open to any subscribers of the dev and user@cassandra.apache.org
> mailing lists that want to test or use this plugin.
>


[VOTE] Release Mojo's Cassandra Maven Plugin 1.2.1-1

2013-02-14 Thread Stephen Connolly
Hi,

I'd like to release version 1.2.1-1 of Mojo's Cassandra Maven Plugin
to sync up with the 1.2.1 release of Apache Cassandra.

We solved 1 issue:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=19089

Staging Repository:
https://nexus.codehaus.org/content/repositories/orgcodehausmojo-015/

Site:
http://mojo.codehaus.org/cassandra-maven-plugin/index.html

SCM Tag:
https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.1-1@17931

 [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
says it looks fine too.
 [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
follow somebody else if only I could decide who
 [ ] -1 No! wait up there I have issues (in general like, ya know,
and being a trouble-maker is only one of them)

The vote is open for 72h and will succeed by lazy consensus.

Guide to testing staged releases:
http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers

-Stephen

P.S.
 In the interest of ensuring (more is) better testing, and as is now
tradition for Mojo's Cassandra Maven Plugin, this vote is
also open to any subscribers of the dev and user@cassandra.apache.org
mailing lists that want to test or use this plugin.


[mojo-dev] [RESULT] [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1

2013-02-14 Thread Stephen Connolly
This vote has passed:

+1: Stephen, Michael, Mikhail
0:
-1:

I will proceed with the promotion of artifacts to central

-Stephen


On 14 February 2013 06:26, Mikhail Mazursky  wrote:

> +1. Please, release it.
>
>
> 2013/2/14 Stephen Connolly 
>
>> More, I'm looking for somebody who is actively using C* to test it (there
>> are a couple of users... The lot of you who asked me to roll another
>> release). I will roll a 1.2.1 once I close this vote... I could close with
>> lazy consensus, but would feel more comfortable if it has had some testing ;-)
>>
>>
>> On Wednesday, 13 February 2013, Michael Kjellman wrote:
>>
>>> Considering that 1.2.1 is out, and looking at your project very quickly
>>> (looks interesting)/overlaps a bit with CCMBridge no?/ I'd def say +1 :)
>>>
>>> From: Stephen Connolly <stephen.alan.conno...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Wednesday, February 13, 2013 1:27 PM
>>> To: "d...@mojo.codehaus.org" <d...@mojo.codehaus.org>,
>>> dev <d...@cassandra.apache.org>,
>>> "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>
>>> Subject: Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1
>>>
>>> Ping
>>>
>>> On Monday, 4 February 2013, Stephen Connolly wrote:
>>> Hi,
>>>
>>> I'd like to release version 1.2.0-1 of Mojo's Cassandra Maven Plugin
>>> to sync up with the 1.2.0 release of Apache Cassandra. (a 1.2.1-1 will
>>> follow shortly after this release, but it should be possible to use the
>>> xpath://project/build/plugins/plugin/dependencies/dependency override of
>>> cassandra-server to use C* releases from the 1.2.x stream now that the link
>>> errors have been resolved, so that is less urgent)
>>>
>>> We solved 1 issue:
>>>
>>> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=18467
>>>
>>> Staging Repository:
>>> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-013/
>>>
>>> Site:
>>> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>>>
>>> SCM Tag:
>>> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.0-1@17921
>>>
>>>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
>>> says it looks fine too.
>>>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
>>> follow somebody else if only I could decide who
>>>  [ ] -1 No! wait up there I have issues (in general like, ya know,
>>> and being a trouble-maker is only one of them)
>>>
>>> The vote is open for 72h and will succeed by lazy consensus.
>>>
>>> Guide to testing staged releases:
>>> http://maven.apache.org/guides/development/guide-testing-releases.html
>>>
>>> Cheers
>>>
>>> -Stephen
>>>
>>> P.S.
>>>  In the interest of ensuring (more is) better testing, and as is now
>>> tradition for Mojo's Cassandra Maven Plugin, this vote is
>>> also open to any subscribers of the dev and user@cassandra.apache.org
>>> mailing lists that want to test or use this plugin.
>>>
>>
>


Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1

2013-02-13 Thread Stephen Connolly
More, I'm looking for somebody who is actively using C* to test it (there are
a couple of users... The lot of you who asked me to roll another release). I
will roll a 1.2.1 once I close this vote... I could close with lazy
consensus, but would feel more comfortable if it has had some testing ;-)

On Wednesday, 13 February 2013, Michael Kjellman wrote:

> Considering that 1.2.1 is out, and looking at your project very quickly
> (looks interesting)/overlaps a bit with CCMBridge no?/ I'd def say +1 :)
>
> From: Stephen Connolly <stephen.alan.conno...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, February 13, 2013 1:27 PM
> To: "d...@mojo.codehaus.org" <d...@mojo.codehaus.org>,
> dev <d...@cassandra.apache.org>,
> "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1
>
> Ping
>
> On Monday, 4 February 2013, Stephen Connolly wrote:
> Hi,
>
> I'd like to release version 1.2.0-1 of Mojo's Cassandra Maven Plugin
> to sync up with the 1.2.0 release of Apache Cassandra. (a 1.2.1-1 will
> follow shortly after this release, but it should be possible to use the
> xpath://project/build/plugins/plugin/dependencies/dependency override of
> cassandra-server to use C* releases from the 1.2.x stream now that the link
> errors have been resolved, so that is less urgent)
>
> We solved 1 issue:
>
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=18467
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-013/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.0-1@17921
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72h and will succeed by lazy consensus.
>
> Guide to testing staged releases:
> http://maven.apache.org/guides/development/guide-testing-releases.html
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, and as is now
> tradition for Mojo's Cassandra Maven Plugin, this vote is
> also open to any subscribers of the dev and user@cassandra.apache.org
> mailing lists that want to test or use this plugin.
>


Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1

2013-02-13 Thread Stephen Connolly
Ping

On Monday, 4 February 2013, Stephen Connolly wrote:

> Hi,
>
> I'd like to release version 1.2.0-1 of Mojo's Cassandra Maven Plugin
> to sync up with the 1.2.0 release of Apache Cassandra. (a 1.2.1-1 will
> follow shortly after this release, but it should be possible to use the
> xpath://project/build/plugins/plugin/dependencies/dependency override of
> cassandra-server to use C* releases from the 1.2.x stream now that the link
> errors have been resolved, so that is less urgent)
>
> We solved 1 issue:
>
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=18467
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-013/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.0-1@17921
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72h and will succeed by lazy consensus.
>
> Guide to testing staged releases:
> http://maven.apache.org/guides/development/guide-testing-releases.html
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, and as is now
> tradition for Mojo's Cassandra Maven Plugin, this vote is
> also open to any subscribers of the dev and user@cassandra.apache.org
> mailing lists that want to test or use this plugin.
>


[VOTE] Release Mojo's Cassandra Maven Plugin 1.2.0-1

2013-02-04 Thread Stephen Connolly
Hi,

I'd like to release version 1.2.0-1 of Mojo's Cassandra Maven Plugin
to sync up with the 1.2.0 release of Apache Cassandra. (a 1.2.1-1 will
follow shortly after this release, but it should be possible to use the
xpath://project/build/plugins/plugin/dependencies/dependency override of
cassandra-server to use C* releases from the 1.2.x stream now that the link
errors have been resolved, so that is less urgent)

We solved 1 issue:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=18467

Staging Repository:
https://nexus.codehaus.org/content/repositories/orgcodehausmojo-013/

Site:
http://mojo.codehaus.org/cassandra-maven-plugin/index.html

SCM Tag:
https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.0-1@17921

 [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
says it looks fine too.
 [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
follow somebody else if only I could decide who
 [ ] -1 No! wait up there I have issues (in general like, ya know,
and being a trouble-maker is only one of them)

The vote is open for 72h and will succeed by lazy consensus.

Guide to testing staged releases:
http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers

-Stephen

P.S.
 In the interest of ensuring (more is) better testing, and as is now
tradition for Mojo's Cassandra Maven Plugin, this vote is
also open to any subscribers of the dev and user@cassandra.apache.org
mailing lists that want to test or use this plugin.


[ANN] Mojo's Cassandra Maven Plugin 1.1.0-1 released

2012-05-07 Thread Stephen Connolly
The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 1.1.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The Cassandra Plugin has the following goals.

 * cassandra:start Starts up a test instance of Cassandra in the background.
 * cassandra:stop Stops the test instance of Cassandra that was
started using cassandra:start.
 * cassandra:start-cluster Starts up a test cluster of Cassandra in
the background bound to the local loopback IP addresses 127.0.0.1,
127.0.0.2, etc.
 * cassandra:stop-cluster Stops the test cluster of Cassandra that was
started using cassandra:start-cluster.
 * cassandra:run Starts up a test instance of Cassandra in the foreground.
 * cassandra:load Runs a cassandra-cli script against the test
instance of Cassandra.
 * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
 * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
 * cassandra:compact Runs nodetool compact against the test instance
of Cassandra.
 * cassandra:cleanup Runs nodetool cleanup against the test instance
of Cassandra.
 * cassandra:delete Deletes the test instance of Cassandra.
 * cassandra:cql-exec Execute a CQL statement (directly or from a
file) against the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
   <groupId>org.codehaus.mojo</groupId>
   <artifactId>cassandra-maven-plugin</artifactId>
   <version>1.1.0-1</version>
</plugin>



Release Notes - Mojo's Cassandra Maven Plugin - Version 1.1.0-1

** Bug
* [MCASSANDRA-15] - Whitespace in path breaks execution

** New Feature
* [MCASSANDRA-18] - Support Cassandra 1.1

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


[RESULT] [VOTE] Release Mojo's Cassandra Maven Plugin 1.1.0-1

2012-05-05 Thread Stephen Connolly
This vote has passed with the following results

+1: Stephen Connolly (dev@mojo), Colin Tayloy (dev@cassandra), Rick
Shaw (dev@cassandra)
0:
-1:

I will proceed to push the plugin artifacts into central

On 2 May 2012 12:15, Stephen Connolly  wrote:
> Hi,
>
> I'd like to release version 1.1.0-1 of Mojo's Cassandra Maven Plugin
> to sync up with the 1.1.0 release of Apache Cassandra.
>
> We solved 2 issues:
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17926
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-068/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.1.0-1@16519
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72h and will succeed by lazy consensus.
>
> Guide to testing staged releases:
> http://maven.apache.org/guides/development/guide-testing-releases.html
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, this vote is
> also open to subscribers of the dev and user@cassandra.apache.org
> mailing lists.


[VOTE] Release Mojo's Cassandra Maven Plugin 1.1.0-1

2012-05-02 Thread Stephen Connolly
Hi,

I'd like to release version 1.1.0-1 of Mojo's Cassandra Maven Plugin
to sync up with the 1.1.0 release of Apache Cassandra.

We solved 2 issues:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17926

Staging Repository:
https://nexus.codehaus.org/content/repositories/orgcodehausmojo-068/

Site:
http://mojo.codehaus.org/cassandra-maven-plugin/index.html

SCM Tag:
https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.1.0-1@16519

 [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
says it looks fine too.
 [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
follow somebody else if only I could decide who
 [ ] -1 No! wait up there I have issues (in general like, ya know,
and being a trouble-maker is only one of them)

The vote is open for 72h and will succeed by lazy consensus.

Guide to testing staged releases:
http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers

-Stephen

P.S.
 In the interest of ensuring (more is) better testing, this vote is
also open to subscribers of the dev and user@cassandra.apache.org
mailing lists.


Re: data agility

2011-11-20 Thread Stephen Connolly
if your startup is bootstrapping then cassandra is sometimes too heavy to
start with.

i.e. it needs to be fed ram... you're not going to seriously run it in less
than 1gb per node... that level of ram commitment can be too much while
bootstrapping.

if your startup has enough cash to pay for 3-5 recommended spec (see wiki)
nodes to be up 24/7 then cassandra is a good fit...

a friend of mine is bootstrapping a startup and had to drop back to mysql
while he finds his pain points and customers... he knows he will end up
jumping back to cassandra when he gets enough customers (or a VC) but for
now the running costs are too much to pay from his own pocket... note that
the jdbc driver and cql will make jumping back easy for him (as he still
tests with c*... just runs at present against mysql nuts eh!)

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 20 Nov 2011 19:07, "Dotan N."  wrote:

> Hi all,
> my question may be more philosophical than related technically
> to Cassandra, but please bear with me.
>
> Given that a young startup may not know its product fully at the early
> stages, but that it definitely points to ~200M users,
> would Cassandra be the right way to go?
>
> That is, the requirement is for a large data store, that can move with
> product changes and requirements swiftly.
>
> Given that in Cassandra one thinks hard about the queries, and then builds
> a model to suit it best, I was thinking of
> this situation as problematic.
>
> So here are some questions:
>
> - would it be wiser to start with a more agile data store (such as
> mongodb) and then progress onto Cassandra, when the product itself
> solidifies?
> - given that we start with Cassandra from the get go, what is a common
> (and quick in terms of development) way or practice to change data, change
> schemas, as the product evolves?
> - is it even smart to start with Cassandra? would only startups whose core
> business is big data start with it from the get go?
> - how would you do map/reduce with Cassandra? how agile is that? (for
> example, can you run map/reduce _very_ frequently?)
>
> Thanks!
>
> --
> Dotan, @jondot 
>
>


Re: Will writes with < ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
at that point, your cluster will either have so much data on each node that
you will need to split them, keeping rf=5 so you have 10 nodes... or the
intra cluster traffic will swamp you and you will split each node keeping
rf=5 so you have 10 nodes again.

safest thing is not to design with the assumption that rf=n

- Stephen

On 7 Nov 2011 17:47, "Riyad Kalla"  wrote:

> Stephen,
>
> I appreciate you making the point more strongly; I won't make this
> decision lightly given the stress you are putting on it, but the technical
> aspects of this make me curious...
>
> If I start with RF=N (number of nodes) now, and in 2 years
> (hypothetically) my dataset is too large and I say to myself "Dangit,
> Stephen was right...", couldn't I just change the RF to some smaller value,
> say "3" at that point or would the Cassandra ring not rebalance the data
> set nicely at that point?
>
> More specifically, would it not know how best to slowly remove extraneous
> copies from the nodes and make the data more sparse among the ring members?
>
> Thanks for the hand-holding; it is helping me understand the operational
> landscape quickly.
>
> -R
>
> On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> Plan for the future
>>
>> At some point your data set will become too big for the node that it
>> is running on, or your load will force you to split nodes. Once you
>> do that, RF < N.
>>
>> To solve performance issues with C* the solution is add more nodes
>>
>> To solve storage issues with C* the solution is add more nodes
>>
>> In most cases the solution in C* is add more nodes.
>>
>> Don't assume RF=Number of nodes as a core design decision of your
>> application and you will not have your ass bitten
>>
>> ;-)
>>
>> -Stephen
>> P.S. making the point more extreme to make it clear
>>
>> On 7 November 2011 15:04, Riyad Kalla  wrote:
>> > Stephen,
>> > Excellent breakdown; I appreciate all the detail.
>> > Your last comment about RF being smaller than N (number of nodes) -- in my
>> > particular case my data set isn't particularly large (a few GB) and is
>> > distributed globally across a handful of data centers. What I am utilizing
>> > Cassandra for is the replication in order to minimize latency for requests.
>> > So when a request comes into any location, I want each node in the ring to
>> > contain the full data set so it never needs to defer to another member of
>> > the ring to answer a question (even if this means eventual consistency,
>> > that is alright in my case).
>> > Given that, the way I've understood this discussion so far is I would have a
>> > RF of N (my total node count) but my Consistency Level with all my writes
>> > will *likely* be QUORUM -- I think that is a good/safe default for me to use
>> > as writes aren't the scenario I need to optimize for latency; that being
>> > said, I also don't want to wait for a ConsistencyLevel of ALL to complete
>> > before my code continues though.
>> > Would you agree with this assessment or am I missing the boat on something?
>> > Best,
>> > Riyad
>> >
>> > On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
>> >  wrote:
>> >>
>> >> Consistency Level is a pseudo-enum...
>> >>
>> >> you have the choice between
>> >>
>> >> ONE
>> >> Quorum (and there are different types of this)
>> >> ALL
>> >>
>> >> At CL=ONE, only one node is guaranteed to have got the write if the
>> >> operation is a success.
>> >> At CL=ALL, all nodes that the RF says it should be stored at must
>> >> confirm the write before the operation succeeds, but a partial write
>> >> will succeed eventually if at least one node recorded the write
>> >> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
>> >> operation to succeed, otherwise failure, but a partial write will
>> >> succeed eventually if at least one node recorded the write.
>> >>
>> >> Read repair will eventually ensure that the write is replicated across
>> >> all RF nodes in the cluster.
>> >>
>> >&

Re: Will writes with < ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
Plan for the future

At some point your data set will become too big for the node that it
is running on, or your load will force you to split nodes. Once you
do that, RF < N.

To solve performance issues with C* the solution is add more nodes

To solve storage issues with C* the solution is add more nodes

In most cases the solution in C* is add more nodes.

Don't assume RF=Number of nodes as a core design decision of your
application and you will not have your ass bitten

;-)

-Stephen
P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla  wrote:
> Stephen,
> Excellent breakdown; I appreciate all the detail.
> Your last comment about RF being smaller than N (number of nodes) -- in my
> particular case my data set isn't particularly large (a few GB) and is
> distributed globally across a handful of data centers. What I am utilizing
> Cassandra for is the replication in order to minimize latency for requests.
> So when a request comes into any location, I want each node in the ring to
> contain the full data set so it never needs to defer to another member of
> the ring to answer a question (even if this means eventual consistency,
> that is alright in my case).
> Given that, the way I've understood this discussion so far is I would have a
> RF of N (my total node count) but my Consistency Level with all my writes
> will *likely* be QUORUM -- I think that is a good/safe default for me to use
> as writes aren't the scenario I need to optimize for latency; that being
> said, I also don't want to wait for a ConsistencyLevel of ALL to complete
> before my code continues though.
> Would you agree with this assessment or am I missing the boat on something?
> Best,
> Riyad
>
> On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
>  wrote:
>>
>> Consistency Level is a pseudo-enum...
>>
>> you have the choice between
>>
>> ONE
>> Quorum (and there are different types of this)
>> ALL
>>
>> At CL=ONE, only one node is guaranteed to have got the write if the
>> operation is a success.
>> At CL=ALL, all nodes that the RF says it should be stored at must
>> confirm the write before the operation succeeds, but a partial write
>> will succeed eventually if at least one node recorded the write
>> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
>> operation to succeed, otherwise failure, but a partial write will
>> succeed eventually if at least one node recorded the write.
>>
>> Read repair will eventually ensure that the write is replicated across
>> all RF nodes in the cluster.
>>
>> The N in QUORUM above depends on the type of QUORUM you choose, in
>> general think N=RF unless you choose a fancy QUORUM.
>>
>> To have a consistent read, CL of write + CL of read must be > RF...
>>
>> Write at ONE, read at ONE => may not get the most recent write if RF >
>> 1 [fastest write, fastest read] {data loss possible if node lost
>> before read repair}
>> Write at QUORUM, read at ONE => may not get the most recent write if
>> RF > 2, as QUORUM + ONE does not exceed RF [moderate write, fastest
>> read] {multiple nodes must be lost for data loss to be possible}
>> Write at ALL, read at ONE => consistent read, writes may be blocked if
>> any node fails [slowest write, fastest read]
>>
>> Write at ONE, read at QUORUM => may not get the most recent write if
>> RF > 2 [fastest write, moderate read]  {data loss possible if node
>> lost before read repair}
>> Write at QUORUM, read at QUORUM => consistent read [moderate write,
>> moderate read] {multiple nodes must be lost for data loss to be
>> possible}
>> Write at ALL, read at QUORUM => consistent read, writes may be blocked
>> if any node fails [slowest write, moderate read]
>>
>> Write at ONE, read at ALL => consistent read, reads may fail if any
>> node fails [fastest write, slowest read] {data loss possible if node
>> lost before read repair}
>> Write at QUORUM, read at ALL => consistent read, reads may fail if any
>> node fails [moderate write, slowest read] {multiple nodes must be lost
>> for data loss to be possible}
>> Write at ALL, read at ALL => consistent read, writes may be blocked if
>> any node fails, reads may fail if any node fails [slowest write,
>> slowest read]
>>
>> Note: You can choose the CL for each and every operation. This is
>> something that you should design into your application (unless you
>> exclusively use QUORUM for all operations, in which case you are
>> advised to bake the logic in, but it is less necessary)
>>
>> The other thing to remember is that RF does not have to equa

Re: Will writes with < ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
Consistency Level is a pseudo-enum...

you have the choice between

ONE
Quorum (and there are different types of this)
ALL

At CL=ONE, only one node is guaranteed to have got the write if the
operation is a success.
At CL=ALL, all nodes that the RF says it should be stored at must
confirm the write before the operation succeeds, but a partial write
will succeed eventually if at least one node recorded the write
At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
operation to succeed, otherwise failure, but a partial write will
succeed eventually if at least one node recorded the write.

Read repair will eventually ensure that the write is replicated across
all RF nodes in the cluster.

The N in QUORUM above depends on the type of QUORUM you choose, in
general think N=RF unless you choose a fancy QUORUM.
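
For a feel of the numbers, a small Python sketch of the ((N/2)+1) rule, assuming N=RF as described above. The function names quorum and tolerated_failures are chosen for this example only.

```python
def quorum(n):
    """Smallest majority of n replicas: (n / 2) + 1 using integer division."""
    return n // 2 + 1


def tolerated_failures(n):
    """Replicas that may be down while a QUORUM operation can still succeed."""
    return n - quorum(n)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} down")
```

Note that RF=2 gives a quorum of 2, so QUORUM there behaves like ALL and tolerates no failed replica.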

To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE => may not get the most recent write if RF >
1 [fastest write, fastest read] {data loss possible if node lost
before read repair}
Write at QUORUM, read at ONE => may not get the most recent write if
RF > 2, as QUORUM + ONE does not exceed RF [moderate write, fastest
read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE => consistent read, writes may be blocked if
any node fails [slowest write, fastest read]

Write at ONE, read at QUORUM => may not get the most recent write if
RF > 2 [fastest write, moderate read]  {data loss possible if node
lost before read repair}
Write at QUORUM, read at QUORUM => consistent read [moderate write,
moderate read] {multiple nodes must be lost for data loss to be
possible}
Write at ALL, read at QUORUM => consistent read, writes may be blocked
if any node fails [slowest write, moderate read]

Write at ONE, read at ALL => consistent read, reads may fail if any
node fails [fastest write, slowest read] {data loss possible if node
lost before read repair}
Write at QUORUM, read at ALL => consistent read, reads may fail if any
node fails [moderate write, slowest read] {multiple nodes must be lost
for data loss to be possible}
Write at ALL, read at ALL => consistent read, writes may be blocked if
any node fails, reads may fail if any node fails [slowest write,
slowest read]
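
The "CL of write + CL of read must be > RF" rule can be checked mechanically. The Python sketch below applies it to all nine combinations, assuming plain QUORUM with N=RF. The names required_acks and is_consistent are invented for this illustration and are not Cassandra API.

```python
LEVELS = ("ONE", "QUORUM", "ALL")


def required_acks(level, rf):
    """Number of replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def is_consistent(write_level, read_level, rf):
    """True when write CL + read CL > RF, so read and write sets overlap."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf


def consistency_table(rf):
    """Map each (write, read) pair to whether reads see the latest write."""
    return {(w, r): is_consistent(w, r, rf) for r in LEVELS for w in LEVELS}


for (w, r), ok in consistency_table(3).items():
    print(f"write {w:6} read {r:6} => {'consistent' if ok else 'may be stale'}")
```

Applied literally with RF=3, QUORUM + ONE sums to 2 + 1 = 3, which does not exceed RF, so that pairing only satisfies the rule when RF <= 2. Any pairing that includes ALL always satisfies it.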

Note: You can choose the CL for each and every operation. This is
something that you should design into your application (unless you
exclusively use QUORUM for all operations, in which case you are
advised to bake the logic in, but it is less necessary)

The other thing to remember is that RF does not have to equal the
number of nodes in your cluster... in fact I would recommend designing
your app on the basis that RF < number of nodes in your cluster...
because at some point, when your data set grows big enough, you will
end up with RF < number of nodes.

-Stephen

On 7 November 2011 13:03, Riyad Kalla  wrote:
> Ah! Ok I was interpreting what you were saying to mean that if my RF was too
> high, then the ring would die if I lost one.
> Ultimately what I want (I think) is:
> Replication Factor: 5 (aka "all of my nodes")
> Consistency Level: 2
> Put another way, when I write a value, I want it to exist on two servers *at
> least* before I consider that write "successful" enough for my code to
> continue, but in the background I would like Cassandra to keep copying that
> value around at its leisure until all the ring nodes know about it.
> This sounds like what I need. Thanks for pointing me in the right direction.
> Best,
> Riyad
>
> On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda 
> wrote:
>>
>> Riyad, I'm also just getting to know the different settings and values
>> myself :)
>> I believe, and it also depends on your config, CL.ONE Should ignore the
>> loss of a node if your RF is 5, once you increase the CL then if you lose a
>> node the CL is not met and you will get exceptions returned.
>> Sent from my iPhone
>> On 07/11/2011, at 4:32, Riyad Kalla  wrote:
>>
>> Anthony and Jaydeep, thank you for weighing in. I am glad to see that they
>> are two different values (makes more sense mentally to me).
>> Anthony, what you said caught my attention "to ensure all nodes have a
>> copy you may not be able to survive the loss of a single node." -- why would
>> this be the case?
>> I assumed (incorrectly?) that a node would simply disappear off the map
>> until I could bring it back up again, at which point all the missing values
>> that it didn't get while it was down, it would slowly retrieve from other
>> members of the ring. Is this the wrong understanding?
>> If forcing a replication factor equal to the number of nodes in my ring
>> will cause a hard-stop when one node goes down (as I understood your comment
>> to mean), it seems to me I should go with a much lower replication factor...
>> something along the lines of 3 or roughly ceiling(N / 2) and just deal with
>> the latency when one of the nodes has to route a request to another server
>> when it doesn't contain the value.
>> Is there a better way to accomplish what I want,

Re: Storing pre-sorted data

2011-10-21 Thread Stephen Connolly
Well you could use a DOUBLE value to encode relative positions...

first item in list gets key 1.0
insert before first item -> key[first]/2.0;
append after last item -> key[last]*2.0;
insert after non-final item -> (key[n]+key[n+1])/2.0

Using double precision should give you quite a space to fit items...

you should be able to cleanly do 2^10 appends, or 2^10 insert first
before hitting the significands... and even then you have 2^51 of
those...

If you start to hit issues like heavy segments, you can re-normalize the row.
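
A minimal Python sketch of the three rules above, assuming the keys are
ordinary IEEE 754 doubles (Python floats). The function names and the
choice of ValueError as the "time to re-normalize" signal are
assumptions for illustration only.

```python
def key_before_first(first_key):
    """Key for an item inserted before the current first item."""
    return first_key / 2.0


def key_after_last(last_key):
    """Key for an item appended after the current last item."""
    return last_key * 2.0


def key_between(left_key, right_key):
    """Key for an item inserted between two neighbouring items."""
    mid = (left_key + right_key) / 2.0
    if not (left_key < mid < right_key):
        # The two neighbours are adjacent doubles: no room left.
        raise ValueError("no key fits between neighbours; re-normalize the row")
    return mid


def count_inserts_after_first(left_key=1.0, right_key=2.0):
    """How many times can we insert directly after left_key before running out?"""
    count = 0
    while True:
        try:
            right_key = key_between(left_key, right_key)
        except ValueError:
            return count
        count += 1
```

The worst case is repeatedly inserting at the same spot, which halves
the gap each time; count_inserts_after_first() measures how many such
inserts fit between 1.0 and 2.0 before a re-normalize is needed.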

-Stephen

On 17 October 2011 10:43, Matthias Pfau  wrote:
> Thanks for that hint! However, it seems like soundex is a very language
> specific algorithm (US English). We have to get into this topic further...
>
> Kind regards
> Matthias
>
> On 10/13/2011 10:43 PM, Stephen Connolly wrote:
>>
>> Then just use a soundex function on the first word in the text... that
>> will shrink it sufficiently and give nice buckets in near sequential
>> order (http://en.wikipedia.org/wiki/Soundex)
>>
>> On 13 October 2011 21:21, Matthias Pfau  wrote:
>>>
>>> Hi Stephen,
>>> we are hashing the first 8 byte (8 US-ASCII characters) of text that has
>>> been written by humans. Wouldn't it be easy for the attacker to do a
>>> dictionary attack on this text, especially if he knows the language of
>>> the
>>> text?
>>>
>>> Kind regards
>>> Matthias
>>>
>>> On 10/13/2011 08:20 PM, Stephen Connolly wrote:
>>>>
>>>> in theory, however they have less than 32 bits of entropy from which
>>>> they can do that, leaving them with at least 32 more bits of
>>>> combinations to try... that's 2 billion or so... must be a big
>>>> dictionary
>>>>
>>>> - Stephen
>>>>
>>>> ---
>>>> Sent from my Android phone, so random spelling mistakes, random nonsense
>>>> words and other nonsense are a direct result of using swype to type on
>>>> the screen
>>>>
>>>> On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>>>
>>>>    Hi Stephen,
>>>>    this sounds very reasonable. But wouldn't this enable an attacker to
>>>>    execute dictionary attacks in order to "decrypt" the first 8 bytes
>>>>    of the plain text?
>>>>
>>>>    Kind regards
>>>>    Matthias
>>>>
>>>>    On 10/13/2011 05:03 PM, Stephen Connolly wrote:
>>>>
>>>>        It wouldn't be unencrypted... which is the point
>>>>
>>>>        you use a one way linear hash function to take the first, say 8
>>>>        bytes,
>>>>        of unencrypted data and turn it into 4 bytes of a sort prefix.
>>>>
>>>>        You've lost half the data in the process, so effectively
>>>>        each bit
>>>>        is an OR of two bits and you can only infer from 0 values... so
>>>> data
>>>>        is still encrypted, but you have an approximate sorting.
>>>>
>>>>        For example, if your data is US-ASCII text with no numbers, you
>>>>        could
>>>>        use Soundex to get the pre-key, so that worst case you have a
>>>> bucket
>>>>        of values in the range.
>>>>
>>>>        Using this technique, a random get will have to get the values
>>>>        at the
>>>>        desired prefix +/- a small amount rather than the whole row...
>>>>        on the
>>>>        client side you can then decrypt the data and sort that small
>>>> bucket
>>>>        to get the correct index position.
>>>>
>>>>        You could do a 1 byte prefix, but that only gives you at best 256
>>>>        buckets and assumes that the first 2 bytes are uniformly
>>>>        distributed... you've said your data is not uniformly
>>>>        distributed, so
>>>>        a linear hash function sounds like your best bet.
>>>>
>>>>        your hash function should have the property that hash(A)>=
>>>>        hash(B) if
>>>>        and only if A>= B
>>>>
>>>>        On 13 October 2011 08:47, Matthias Pfau <p...@l3s.de> wrote:
>>>>
>>>>            Hi Stephen,
>>>>            this is a great idea but unfortunately doesn't work for us
>>>

[ANN] Mojo's Cassandra Maven Plugin 1.0.0-1 released

2011-10-18 Thread Stephen Connolly
The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 1.0.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The Cassandra Plugin has the following goals.

 * cassandra:start Starts up a test instance of Cassandra in the background.
 * cassandra:stop Stops the test instance of Cassandra that was
started using cassandra:start.
 * cassandra:start-cluster Starts up a test cluster of Cassandra in
the background bound to the local loopback IP addresses 127.0.0.1,
127.0.0.2, etc.
 * cassandra:stop-cluster Stops the test cluster of Cassandra that was
started using cassandra:start-cluster.
 * cassandra:run Starts up a test instance of Cassandra in the foreground.
 * cassandra:load Runs a cassandra-cli script against the test
instance of Cassandra.
 * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
 * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
 * cassandra:compact Runs nodetool compact against the test instance
of Cassandra.
 * cassandra:cleanup Runs nodetool cleanup against the test instance
of Cassandra.
 * cassandra:delete Deletes the test instance of Cassandra.
 * cassandra:cql-exec Execute a CQL statement (directly or from a
file) against the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>1.0.0-1</version>
</plugin>


Release Notes - Mojo's Cassandra Maven Plugin - Version 1.0.0-1

** Improvement
* [MCASSANDRA-14] - Upgrade to Cassandra 1.0.0

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.0.0-1

2011-10-18 Thread Stephen Connolly
Nobody objects, so I will publish the artifacts as Cassandra 1.0.0 is being
released

On 12 October 2011 23:44, Stephen Connolly
wrote:

> Hi,
>
> I'd like to release version 1.0.0-1 of Mojo's Cassandra Maven Plugin
> to sync up with the pending 1.0.0 release of Apache Cassandra.
>
> This version needs to be tested in conjunction with the current
> staging repo for Cassandra 1.0.0
>
> We solved 1 issue:
>
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17828
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-010/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.0.0-1@14818
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open until Cassandra 1.0.0 is released and will
> succeed by lazy consensus.
>
> Guide to testing staged releases:
> http://maven.apache.org/guides/development/guide-testing-releases.html
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, this vote is
> also open to subscribers of the dev and user@cassandra.apache.org
> mailing lists
>


Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
Then just use a soundex function on the first word in the text... that
will shrink it sufficiently and give nice buckets in near sequential
order (http://en.wikipedia.org/wiki/Soundex)
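
For reference, a minimal Python sketch of American Soundex as described
on the linked page. The function name soundex is just for illustration,
and it assumes US-ASCII input; real text would need a policy for digits,
punctuation and non-English names.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
            if len(result) == 4:
                break
        previous = code  # a vowel resets this, so repeats across vowels count
    return result.ljust(4, "0")
```

With this sketch "Robert" and "Rupert" both map to R163, so they would
land in the same bucket.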

On 13 October 2011 21:21, Matthias Pfau  wrote:
> Hi Stephen,
> we are hashing the first 8 byte (8 US-ASCII characters) of text that has
> been written by humans. Wouldn't it be easy for the attacker to do a
> dictionary attack on this text, especially if he knows the language of the
> text?
>
> Kind regards
> Matthias
>
> On 10/13/2011 08:20 PM, Stephen Connolly wrote:
>>
>> in theory, however they have less than 32 bits of entropy from which
>> they can do that, leaving them with at least 32 more bits of
>> combinations to try... that's 2 billion or so... must be a big dictionary
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
>> the screen
>>
>> On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>    Hi Stephen,
>>    this sounds very reasonable. But wouldn't this enable an attacker to
>>    execute dictionary attacks in order to "decrypt" the first 8 bytes
>>    of the plain text?
>>
>>    Kind regards
>>    Matthias
>>
>>    On 10/13/2011 05:03 PM, Stephen Connolly wrote:
>>
>>        It wouldn't be unencrypted... which is the point
>>
>>        you use a one way linear hash function to take the first, say 8
>>        bytes,
>>        of unencrypted data and turn it into 4 bytes of a sort prefix.
>>
>>        You've lost half the data in the process, so effectively
>>        each bit
>>        is an OR of two bits and you can only infer from 0 values... so
>> data
>>        is still encrypted, but you have an approximate sorting.
>>
>>        For example, if your data is US-ASCII text with no numbers, you
>>        could
>>        use Soundex to get the pre-key, so that worst case you have a
>> bucket
>>        of values in the range.
>>
>>        Using this technique, a random get will have to get the values
>>        at the
>>        desired prefix +/- a small amount rather than the whole row...
>>        on the
>>        client side you can then decrypt the data and sort that small
>> bucket
>>        to get the correct index position.
>>
>>        You could do a 1 byte prefix, but that only gives you at best 256
>>        buckets and assumes that the first 2 bytes are uniformly
>>        distributed... you've said your data is not uniformly
>>        distributed, so
>>        a linear hash function sounds like your best bet.
>>
>>        your hash function should have the property that hash(A)>=
>>        hash(B) if
>>        and only if A>= B
>>
>>        On 13 October 2011 08:47, Matthias Pfau <p...@l3s.de> wrote:
>>
>>            Hi Stephen,
>>            this is a great idea but unfortunately doesn't work for us
>>            either as we can
>>            not store the data in an unencrypted form.
>>
>>            Kind regards
>>            Matthias
>>
>>            On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>
>>
>>                could you prefix the data with 3-4 bytes of a linear
>>                hash of the
>>                unencrypted data? it wouldn't be a perfect sort, but
>>                you'd have less of a
>>                range to query to get the sorted values?
>>
>>                - Stephen
>>
>>                ---
>>                Sent from my Android phone, so random spelling mistakes,
>>                random nonsense
>>                words and other nonsense are a direct result of using
>>                swype to type on
>>                the screen
>>
>>                On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>                    Unfortunately, that is not an option as we have to
>>                store the data in
>>                    an compressed and encrypted and therefore binary and
>>                non-sortable form.
>>
>>                    On 10/12/2011 06:39 PM, David McNelis wrote:
>>
>>                        Is it an op

Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
in theory, however they have less than 32 bits of entropy from which they
can do that, leaving them with at least 32 more bits of combinations to
try... that's 4 billion or so (2^32)... must be a big dictionary

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 13 Oct 2011 17:57, "Matthias Pfau"  wrote:

> Hi Stephen,
> this sounds very reasonable. But wouldn't this enable an attacker to
> execute dictionary attacks in order to "decrypt" the first 8 bytes of the
> plain text?
>
> Kind regards
> Matthias
>
> On 10/13/2011 05:03 PM, Stephen Connolly wrote:
>
>> It wouldn't be unencrypted... which is the point
>>
>> you use a one way linear hash function to take the first, say 8 bytes,
>> of unencrypted data and turn it into 4 bytes of a sort prefix.
>>
>> You've lost half the data in the process, so effectively each bit
>> is an OR of two bits and you can only infer from 0 values... so data
>> is still encrypted, but you have an approximate sorting.
>>
>> For example, if your data is US-ASCII text with no numbers, you could
>> use Soundex to get the pre-key, so that worst case you have a bucket
>> of values in the range.
>>
>> Using this technique, a random get will have to get the values at the
>> desired prefix +/- a small amount rather than the whole row... on the
>> client side you can then decrypt the data and sort that small bucket
>> to get the correct index position.
>>
>> You could do a 1 byte prefix, but that only gives you at best 256
>> buckets and assumes that the first 2 bytes are uniformly
>> distributed... you've said your data is not uniformly distributed, so
>> a linear hash function sounds like your best bet.
>>
>> your hash function should have the property that hash(A)>= hash(B) if
>> and only if A>= B
>>
>> On 13 October 2011 08:47, Matthias Pfau  wrote:
>>
>>> Hi Stephen,
>>> this is a great idea but unfortunately doesn't work for us either as we
>>> can
>>> not store the data in an unencrypted form.
>>>
>>> Kind regards
>>> Matthias
>>>
>>> On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>>
>>>>
>>>> could you prefix the data with 3-4 bytes of a linear hash of the
>>>> unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
>>>> range to query to get the sorted values?
>>>>
>>>> - Stephen
>>>>
>>>> ---
>>>> Sent from my Android phone, so random spelling mistakes, random nonsense
>>>> words and other nonsense are a direct result of using swype to type on
>>>> the screen
>>>>
>>>> On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>>>
>>>>    Unfortunately, that is not an option as we have to store the data in
>>>>    a compressed and encrypted and therefore binary and non-sortable
>>>>    form.
>>>>
>>>>    On 10/12/2011 06:39 PM, David McNelis wrote:
>>>>
>>>>        Is it an option to not convert the data to binary prior to
>>>>        inserting into Cassandra?  Also, how large are the strings
>>>>        you're sorting?  If it's viable to not convert to binary before
>>>>        writing to Cassandra, and you use one of the string based column
>>>>        ordering techniques (utf8, ascii, for example), then the data
>>>>        would be sorted without you needing to specifically worry about
>>>>        that.  Of course, if the strings are lengthy you could run into
>>>>        additional issues.
>>>>
>>>>        On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de>
>>>>        wrote:
>>>>
>>>>            Hi there,
>>>>            we are currently building a prototype based on cassandra and
>>>>            ran into problems implementing sorted lists containing
>>>>            millions of items.
>>>>
>>>>            The special thing about the items of our lists is, that
>>>>            cassandra is
>>>>            not a

Re: Storing pre-sorted data

2011-10-13 Thread Stephen Connolly
It wouldn't be unencrypted... which is the point

you use a one way linear hash function to take the first, say 8 bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.

You've lost half the data in the process, so effectively each bit
is an OR of two bits and you can only infer from 0 values... so data
is still encrypted, but you have an approximate sorting.

For example, if your data is US-ASCII text with no numbers, you could
use Soundex to get the pre-key, so that worst case you have a bucket
of values in the range.

Using this technique, a random get will have to get the values at the
desired prefix +/- a small amount rather than the whole row... on the
client side you can then decrypt the data and sort that small bucket
to get the correct index position.

You could do a 1 byte prefix, but that only gives you at best 256
buckets and assumes that the first 2 bytes are uniformly
distributed... you've said your data is not uniformly distributed, so
a linear hash function sounds like your best bet.

your hash function should be order preserving, i.e. A >= B should imply
hash(A) >= hash(B) (it cannot be "if and only if", since the hash is lossy)
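
One way to get such an order-preserving, lossy prefix is to bucket the
first 8 bytes against a sorted list of boundaries. The Python sketch
below is only an assumption about how this could look, not a tested
design: the names first8, make_boundaries and sort_prefix are
hypothetical, the boundaries are picked from a sample so that unevenly
distributed data still spreads across buckets, and the input is assumed
to be US-ASCII.

```python
import bisect
import struct


def first8(text):
    """First 8 bytes of US-ASCII text, zero-padded so short values compare too."""
    return text.encode("ascii")[:8].ljust(8, b"\x00")


def make_boundaries(sample, buckets):
    """Pick bucket boundaries from a sample so skewed data spreads evenly."""
    keys = sorted(first8(item) for item in sample)
    step = max(1, len(keys) // buckets)
    return sorted(set(keys[step::step]))


def sort_prefix(text, boundaries):
    """4-byte prefix: the big-endian index of the bucket the text falls in."""
    index = bisect.bisect_right(boundaries, first8(text))
    return struct.pack(">I", index)
```

Because the bucket index never decreases as the input grows, comparing
the 4-byte prefixes as raw bytes gives the approximate ordering, and the
client only has to decrypt and sort the values within one bucket (plus
its neighbours) to find the exact position.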

On 13 October 2011 08:47, Matthias Pfau  wrote:
> Hi Stephen,
> this is a great idea but unfortunately doesn't work for us either as we can
> not store the data in an unencrypted form.
>
> Kind regards
> Matthias
>
> On 10/12/2011 07:42 PM, Stephen Connolly wrote:
>>
>> could you prefix the data with 3-4 bytes of a linear hash of the
>> unencrypted data? it wouldn't be a perfect sort, but you'd have less of a
>> range to query to get the sorted values?
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
>> the screen
>>
>> On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:
>>
>>    Unfortunately, that is not an option as we have to store the data in
>>    an compressed and encrypted and therefore binary and non-sortable form.
>>
>>    On 10/12/2011 06:39 PM, David McNelis wrote:
>>
>>        Is it an option to not convert the data to binary prior to
>> inserting
>>        into Cassandra?  Also, how large are the strings you're sorting?
>>          If its
>>        viable to not convert to binary before writing to Cassandra, and
>>        you use
>>        one of the string based column ordering techniques (utf8, ascii,
>> for
>>        example), then the data would be sorted without you  needing to
>>        specifically worry about that.  Of course, if the strings are
>>        lengthy
>>        you could run into  additional issues.
>>
>>        On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de> wrote:
>>
>>            Hi there,
>>            we are currently building a prototype based on cassandra and
>>        came
>>            into problems on implementing sorted lists containing
>>        millions of items.
>>
>>            The special thing about the items of our lists is, that
>>        cassandra is
>>            not able to sort them as the data is stored in a binary
>>        format which
>>            is not sortable. However, we are able to sort the data
>>        before the
>>            plain data gets encoded (our application is responsible for
>>        the order).
>>
>>            First Approach: Storing Lists in ColumnFamilies
>>            ***
>>            We first tried to map the list to a single row of a
>>        ColumnFamily in
>>            a way that the index of the list is mapped to the column
>>        names and
>>            the items of the list to the column values. The column names
>> are
>>            increasing numbers which define the sort order.
>>            This has the major drawback that big parts of the list have
>>        to be
>>            rewritten on inserts (because the column names are numbered
>>        by their
>>            index), which are quite common.
>>
>>
>>            Second Approach: Storing the whole List as Binary Data:
>>            ***
>>            We tried to store the compressed list in a single column.
>>        However,
>>            this is only feasible for smaller lists. Our lists are far
>>        to big
>>            leading to multi megabyte reads and writes. As we need to
>>        read and
>>            update the lists quite often, this would put o

[VOTE] Release Mojo's Cassandra Maven Plugin 1.0.0-1

2011-10-12 Thread Stephen Connolly
Hi,

I'd like to release version 1.0.0-1 of Mojo's Cassandra Maven Plugin
to sync up with the pending 1.0.0 release of Apache Cassandra.

This version needs to be tested in conjunction with the current
staging repo for Cassandra 1.0.0

We solved 1 issue:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17828

Staging Repository:
https://nexus.codehaus.org/content/repositories/orgcodehausmojo-010/

Site:
http://mojo.codehaus.org/cassandra-maven-plugin/index.html

SCM Tag:
https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.0.0-1@14818

 [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
says it looks fine too.
 [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
follow somebody else if only I could decide who
 [ ] -1 No! wait up there I have issues (in general like, ya know,
and being a trouble-maker is only one of them)

The vote is open until Cassandra 1.0.0 is released and will
succeed by lazy consensus.

Guide to testing staged releases:
http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers

-Stephen

P.S.
 In the interest of ensuring (more is) better testing, this vote is
also open to subscribers of the dev and user@cassandra.apache.org
mailing lists


Re: Storing pre-sorted data

2011-10-12 Thread Stephen Connolly
could you prefix the data with 3-4 bytes of a linear hash of the unencrypted
data? it wouldn't be a perfect sort, but you'd have less of a range to query
to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 12 Oct 2011 17:57, "Matthias Pfau"  wrote:

> Unfortunately, that is not an option as we have to store the data in a
> compressed and encrypted and therefore binary and non-sortable form.
>
> On 10/12/2011 06:39 PM, David McNelis wrote:
>
>> Is it an option to not convert the data to binary prior to inserting
>> into Cassandra?  Also, how large are the strings you're sorting?  If it's
>> viable to not convert to binary before writing to Cassandra, and you use
>> one of the string based column ordering techniques (utf8, ascii, for
>> example), then the data would be sorted without you needing to
>> specifically worry about that.  Of course, if the strings are lengthy
>> you could run into additional issues.
>>
>> On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau wrote:
>>
>>    Hi there,
>>    we are currently building a prototype based on cassandra and ran
>>    into problems implementing sorted lists containing millions of
>>    items.
>>
>>    The special thing about the items of our lists is that cassandra is
>>    not able to sort them as the data is stored in a binary format which
>>    is not sortable. However, we are able to sort the data before the
>>    plain data gets encoded (our application is responsible for the order).
>>
>>    First Approach: Storing Lists in ColumnFamilies
>>    ***
>>    We first tried to map the list to a single row of a ColumnFamily in
>>    a way that the index of the list is mapped to the column names and
>>    the items of the list to the column values. The column names are
>>    increasing numbers which define the sort order.
>>    This has the major drawback that big parts of the list have to be
>>    rewritten on inserts (because the column names are numbered by their
>>    index), which are quite common.
>>
>>
>>    Second Approach: Storing the whole List as Binary Data:
>>    ***
>>    We tried to store the compressed list in a single column. However,
>>    this is only feasible for smaller lists. Our lists are far too big,
>>    leading to multi megabyte reads and writes. As we need to read and
>>    update the lists quite often, this would put our Cassandra cluster
>>    under a lot of pressure.
>>
>>    Ideal Solution: Native support for storing lists
>>    ***
>>    We would be very happy with a way to store a list of sorted values
>>    without making improper use of column names for the list index. This
>>    implies that we would need a possibility to insert values at defined
>>    positions. We know that this could lead to problems with concurrent
>>    inserts in a distributed environment, but this is handled by our
>>    application logic.
>>
>>
>>    What are your ideas on that?
>>
>>    Thanks
>>    Matthias
>>
>>
>>
>>
>> --
>> *David McNelis*
>> Lead Software Engineer
>> Agentis Energy
>> www.agentisenergy.com 
>> c: 219.384.5143
>>
>> /A Smart Grid technology company focused on helping consumers of energy
>> control an often under-managed resource./
>>
>>
>>
>


[ANN] Mojo's Cassandra Maven Plugin 0.8.6-1 released

2011-09-26 Thread Stephen Connolly
The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.8.6-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The Cassandra Plugin has the following goals.

  * cassandra:start Starts up a test instance of Cassandra in the background.
  * cassandra:stop Stops the test instance of Cassandra that was
started using cassandra:start.
  * cassandra:start-cluster Starts up a test cluster of Cassandra in
the background bound to the local loopback IP addresses 127.0.0.1,
127.0.0.2, etc.
  * cassandra:stop-cluster Stops the test cluster of Cassandra that was
started using cassandra:start-cluster.
  * cassandra:run Starts up a test instance of Cassandra in the foreground.
  * cassandra:load Runs a cassandra-cli script against the test
instance of Cassandra.
  * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
  * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
  * cassandra:compact Runs nodetool compact against the test instance
of Cassandra.
  * cassandra:cleanup Runs nodetool cleanup against the test instance
of Cassandra.
  * cassandra:delete Deletes the test instance of Cassandra.
  * cassandra:cql-exec Execute a CQL statement (directly or from a
file) against the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>0.8.6-1</version>
</plugin>


Release Notes - Mojo's Cassandra Maven Plugin - Version 0.8.6-1

** Bug
* [MCASSANDRA-12] - cassandra:start fails on Cygwin

** Improvement
* [MCASSANDRA-13] - Upgrade to Cassandra 0.8.6

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: [VOTE] Release Mojo's Cassandra Maven Plugin 0.8.6-1

2011-09-23 Thread Stephen Connolly
This vote has passed:

+1: Me, Colin & Nate
0:
-1:

I will proceed with the release

-Stephen

On 20 September 2011 15:27, Stephen Connolly
 wrote:
> Hi,
>
> I'd like to release version 0.8.6-1 of Mojo's Cassandra Maven Plugin
> to sync up with the recent 0.8.6 release of Apache Cassandra.
>
>
> We solved 2 issues:
> http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17425
>
>
> Staging Repository:
> https://nexus.codehaus.org/content/repositories/orgcodehausmojo-010/
>
> Site:
> http://mojo.codehaus.org/cassandra-maven-plugin/index.html
>
> SCM Tag:
> https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-0.8.6-1@14748
>
>  [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
> says it looks fine too.
>  [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
> follow somebody else if only I could decide who
>  [ ] -1 No! wait up there I have issues (in general like, ya know,
> and being a trouble-maker is only one of them)
>
> The vote is open for 72 hours and will succeed by lazy consensus.
>
> Cheers
>
> -Stephen
>
> P.S.
>  In the interest of ensuring (more is) better testing, this vote is
> also open to subscribers of the dev and user@cassandra.apache.org
> mailing lists
>


[VOTE] Release Mojo's Cassandra Maven Plugin 0.8.6-1

2011-09-20 Thread Stephen Connolly
Hi,

I'd like to release version 0.8.6-1 of Mojo's Cassandra Maven Plugin
to sync up with the recent 0.8.6 release of Apache Cassandra.


We solved 2 issues:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=17425


Staging Repository:
https://nexus.codehaus.org/content/repositories/orgcodehausmojo-010/

Site:
http://mojo.codehaus.org/cassandra-maven-plugin/index.html

SCM Tag:
https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-0.8.6-1@14748

 [ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse
says it looks fine too.
 [ ] 0 Mehhh! like I care, I don't have any opinions either, I'd
follow somebody else if only I could decide who
 [ ] -1 No! wait up there I have issues (in general like, ya know,
and being a trouble-maker is only one of them)

The vote is open for 72 hours and will succeed by lazy consensus.

Cheers

-Stephen

P.S.
  In the interest of ensuring (more is) better testing, this vote is
also open to subscribers of the dev and user@cassandra.apache.org
mailing lists


Re: [RELEASE] Apache Cassandra 0.8.5 released

2011-09-09 Thread Stephen Connolly
On 9 September 2011 16:48, Stephen Connolly
 wrote:
> On 9 September 2011 16:18, Sylvain Lebresne  wrote:
>> On Fri, Sep 9, 2011 at 4:52 PM, Stephen Connolly
>>  wrote:
>>> is the staging repo released at repository.apache.org? or did somebody
>>> forget to finish that step?
>>
>> Nobody forgot that step as can be seen in:
>> https://repository.apache.org/content/repositories/releases/org/apache/cassandra/apache-cassandra/
>>
>
> Hard to tell from a phone over a shoe-string of a network connection.
> Yep looks fine from the apache side... I'll give a peek at the other
> sides

I'm seeing it on central now, so all should be good

>
>> --
>> Sylvain
>>
>>
>>>
>>> - Stephen
>>>
>>> ---
>>> Sent from my Android phone, so random spelling mistakes, random nonsense
>>> words and other nonsense are a direct result of using swype to type on the
>>> screen
>>>
>>> On 9 Sep 2011 04:48, "Roshan Dawrani"  wrote:
>>>> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly <
>>>> stephen.alan.conno...@gmail.com> wrote:
>>>>
>>>>> can take up to 12 hours for the sync to central
>>>>>
>>>>
>>>> Nearly 24 hours now, and 0.8.5 still not available at maven central -
>>>> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-(
>>>>
>>>> rgds,
>>>> Roshan
>>>
>>
>


Re: [RELEASE] Apache Cassandra 0.8.5 released

2011-09-09 Thread Stephen Connolly
On 9 September 2011 16:18, Sylvain Lebresne  wrote:
> On Fri, Sep 9, 2011 at 4:52 PM, Stephen Connolly
>  wrote:
>> is the staging repo released at repository.apache.org? or did somebody
>> forget to finish that step?
>
> Nobody forgot that step as can be seen in:
> https://repository.apache.org/content/repositories/releases/org/apache/cassandra/apache-cassandra/
>

Hard to tell from a phone over a shoe-string of a network connection.
Yep looks fine from the apache side... I'll give a peek at the other
sides

> --
> Sylvain
>
>
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 9 Sep 2011 04:48, "Roshan Dawrani"  wrote:
>>> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly <
>>> stephen.alan.conno...@gmail.com> wrote:
>>>
>>>> can take up to 12 hours for the sync to central
>>>>
>>>
>>> Nearly 24 hours now, and 0.8.5 still not available at maven central -
>>> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-(
>>>
>>> rgds,
>>> Roshan
>>
>


Re: [RELEASE] Apache Cassandra 0.8.5 released

2011-09-09 Thread Stephen Connolly
is the staging repo released at repository.apache.org? or did somebody
forget to finish that step?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 9 Sep 2011 04:48, "Roshan Dawrani"  wrote:
> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> can take up to 12 hours for the sync to central
>>
>
> Nearly 24 hours now, and 0.8.5 still not available at maven central -
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-(
>
> rgds,
> Roshan


Re: [RELEASE] Apache Cassandra 0.8.5 released

2011-09-08 Thread Stephen Connolly
can take up to 12 hours for the sync to central

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 8 Sep 2011 06:40, "Roshan Dawrani"  wrote:
> Hi,
>
> The artefacts at
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all still
> do not reflect 0.8.5.
>
> Does the availability at maven lag behind by much?
>
> Cheers.
>
> On Thu, Sep 8, 2011 at 5:45 PM, Sylvain Lebresne wrote:
>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 0.8.5.
>>
>> Cassandra is a highly scalable second-generation distributed database,
>> bringing together Dynamo's fully distributed design and Bigtable's
>> ColumnFamily-based data model. You can read more here:
>>
>> http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>> http://cassandra.apache.org/download/
>>
>> This version is a maintenance/bug fix release[1]. Please pay attention to
>> the
>> release notes[2] before upgrading and let us know[3] if you were to
>> encounter
>> any problem.
>>
>> Have fun!
>>
>>
>> [1]: http://goo.gl/A5YmF (CHANGES.txt)
>> [2]: http://goo.gl/J5Iix (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>
>
>
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani 
> Skype: roshandawrani


Re: Not all data structures need timestamps (and don't require wasted memory).

2011-09-03 Thread Stephen Connolly
maybe not all nosql applications fit cassandra.

the whole core logic of how cassandra is eventually consistent is because of
the per column timestamps... if they are a pain for you consider storing eg
as a small number of fat columns rather than many skinny ones... either that
or look at a different database for your use case. ;-)
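
A minimal sketch of why the per-column timestamp matters, assuming the usual last-write-wins rule: when replicas disagree about a column, the version with the highest timestamp is kept. The function name and the sample values are made up for illustration, and tie-breaking on equal timestamps is not modelled.

```python
def reconcile(replica_versions):
    """Pick the winning (value, timestamp) pair for one column.

    replica_versions holds one (value, timestamp) tuple per replica that
    returned the column; the highest timestamp wins (last write wins).
    """
    return max(replica_versions, key=lambda version: version[1])


# Three replicas disagree about one column; the newest write wins.
versions = [("old-value", 100), ("new-value", 250), ("old-value", 100)]
winner = reconcile(versions)
print(winner)
```

Without the timestamp there would be nothing to compare, which is why it is stored with every column.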

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 3 Sep 2011 16:01, "Kevin Burton"  wrote:


Re: Unable to repair a node

2011-08-14 Thread Stephen Connolly
oh i know you can run rf 3 on a 3 node cluster. more i thought that if you
have one fail you have less nodes than the rf, so the cluster is at less
than rf, and writes might be disabled or something like that, while at 4 you
still have met the rf...
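
As far as I understand it, what decides whether a request can be served is the number of live replicas compared with what the consistency level asks for, not whether the node count exceeds the rf. A small worked sketch of that arithmetic, using the standard quorum formula (floor(rf / 2) + 1); the helper names are made up for illustration:

```python
def quorum(replication_factor):
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def can_satisfy(required_replicas, live_replicas):
    # A request can succeed only if enough replicas are reachable.
    return live_replicas >= required_replicas


rf = 3
live_after_one_failure = 2  # 3 node cluster, rf 3, one node down

print(can_satisfy(quorum(rf), live_after_one_failure))  # QUORUM needs 2 of 3
print(can_satisfy(rf, live_after_one_failure))          # ALL needs 3 of 3
```

So on a 3 node, rf 3 cluster with one node down, QUORUM reads and writes should still go through, while anything at ALL would fail.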

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 14 Aug 2011 16:08, "Peter Schuller"  wrote:


Re: Unable to repair a node

2011-08-14 Thread Stephen Connolly
i am always wondering why people run clusters with number of nodes == rf

i thought you needed to have number of nodes > rf to have any sensible
behaviour... but i am no expert at all

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 14 Aug 2011 11:30, "Philippe"  wrote:
> Hello, I've been fighting with my cluster for a couple days now... Running
> 0.8.1.3, using Hector and load balancing requests across all nodes.
> My question is: how do I get my node back under control so that it runs
> like the other two nodes?
>
>
> It's a 3 node, RF=3 cluster with reads & writes at CL=QUORUM, I only have
> counter columns inside super columns. There are 6 keyspaces, each has about
> 10 column families. I'm using the BOP. Before the sequence of events
> described below, I was writing at CL=ALL and reading at CL=ONE. I've
> launched repairs multiple times and they have failed for various reasons,
> one of them being hitting the limit on number of open files. I've raised it
> to 32768 now. I've probably launched repairs when a repair was already
> running on the node. At some point compactions were throttled to 16MB/s,
> I've removed this limit.
>
> The problem is that one of my nodes is now impossible to repair (no such
> problem with the two others). The load is about 90GB, it should be a
> balanced ring but the other nodes are at 60GB. Each repair basically
> generates thousands of pending compactions of various types (SSTable build,
> minor, major & validation): it spikes up to 4000, levels, then
> spikes up to 8000. Previously, I hit linux limits and had to restart the node
> but it doesn't look like the repairs have been improving anything time after
> time.
> At the same time,
>
> - the number of SSTables for some keyspaces goes dramatically up (from 3
> or 4 to several dozens).
> - the commit log keeps increasing in size, I'm at 4.3G now, it went up to
> 40G when the compaction was throttled at 16MB/s. On the other nodes it's
> around 1GB at most
> - the data directory is bigger than on the other nodes. I've seen it go
> up to 480GB when the compaction was throttled at 16MB/s
>
>
> Compaction stats:
> pending tasks: 5954
> compaction type  keyspace              column family      bytes compacted  bytes total  progress
> Validation       ROLLUP_WIFI_COVERAGE  PUBLIC_MONTHLY_17        569432689    596621002    95.44%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_20       2751906910   5806164726    47.40%
> Validation       ROLLUP_WIFI_COVERAGE  PUBLIC_MONTHLY_20       2570106876   2776508919    92.57%
> Validation       ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_19       3010471905   6517183774    46.19%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_15             4132    303015882     0.00%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_18         36302803    595278385     6.10%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_17         24671866     70959088    34.77%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_20         15515781    692029872     2.24%
> Minor            ROLLUP_CDMA_COVERAGE  PUBLIC_MONTHLY_20       1607953684   6606774662    24.34%
> Validation       ROLLUP_WIFI_COVERAGE  PUBLIC_MONTHLY_20        895043380   2776306015    32.24%
>
> My current lsof count for the cassandra user is
> root@xxx:/logs/cassandra# lsof -u cassandra| wc -l
> 13191
>
> What's even weirder is that currently I have 9 compactions running but CPU
> is throttled at 1/number of cores half the time (while > 80% the rest of the
> time). Could this be because other repairs are happening in the ring?
> Example (vmstat 2)
> 7 2 0 177632 1596 13868416 0 0 9060 61 5963 5968 40 7 53 0
> 7 0 0 165376 1600 13880012 0 0 41422 28 14027 4608 81 17 1 0
> 8 0 0 159820 1592 13880036 0 0 26830 22 10161 10398 76 19 4 1
> 6 0 0 161792 1592 13882312 0 0 20046 42 7272 4599 81 17 2 0
> 2 0 0 164960 1564 13879108 0 0 17404 26559 6172 3638 79 18 2 0
> 2 0 0 162344 1564 13867888 0 0 6 0 2014 2150 40 2 58 0
> 1 1 0 159864 1572 13867952 0 0 0 41668 958 581 27 0 72 1
> 1 0 0 161972 1572 13867952 0 0 0 89 661 443 17 0 82 1
> 1 0 0 162128 1572 13867952 0 0 0 20 482 398 17 0 83 0
> 2 0 0 162276 1572 13867952 0 0 0 788 485 395 18 0 82 0
> 1 0 0 173896 1572 13867952 0 0 0 29 547 461 17 0 83 0
> 1 0 0 163052 1572 13867920 0 0 0 0 741 620 18 1 81 0
> 1 0 0 162588 1580 13867948 0 0 0 32 523 387 17 0 82 0
> 13 0 0 168272 1580 13877140 0 0 12872 269 8056 6725 56 9 34 0
> 44 1 0 202536 1612 13835956 0 0 26606 530 7946 3887 79 19 2 0
> 48 1 0 406640 1612 13631740 0 0 22006 310 8605 3705 80 18 2 0
> 9 1 0 340300 1620 13697560 0 0 19530 103 8101 3984 84 14 1 0
> 2 0 0 297768 1620 13738036 0 0 12438 10 4115 2628 57 9 34 0
>
> Thanks


Re: CQL injection attacks?

2011-07-01 Thread Stephen Connolly
nate,

that is not relevant. cql is a text query that gets parsed. without
parameters you have to build the query by string concatenation. if i give
you a string which contains a single quote, unless you have written your app
to escape that quote, i can force a corrupted query on you that does
something else... cql injection attacks
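
A minimal sketch of the problem, assuming a CQL dialect where a single quote inside a string literal is escaped by doubling it (worth checking against the docs for the version in use). The `users` table, the `KEY` column and the function names are hypothetical, only there for illustration:

```python
def build_query_naive(user_name):
    # Vulnerable: the value is spliced into the query text unmodified.
    return "SELECT * FROM users WHERE KEY = '" + user_name + "'"


def escape_cql_string(value):
    # Assumption: a single quote inside a literal is escaped by doubling it.
    return value.replace("'", "''")


def build_query_escaped(user_name):
    return "SELECT * FROM users WHERE KEY = '" + escape_cql_string(user_name) + "'"


hostile = "x' OR KEY = 'y"
print(build_query_naive(hostile))    # the quote closes the literal early
print(build_query_escaped(hostile))  # the whole input stays one literal
```

Escaping by hand is easy to forget in one place, which is why bound parameters or prepared statements are the safer fix once they are available.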

- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 30 Jun 2011 20:20, "Nate McCall"  wrote:
> The CQL drivers are all still sitting on top of the execute_cql_query
> Thrift API method for now.
>
> On Wed, Jun 29, 2011 at 2:12 PM,  wrote:
>>
>> Someone asked a while ago whether Cassandra was vulnerable to injection
>> attacks:
>>
>> http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra
>>
>> With Thrift, the answer was 'no'.
>>
>> With CQL, presumably the situation is different, at least until prepared
>> statements are possible (CASSANDRA-2475) ?
>>
>> Has there been any discussion on this already that someone could point me
>> to, please? I couldn't see anything on JIRA (searching for CQL AND injection,
>> CQL AND security, etc).
>>
>> Thanks.
>>


Re: Storing Accounting Data

2011-06-21 Thread Stephen Connolly
writes are not atomic.

the first side can succeed at quorum, and the second side can fail
completely... you'll know it failed, but now what... you retry, still
failed... erh I'll store it somewhere and retry it later... where do I store
it?

the consistency level is about tuning whether reads and writes are
replicated/checked across multiple of the replicates... but at any
consistency level, each write will either succeed or fail _independently_

you could have one column family which is kind of like a transaction log,
you write a json object of all the mutations you will make, then you go and
make the mutations, when they succeed you write a completed column to the
transaction log... then you can repeat that as often as needed
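
A rough in-memory sketch of that transaction-log idea, not real Cassandra client code: the dictionaries stand in for column families, and every class, function and key name here is made up. It assumes each mutation is idempotent (writing the same column twice is harmless), which is what makes the retry safe.

```python
import json


class TransactionLog:
    """In-memory stand-in for the transaction-log column family."""

    def __init__(self):
        self.rows = {}

    def begin(self, tx_id, mutations):
        # Record the full intent before touching any account row.
        self.rows[tx_id] = {"mutations": json.dumps(mutations), "completed": False}

    def complete(self, tx_id):
        self.rows[tx_id]["completed"] = True

    def pending(self):
        return [tx_id for tx_id, row in self.rows.items() if not row["completed"]]


def apply_mutation(accounts, tx_id, mutation):
    # One column per transaction id, so re-applying overwrites the same column.
    accounts.setdefault(mutation["account"], {})[tx_id] = mutation["amount"]


def replay(log, accounts):
    """Re-apply every transaction that has no 'completed' marker yet."""
    for tx_id in log.pending():
        for mutation in json.loads(log.rows[tx_id]["mutations"]):
            apply_mutation(accounts, tx_id, mutation)
        log.complete(tx_id)


log = TransactionLog()
accounts = {}
transfer = [{"account": "A", "amount": -50}, {"account": "B", "amount": 50}]
log.begin("tx1", transfer)
apply_mutation(accounts, "tx1", transfer[0])  # the second side "fails" here
replay(log, accounts)                         # a later retry finishes the job
print(accounts)
```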

you could have transactions posted as columns in a row, and to get the
balance you iterate all the columns adding the +'s and -'s

by processing the transaction log, you could establish the highest complete
timestamp, and add summary balance columns being the running total up to
that point, so that you don't have to iterate everything
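
The balance and summary-column ideas can be sketched as below, with a plain list of (timestamp, amount) pairs standing in for the columns of one account row. The names and numbers are invented for the example.

```python
def balance(entries):
    # Full scan: add up every +/- column in the row.
    return sum(amount for _timestamp, amount in entries)


def balance_from_summary(entries, summary_timestamp, summary_total):
    # Start from the stored running total and add only the newer columns.
    newer = (amount for timestamp, amount in entries if timestamp > summary_timestamp)
    return summary_total + sum(newer)


entries = [(1, 100), (2, -30), (3, 45), (4, -10)]

# Summary column covering everything up to the highest complete timestamp.
summary_timestamp = 2
summary_total = balance([entry for entry in entries if entry[0] <= summary_timestamp])

print(balance(entries))
print(balance_from_summary(entries, summary_timestamp, summary_total))
```

Both calls should agree; the second one just reads fewer columns.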

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 21 Jun 2011 22:04, "AJ"  wrote:


Re: Storing Accounting Data

2011-06-21 Thread Stephen Connolly
how important are things like transactional consistency for you?

would you have issues if only one side of a transfer was recorded?

cassandra, out of the box, on its own, would not be ideal if the above two
things are important for you.

you can add components to a system to help address these things, eg
zookeeper, etc. a reason why you might do this is if you already use
cassandra in your app and are trying to limit the number of databases

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 21 Jun 2011 18:30, "AJ"  wrote:


[ANN] Mojo's Cassandra Maven Plugin 0.8.0-1 released

2011-06-20 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.8.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

  * cassandra:start Starts up a test instance of Cassandra in the background.
  * cassandra:stop Stops the test instance of Cassandra that was
started using cassandra:start.
  * cassandra:start-cluster Starts up a test cluster of Cassandra in
the background bound to the local loopback IP addresses 127.0.0.1,
127.0.0.2, etc.
  * cassandra:stop-cluster Stops the test cluster of Cassandra that was
started using cassandra:start-cluster.
  * cassandra:run Starts up a test instance of Cassandra in the foreground.
  * cassandra:load Runs a cassandra-cli script against the test
instance of Cassandra.
  * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
  * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
  * cassandra:compact Runs nodetool compact against the test instance
of Cassandra.
  * cassandra:cleanup Runs nodetool cleanup against the test instance
of Cassandra.
  * cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>0.8.0-1</version>
</plugin>



Release Notes - Mojo's Cassandra Maven Plugin - Version 0.8.0-1

** Improvement
* [MCASSANDRA-8] - Document the availability of stopKey/stopPort
configuration parameters (and possibly allow just stopPort to be
defined)

** New Feature
* [MCASSANDRA-9] - Patch to support Cassandra 0.8.0-rc1
* [MCASSANDRA-10] - Add support for local clusters
* [MCASSANDRA-11] - Upgrade to Cassandra 0.8.0

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: Docs: "Why do deleted keys show up during range scans?"

2011-06-13 Thread Stephen Connolly
On 13 June 2011 17:09, AJ  wrote:
> On 6/13/2011 9:25 AM, Stephen Connolly wrote:
>>
>> On 13 June 2011 16:14, AJ  wrote:
>>>
>>> On 6/13/2011 7:03 AM, Stephen Connolly wrote:
>>>>
>>>> It returns the set of columns for the set of rows... how do you
>>>> determine the difference between a completely empty row and a row that
>>>> just does not have any of the matching columns?
>>>
>>> I would expect it to not return anything (no row at all) for both of
>>> those
>>> cases.  Are you saying that an empty row is returned for rows that do not
>>> match the predicate?  So, if I perform a range slice where the range is
>>> every row of the CF and the slice equates to no matches and I have 1
>>> million
>>> rows in the CF, then I will get a result set of 1 million empty rows?
>>>
>> No I am saying that for each row that matches, you will get a result,
>> even if the columns that you request happen to be empty for that
>> specific row.
>>
>
> Ok, this I understand I guess.  If I query a range of rows and want only a
> certain column and a row does not have that column, I would like to know
> that.

deleted rows don't have the column either which is the point.

>
>> Likewise, any deleted rows in the same row range will show as empty
>> because C* would have a ton of work to figure out the difference
>> between being deleted and being empty.
>>
>
> But, if a row does indeed have the column, but that row was deleted, why
> would I get an empty row?  You say because of a ton of work.  So, the
> tombstone for the row is not stored "close-by" for quick access... or
> something like that?  At any rate, how do I figure out if the empty row is
> empty because it was deleted?  Sorry if I'm being dense.
>

store the query inverted.

that way empty -> deleted

the tombstones are stored for each column that had data IIRC... but at
this point my grok of C* is lacking

>
>


Re: Docs: "Why do deleted keys show up during range scans?"

2011-06-13 Thread Stephen Connolly
On 13 June 2011 16:14, AJ  wrote:
> On 6/13/2011 7:03 AM, Stephen Connolly wrote:
>>
>> It returns the set of columns for the set of rows... how do you
>> determine the difference between a completely empty row and a row that
>> just does not have any of the matching columns?
>
> I would expect it to not return anything (no row at all) for both of those
> cases.  Are you saying that an empty row is returned for rows that do not
> match the predicate?  So, if I perform a range slice where the range is
> every row of the CF and the slice equates to no matches and I have 1 million
> rows in the CF, then I will get a result set of 1 million empty rows?
>
No I am saying that for each row that matches, you will get a result,
even if the columns that you request happen to be empty for that
specific row.

Likewise, any deleted rows in the same row range will show as empty
because C* would have a ton of work to figure out the difference
between being deleted and being empty.


Re: Docs: "Why do deleted keys show up during range scans?"

2011-06-13 Thread Stephen Connolly
It returns the set of columns for the set of rows... how do you
determine the difference between a completely empty row and a row that
just does not have any of the matching columns?

Well the answer is that Cassandra does not go and check whether there
are any columns outside of the range you are querying, so it will just
return the empty (for the column range you specified) row. Your
code needs to be robust enough to understand that an empty list of
columns can mean either that there are no columns at all for that row
key (i.e. it is deleted and waiting for tombstone expiry & gc) or that
the row only has columns outside the range you queried.
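
A small sketch of what "robust enough" could look like on the client side, assuming the range slice result has already been turned into a dict of row key to column dict (that shape, and the function name, are assumptions made for this example rather than any client's API):

```python
def split_range_slice(rows):
    """Separate rows that returned columns from rows that came back empty.

    An empty row is ambiguous: it may be deleted and awaiting gc, or it may
    only have columns outside the queried range.
    """
    populated = {key: columns for key, columns in rows.items() if columns}
    empty = sorted(key for key, columns in rows.items() if not columns)
    return populated, empty


rows = {"row1": {"name": "x"}, "row2": {}, "row3": {}}
populated, empty = split_range_slice(rows)
print(populated)
print(empty)
```

The empty keys should simply be skipped (or re-queried with a wider column range if it matters which case applies), not treated as live rows with no data.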

On 13 June 2011 13:59, AJ  wrote:
> http://wiki.apache.org/cassandra/FAQ#range_ghosts
>
> "So to special case leaving out result entries for deletions, we would have
> to check the entire rest of the row to make sure there is no undeleted data
> anywhere else either (in which case leaving the key out would be an error)."
>
> The above doesn't read well and I don't get it.  Can anyone rephrase it or
> elaborate?
>
> Thanks!
>


Re: [RELEASE] 0.8.0

2011-06-03 Thread Stephen Connolly
Great work!

-Stephen

P.S.
  As the release of artifacts to Maven Central is now part of the
release process, the artifacts are all available from Maven Central
already (for people who use Maven/ANT+Ivy/Gradle/Buildr/etc)

On 3 June 2011 00:36, Eric Evans  wrote:
>
> I am very pleased to announce the official release of Cassandra 0.8.0.
>
> If you haven't been paying attention to this release, this is your last
> chance, because by this time tomorrow all your friends are going to be
> raving, and you don't want to look silly.
>
> So why am I resorting to hyperbole?  Well, for one because this is the
> release that debuts the Cassandra Query Language (CQL).  In one fell
> swoop Cassandra has become more than NoSQL, it's MoSQL.
>
> Cassandra also has distributed counters now.  With counters, you can
> count stuff, and counting stuff rocks.
>
> A kickass use-case for Cassandra is spanning data-centers for
> fault-tolerance and locality, but doing so has always meant sending data
> in the clear, or tunneling over a VPN.   New for 0.8.0, encryption of
> intranode traffic.
>
> If you're not motivated to go upgrade your clusters right now, you're
> either not easily impressed, or you're very lazy.  If it's the latter,
> would it help knowing that rolling upgrades between releases is now
> supported?  Yeah.  You can upgrade your 0.7 cluster to 0.8 without
> shutting it down.
>
> You see what I mean?  Then go read the release notes[1] to learn about
> the full range of awesomeness, then grab a copy[2] and become a
> (fashionably )early adopter.
>
> Drivers for CQL are available in Python[3], Java[3], and Node.js[4].
>
> As usual, a Debian package is available from the project's APT
> repository[5].
>
> Enjoy!
>
>
> [1]: http://goo.gl/CrJqJ (NEWS.txt)
> [2]: http://cassandra.debian.org/download
> [3]: http://www.apache.org/dist/cassandra/drivers
> [4]: https://github.com/racker/node-cassandra-client
> [5]: http://wiki.apache.org/cassandra/DebianPackaging
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: [RELEASE] Apache Cassandra 0.8.0 rc1

2011-05-17 Thread Stephen Connolly
Eric,

You forgot to release the staging repository on repository.apache.org so
that the artifacts get pushed to Maven Central.

-Stephen

On 17 May 2011 23:15, Eric Evans  wrote:

>
> I am pleased to announce the release of Apache Cassandra 0.8.0 rc1.
>
> The final release is upon us so if you're planning to give us a hand
> with testing, now is the time!
>
> As always, be sure to have a look at the changelog[1] and release
> notes[2]. Report any problems you find[3], and if you have any
> questions, please ask.
>
> Thanks!
>
> [1]: http://goo.gl/Rh3A3 (CHANGES.txt)
> [2]: http://goo.gl/wbXGM (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
> [3]: http://cassandra.apache.org/download
> [4]: http://wiki.apache.org/cassandra/DebianPackaging
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Running Cassandra across different Amazon EC2 regions

2011-05-07 Thread Stephen Connolly
vpn on ubuntu should be easy if you ask your good friend google... you
should not have to pay for it (but paying might get you a fancy GUI, or
perhaps very optimized performance that could squeeze a few more %)

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 May 2011 23:25, "Sameer Farooqui"  wrote:


Re: Ant error in Eclipse when building Cassandra

2011-05-07 Thread Stephen Connolly
if you can give me (an intellij user) enough details to reproduce on my MBP
I'll try and fix it.

things like, download this eclipse distro, add these update centers, set
these env variables, then clicks through this horrible UI as follows...
presto crash!

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 7 May 2011 06:50, "Jonathan Ellis"  wrote:
> Default stack is huge, so maven-ant-tasks-retrieve-build is probably
> recursing infinitely somewhere :(
>
> On Fri, May 6, 2011 at 2:42 PM, Ed Anuff  wrote:
>> I finally got around to getting Eclipse set up to build Cassandra
>> following the directions on the wiki and it seems to be working,
>> Eclipse isn't showing any errors except that when it fires off the
>> automatic ant build I get the following error:
>>
>> maven-ant-tasks-retrieve-build:
>>
>> BUILD FAILED
>> java.lang.StackOverflowError
>>     at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1346)
>>     at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
>>     at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>>     at org.eclipse.ant.internal.core.ant.EclipseDefaultExecutor.executeTargets(EclipseDefaultExecutor.java:32)
>>     at org.apache.tools.ant.Project.executeTargets(Project.java:1189)
>>     at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:662)
>>     at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:495)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.eclipse.ant.core.AntRunner.run(AntRunner.java:378)
>>     at org.eclipse.ant.internal.launching.launchConfigurations.AntLaunchDelegate.runInSameVM(AntLaunchDelegate.java:321)
>>     at org.eclipse.ant.internal.launching.launchConfigurations.AntLaunchDelegate.launch(AntLaunchDelegate.java:274)
>>     at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:853)
>>     at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:702)
>>     at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:695)
>>     at org.eclipse.core.externaltools.internal.model.ExternalToolBuilder.launchBuild(ExternalToolBuilder.java:181)
>>     at org.eclipse.core.externaltools.internal.model.ExternalToolBuilder.doBuildBasedOnScope(ExternalToolBuilder.java:169)
>>     at org.eclipse.core.externaltools.internal.model.ExternalToolBuilder.build(ExternalToolBuilder.java:88)
>>     at org.eclipse.core.internal.events.BuildManager$2.run(BuildManager.java:629)
>>     at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
>>     at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:172)
>>     at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:203)
>>     at org.eclipse.core.internal.events.BuildManager$1.run(BuildManager.java:255)
>>     at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
>>     at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:258)
>>     at org.eclipse.core.internal.events.BuildManager.basicBuildLoop(BuildManager.java:311)
>>     at org.eclipse.core.internal.events.BuildManager.build(BuildManager.java:343)
>>     at org.eclipse.core.internal.events.AutoBuildJob.doBuild(AutoBuildJob.java:144)
>>     at org.eclipse.core.internal.events.AutoBuildJob.run(AutoBuildJob.java:242)
>>     at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
>> Caused by: java.lang.StackOverflowError
>>     at org.apache.tools.ant.PropertyHelper.getPropertyHook(PropertyHelper.java:189)
>>     at org.apache.maven.artifact.ant.POMPropertyHelper.getPropertyHook(POMPropertyHelper.java:50)
>>
>>
>> I never get this error when building from the command line and if I
>> right click on build.xml in Eclipse and select Run As Ant Build it
>> works fine as well.  Any ideas?  This is on a Mac.
>>
>> Ed
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com


Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?

2011-05-06 Thread Stephen Connolly
maven-shade-plugin could help with having two versions of thrift at the same
time... but you'd need to build some stuff with maven, and some people don't
like that idea

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 May 2011 01:11, "aaron morton"  wrote:


Re: Building from source from behind firewall since Maven switch?

2011-05-02 Thread Stephen Connolly
-autoproxy worked for me when I wrote the original patch

but as I no longer work for the company where I wrote the patch, I don't
have a firewall to deal with

worst case you might have to create a ~/.m2/settings.xml with the proxy
details... if that is the case can you raise a jira in MANTTASKS (which is
at jira.codehaus.org for hysterical reasons)

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 3 May 2011 01:06, "Suan-Aik Yeo"  wrote:


Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-27 Thread Stephen Connolly
Similar issue with the RPMs from riptano

On 27 April 2011 11:01, Pierre-Yves Ritschard  wrote:
> Thanks Jonathan,
>
> Should I repackage myself or do you think updated Debian packages will
> be made available shortly ?
>
> Regards,
> - pyr
>
> On mar., 2011-04-26 at 11:47 -0500, Jonathan Ellis wrote:
>> https://issues.apache.org/jira/browse/CASSANDRA-2549 is open to fix this
>>
>
>
>


Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-26 Thread Stephen Connolly
I will be calling the release vote on dev when I get a chance.

In the meantime, the staged artifacts are at
https://repository.apache.org/content/repositories/orgapachecassandra-114/

On 26 April 2011 13:27, Stephen Connolly
 wrote:
> beta versions will be available from releases repo.
>
> You can help validate the poms when I call the release vote.
>
> On 26 April 2011 13:15, Mck  wrote:
>> On Tue, 2011-04-26 at 12:53 +0100, Stephen Connolly wrote:
>>> (or did you want 20million unneeded deps for the
>>> client jars?)
>>
>> Yes that's a good reason :-)
>> If there anything i can help with?
>>
>> Will beta versions be available under releases repository?
>>
>>
>> ~mck
>>
>>
>


Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-26 Thread Stephen Connolly
beta versions will be available from releases repo.

You can help validate the poms when I call the release vote.

On 26 April 2011 13:15, Mck  wrote:
> On Tue, 2011-04-26 at 12:53 +0100, Stephen Connolly wrote:
>> (or did you want 20million unneeded deps for the
>> client jars?)
>
> Yes that's a good reason :-)
> If there anything i can help with?
>
> Will beta versions be available under releases repository?
>
>
> ~mck
>
>


Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-26 Thread Stephen Connolly
On 26 April 2011 10:37, Mck  wrote:
> On Fri, 2011-04-22 at 16:49 -0500, Eric Evans wrote:
>> I am pleased to announce the release of Apache Cassandra 0.8.0 beta1.
>
>
> *Truly Awesome!*
>  CQL rocks in so many ways.
>
>
> Is 0.8.0-beta1 available in apache's maven repository?
>  And if not, why not?

Because i'm still validating that I have the poms minimized for the
multiple artifacts (or did you want 20million unneeded deps for the
client jars?)

>
> ~mck
>
>
>
>


Re: consistency ONE and "null"

2011-04-06 Thread Stephen Connolly
also there is a configuration parameter (read_repair_chance, set per column
family) that controls the probability of any read request triggering a read
repair

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 7 Apr 2011 07:35, "Stephen Connolly" 
wrote:
> as I understand, the read repair is a background task triggered by the
read
> request, but once the consistency requirement has been met you will be
given
> a response.
>
> the coordinator at CL.ONE is allowed to return your response once it has
one
> response (empty or not) from any replica. if the first response is empty,
> you get null
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
> On 7 Apr 2011 00:10, "Jonathan Colby"  wrote:
>>
>> Let's say you have RF of 3 and a write was written to 2 nodes. 1 was not
> written because the node had a network hiccup (but came back online
again).
>>
>> My question is, if you are reading a key with a CL of ONE, and you happen
> to land on that node that didn't get the write, will the read fail
> immediately?
>>
>> Or, would read repair check the other replicas and fetch the correct data
> from the other node(s)?
>>
>> Secondly, is read repair done according to the consistency level, or is
> read repair an independent configuration setting that can be turned
on/off.
>>
>> There was a recent thread about a different variation of my question, but
> went into very technical details, so I didn't want to hijack that thread.


Re: consistency ONE and "null"

2011-04-06 Thread Stephen Connolly
As I understand it, read repair is a background task triggered by the read
request, but once the consistency requirement has been met you will be given
a response.

The coordinator at CL.ONE is allowed to return your response once it has one
response (empty or not) from any replica. If the first response is empty,
you get null.
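
A minimal sketch of that behaviour, written as a toy in-memory model rather
than real Cassandra code. The class and method names (ClOneReadSketch,
readAtOne) are made up for illustration; replica responses are modelled as an
Optional in arrival order, and an empty Optional stands for a replica that
never received the write.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model only: this is not Cassandra's coordinator code.
class ClOneReadSketch {

    // Returns what the client sees at CL.ONE: the first replica response to
    // arrive, whether or not it holds data. null models "no value found".
    static String readAtOne(List<Optional<String>> responsesInArrivalOrder) {
        if (responsesInArrivalOrder.isEmpty()) {
            throw new IllegalStateException("no replica responded");
        }
        return responsesInArrivalOrder.get(0).orElse(null);
    }

    public static void main(String[] args) {
        // The replica that missed the write answers first, so the client
        // gets null even though two other replicas hold the value.
        List<Optional<String>> staleFirst = Arrays.asList(
                Optional.<String>empty(), Optional.of("v1"), Optional.of("v1"));
        System.out.println(readAtOne(staleFirst));

        // An up-to-date replica answers first, so the client gets the value.
        List<Optional<String>> freshFirst = Arrays.asList(
                Optional.of("v1"), Optional.<String>empty(), Optional.of("v1"));
        System.out.println(readAtOne(freshFirst));
    }
}
```

The sketch leaves out the background read repair, which may bring the stale
replica up to date after the response has already gone back to the client.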

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 7 Apr 2011 00:10, "Jonathan Colby"  wrote:
>
> Let's say you have RF of 3 and a write was written to 2 nodes. 1 was not
written because the node had a network hiccup (but came back online again).
>
> My question is, if you are reading a key with a CL of ONE, and you happen
to land on that node that didn't get the write, will the read fail
immediately?
>
> Or, would read repair check the other replicas and fetch the correct data
from the other node(s)?
>
> Secondly, is read repair done according to the consistency level, or is
read repair an independent configuration setting that can be turned on/off.
>
> There was a recent thread about a different variation of my question, but
went into very technical details, so I didn't want to hijack that thread.


Re: maven repository

2011-04-04 Thread Stephen Connolly
maven central

- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 5 Apr 2011 06:47, "Mikael Wikblom"  wrote:
> Hi,
>
> is there a maven repository where I can download the latest version of
> cassandra? I've found a few versions at riptano
>
>
http://mvn.riptano.com/content/repositories/riptano/org/apache/cassandra/apache-cassandra/
>
> but note the latest 0.7.4
>
> regards
> Mikael Wikblom
>
> --
> Mikael Wikblom
> Software Architect
> SiteVision AB
> 019-217058
> mikael.wikb...@sitevision.se
> http://www.sitevision.se
>


[ANN] Mojo's Cassandra Maven Plugin 0.7.4-1 released

2011-03-29 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.7.4-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

  * cassandra:start Starts up a test instance of Cassandra in the background.
  * cassandra:stop Stops the test instance of Cassandra that was started
using cassandra:start.
  * cassandra:run Starts up a test instance of Cassandra in the foreground.
  * cassandra:load Runs a cassandra-cli script against the test instance
of Cassandra.
  * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
  * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
  * cassandra:compact Runs nodetool compact against the test instance of
Cassandra.
  * cassandra:cleanup Runs nodetool cleanup against the test instance of
Cassandra.
  * cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>0.7.4-1</version>
</plugin>


Release Notes - Mojo's Cassandra Maven Plugin - Version 0.7.4-1

** Improvement
* [MCASSANDRA-6] - Upgrade to Cassandra 0.7.4

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
On 28 March 2011 16:33, Eric Evans  wrote:
> On Mon, 2011-03-28 at 14:21 +0100, Stephen Connolly wrote:
>> FYI Avro is in all likelihood being removed in 0.8
>
> FWIW, Avro is long-gone at this point.

You have the advantage of actually being a Cassandra Dev as opposed to
being a Cassandra Hanger-on like me ;-)
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Something about cassandra API

2011-03-28 Thread Stephen Connolly
FYI Avro is in all likelihood being removed in 0.8

2011/3/28 Norman Maurer :
> Hi there,
>
> you would be better off using a high-level client like hector or pelops.
>
> See:
> http://wiki.apache.org/cassandra/ClientOptions
>
> But to answer your question... If you really want to use something lowlevel
> then Thrift is the way to go...
>
>
> Bye,
> Norman
>
>
>
> 2011/3/28 An Zhuo 
>>
>> HI, I've learned something about Cassandra and find that there are two
>> packages about how to access cassandra: avro and thrift.
>>
>> So how should I choose the suitable way with java, avro or thrift? thank
>> you.
>>
>> 2011-03-28
>> 
>> An Zhuo
>


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
for #2 you could pipe through wc -l to get the answer

sort -n keys.txt | uniq | wc -l

but both examples are just refinements of iterate.

#1 is just a distributed iterate
#2 is just an optimized iterate based on knowledge of the on-disk
format (and may give inaccurate results... tombstones...)
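
For anyone who would rather do the counting step from application code, here
is a minimal Java sketch of the same idea as sort | uniq | wc -l. It assumes
the sstablekeys output from each node has already been read into a list of
strings, one key per entry; the names UniqueKeyCount and countUnique are made
up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueKeyCount {

    // Counts distinct keys across the per-node dumps. Replicas hold the same
    // key on several nodes, so duplicates across dumps are collapsed.
    static int countUnique(List<List<String>> keysPerNode) {
        Set<String> unique = new HashSet<>();
        for (List<String> nodeKeys : keysPerNode) {
            unique.addAll(nodeKeys);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        List<List<String>> dumps = Arrays.asList(
                Arrays.asList("k1", "k2"),
                Arrays.asList("k2", "k3"),
                Arrays.asList("k1", "k3"));
        System.out.println(countUnique(dumps)); // 3 distinct keys
    }
}
```

As with the shell pipeline, this is still an iterate, and keys of deleted
rows that are still on disk as tombstones can inflate the count.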

On 28 March 2011 14:16, Or Yanay  wrote:
> I use one of two ways to achieve that:
>  1. run a map reduce. Pig is really helpful in these cases. Make sure you run 
> your MR using Hadoop task tracker on your nodes - or your performance will 
> take a hit.
>  2. dump all keys using sstablekeys script from relevant files on all 
> machines and count unique values. I do that using "sort -n  keys.txt |uniq >> 
> unique_keys.txt"
>
> Dumping all keys is much faster but less elegant and can be more annoying if 
> you want to do that from your application.
>
> Hope that does the trick for you.
> -Orr
>
> -Original Message-
> From: Joshua Partogi [mailto:joshua.j...@gmail.com]
> Sent: Monday, March 28, 2011 2:39 PM
> To: user@cassandra.apache.org
> Subject: Re: newbie question: how do I know the total number of rows of a cf?
>
> Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
>  wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>>
>
>
>
> --
> http://twitter.com/jpartogi
>


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
ok, so not all nosql has column families...

just

s/nosql/cassandra/g

on my previous post ;-)

On 28 March 2011 13:38, Joshua Partogi  wrote:
> Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
>  wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>>
>
>
>
> --
> http://twitter.com/jpartogi
>


Re: newbie question: how do I know the total number of rows of a cf?

2011-03-28 Thread Stephen Connolly
iterate.

otherwise if that will be too slow and you will do it often, the nosql way
is to create a separate column family updated with each row add/delete to
hold the answer for you.
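
A rough in-memory sketch of that bookkeeping idea, not real Cassandra client
code. CountedRows and its methods are hypothetical names: the HashMap stands
in for the data column family and the rowCount field stands in for the
separate column family holding the answer. It assumes the application can
tell a new row from an overwrite, which against a real cluster would need a
read or some application-level knowledge.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: keeps a running row count next to the data it describes.
class CountedRows {

    // Stand-in for the data column family (row key -> value, non-null values).
    private final Map<String, String> rows = new HashMap<>();

    // Stand-in for the separate column family that holds the count.
    private long rowCount = 0;

    // Adds or overwrites a row; only a brand-new key bumps the count.
    void put(String key, String value) {
        if (rows.put(key, value) == null) {
            rowCount++;
        }
    }

    // Deletes a row; the count only drops if the row actually existed.
    void delete(String key) {
        if (rows.remove(key) != null) {
            rowCount--;
        }
    }

    // Answers "how many rows?" without iterating.
    long count() {
        return rowCount;
    }

    public static void main(String[] args) {
        CountedRows cf = new CountedRows();
        cf.put("a", "1");
        cf.put("b", "2");
        cf.put("a", "3");   // overwrite, count unchanged
        cf.delete("b");
        System.out.println(cf.count()); // 1
    }
}
```

The trade-off is the usual one: every add/delete pays a little extra so that
the count itself becomes a cheap read.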

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
> Hi all,
> I want to know how many records I am holding in Cassandra, just like
> count(*) in sql.
> What can I do ? Thank you.
>
> Sheng


Re: build.xml issue with 0.7.3?

2011-03-09 Thread Stephen Connolly
there is no ivy any more.

drop me a mail with details of your _exact_ ANT version and JDK and i'll see
if I can diagnose your issues

-Stephen

On 9 March 2011 18:51, Paul Choi  wrote:

> Eric,
> Thanks for your response.
>
> This is all I see in /build:
> [paulchoi@build02 build]$ find .
> .
> ./lib
> ./maven-ant-tasks-2.1.1.jar
> [paulchoi@build02 build]$
>
> I just downloaded a new copy of 0.7.3-src and tried manually. I'm still
> running into the same problem.
>
> I tried doing this with 0.7.0, and ant downloads Ivy and Ivy takes care of
> the dependencies. With 0.7.3, maybe Ant doesn't get to the point of
> downloading Ivy? I guess I need to wise up on Ant and Ivy myself.
>
> -Paul
>
> 
> From: Eric Evans [eev...@rackspace.com]
> Sent: Friday, March 04, 2011 7:36 PM
> To: user@cassandra.apache.org
> Subject: Re: build.xml issue with 0.7.3?
>
> On Sat, 2011-03-05 at 01:23 +, Paul Choi wrote:
> > We're running 0.7.0, and we want to upgrade to 0.7.3 ASAP.
> >
> > This worked in our RPM SPEC file with 0.7.0:
> > %build
> > export JAVA_HOME=/usr/java/latest
> > ant clean jar -Drelease=true
> >
> > Now running "ant jar" throws some kind of build.xml error at line 155 -
> typedef is undefined. I'm running ant 1.6.5-2jpp.2 that comes with CentOS
> 5.5. Unfortunately, I'm no Ant expert, so I'm stumped. Does anyone have any
> idea why this is happening?
>
> 0.7.0 pulled down dependencies using Ivy, 0.7.3 uses maven-ant-tasks.
> This error occurred when trying to create the artifact typedef for
> maven-ant-tasks, though I don't know why (it works here).
>
> What do the contents of build/ look like after the error?
>
> > Thanks for your help.
> > BTW, I grabbed the tarball from http://apache.org/dist/cassandra/0.7.3/,
> since the mirrors didn't have it. I hope it was ok to get this one.
> >
> > [paulchoi@build02 apache-cassandra-0.7.3-src]$ ant jar
> > Buildfile: build.xml
> >
> > maven-ant-tasks-download:
> >  [echo] Downloading Maven ANT Tasks...
> > [mkdir] Created dir:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build
> >   [get] Getting:
> http://repo2.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.1/maven-ant-tasks-2.1.1.jar
> >   [get] To:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/maven-ant-tasks-2.1.1.jar
> >
> > maven-ant-tasks-init:
> > [mkdir] Created dir:
> /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build/lib
> >
> > BUILD FAILED
> > /home/paulchoi/rpm/SOURCES/apache-cassandra-0.7.3-src/build.xml:155:
> name, file or resource attribute of typedef is undefined
> >
> > Total time: 1 second
> > [paulchoi@build02 apache-cassandra-0.7.3-src]$
>
>
> --
> Eric Evans
> eev...@rackspace.com
>


Re: [RELEASE] 0.7.3

2011-03-07 Thread Stephen Connolly
Artifacts should be in the Maven Central Repository by now

-Stephen

On 4 March 2011 21:52, Eric Evans  wrote:

>
> It's only been a couple of weeks since the last release, but a rather
> nasty bug (some details here[1]) has since been fixed, and it seemed
> best to get that out to folks sooner rather than later.
>
> The issue in question is well explained in the release notes[3], but the
> TL;DR is that users of 0.7.1 and 0.7.2 are encouraged to upgrade to
> 0.7.3, and run `nodetool scrub' to address bloom filters that may have
> been incorrectly generated when compacting older SSTables.
>
> There have been a number of other fixes since 0.7.2 as well, so even if
> you're running < 0.7.1, it's still worth the update[5,6].
>
> As always, be sure to read through all of the changes[2] and release
> notes[3]. Report any problems you find[4], and if you have any
> questions, don't hesitate to ask.
>
>
> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2217
> [2]: http://goo.gl/hX02M (CHANGES.txt)
> [3]: http://goo.gl/HXlNH (NEWS.txt)
> [4]: https://issues.apache.org/jira/browse/CASSANDRA
> [5]: http://cassandra.apache.org/download
> [6]: http://wiki.apache.org/cassandra/DebianPackaging
>
> --
> Eric Evans
> eev...@rackspace.com
>
>
>
>


[ANN] Mojo's Cassandra Maven Plugin 0.7.2-1 released

2011-02-21 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.7.2-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

  * cassandra:start Starts up a test instance of Cassandra in the
background.
  * cassandra:stop Stops the test instance of Cassandra that was started
using cassandra:start.
  * cassandra:run Starts up a test instance of Cassandra in the foreground.
  * cassandra:load Runs a cassandra-cli script against the test instance
of Cassandra.
  * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
  * cassandra:flush Runs nodetool flush against the test instance of
Cassandra.
  * cassandra:compact Runs nodetool compact against the test instance of
Cassandra.
  * cassandra:cleanup Runs nodetool cleanup against the test instance of
Cassandra.
  * cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>0.7.2-1</version>
</plugin>


Release Notes - Mojo's Cassandra Maven Plugin - Version 0.7.2-1

** Improvement
* [MCASSANDRA-2] - Upgrade to Cassandra 0.7.1
* [MCASSANDRA-5] - Upgrade to Cassandra 0.7.2

Enjoy,

The Mojo team.

Stephen Connolly

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: [RELEASE] 0.7.2

2011-02-17 Thread Stephen Connolly
On 17 February 2011 00:56, Eric Evans  wrote:
>
> CASSANDRA-2165[1] became evident almost as soon as 0.7.1 released, and
> it's ugly enough that we didn't want to wait.
>
> Be sure you've read the changelog[2] and release notes[3], and let us
> know[4] if you encounter any problems.
>
> Thanks!
>
>
> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2165
> [2]: http://goo.gl/iI7U2 (CHANGES.txt)
> [3]: http://goo.gl/b2dCq (NEWS.txt)
> [4]: https://issues.apache.org/jira/browse/CASSANDRA
>
> --
> Eric Evans
> eev...@rackspace.com
>
>

The 0.7.2 release has been pushed to the Maven Central Repository and
will be available within the next 4 hours

-Stephen


[ANN] Mojo's Cassandra Maven Plugin 0.7.0-1 released

2011-02-12 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.7.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

* cassandra:start Starts up a test instance of Cassandra in the background.
* cassandra:stop Stops the test instance of Cassandra that was started
using cassandra:start.
* cassandra:run Starts up a test instance of Cassandra in the foreground.
* cassandra:load Runs a cassandra-cli script against the test instance
of Cassandra.
* cassandra:repair Runs nodetool repair against the test instance of Cassandra.
* cassandra:flush Runs nodetool flush against the test instance of Cassandra.
* cassandra:compact Runs nodetool compact against the test instance of
Cassandra.
* cassandra:cleanup Runs nodetool cleanup against the test instance of
Cassandra.
* cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
   <groupId>org.codehaus.mojo</groupId>
   <artifactId>cassandra-maven-plugin</artifactId>
   <version>0.7.0-1</version>
</plugin>


Release Notes

This is the first release of Mojo's Cassandra Maven Plugin

Enjoy,

The Mojo team.

Stephen Connolly

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Re: Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin

2011-02-09 Thread Stephen Connolly
Oh, you might have to check out and install mojo-sandbox-parent (a sibling
svn url). Sandbox projects are not allowed to deploy releases... the vote on
dev@mojo will promote from sandbox and release in one vote; 32 h to go.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 9 Feb 2011 16:35, "Nate McCall"  wrote:
> Stephen,
> I get an error regarding a non-resolvable parent pom. Is there any
> local additional local configuration or parameters that should be
> passed with the install phase?
>
> I'd be happy to look at this over the next several days as it would
> make the Hector integration testing setup and tear down much easier.
>
> -Nate
>
> On Wed, Feb 9, 2011 at 5:41 AM, Stephen Connolly
>  wrote:
>> Until the release vote passes at mojo, you will need to do the
>> following to follow the example:
>>
>> svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
>> cd cassandra-maven-plugin
>> mvn install
>> cd ..
>>
>> Otherwise the example should be fine.
>>
>> It's a wiki page, so I'm hoping that people can make the example a bit
>> better... specifically some hector people might be able to put in
>> actual example code for accessing cassandra from the index.jsp.
>>
>> -Stephen
>>


Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin

2011-02-09 Thread Stephen Connolly
Until the release vote passes at mojo, you will need to do the
following to follow the example:

svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
cd cassandra-maven-plugin
mvn install
cd ..

Otherwise the example should be fine.

It's a wiki page, so I'm hoping that people can make the example a bit
better... specifically some hector people might be able to put in
actual example code for accessing cassandra from the index.jsp.

-Stephen


Re: cassandra-cli (output) broken for super columns

2011-02-08 Thread Stephen Connolly
On 8 February 2011 10:38, Timo Nentwig  wrote:
> This is not what it's supposed to be like, is it?
>
> [default@foo] get foo[page-field];
> => (super_column=20110208,
>     (column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, 
> timestamp=1297159430471000)
>     (column=82f4c650-2d53-11e0-a08b-58b035f3f60e, value=msg2, 
> timestamp=1297159437423000)
>     (column=82f4c650-2d53-11e0-a08b-58b035f3f60f, value=msg3, 
> timestamp=1297159439855000))
> Returned 1 results.
>
> [default@foo] get foo[page-field][20110208];
> , value=msg1, timestamp=1297159430471000)
> => (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)
> => (column=???P-S???X?5??, value=msg3, timestamp=1297159439855000)
> Returned 3 results.
>
> [default@foo] get 
> foo[page-field][20110208][82f4c650-2d53-11e0-a08b-58b035f3f60d];
> , value=msg1, timestamp=1297159430471000)
>
> [default@foo] get 
> foo[page-field][20110208][82f4c650-2d53-11e0-a08b-58b035f3f60e];
> => (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)
>
>
>        - name: foo
>          column_type: Super
>          compare_with: AsciiType
>          compare_subcolumns_with: TimeUUIDType
>          default_validation_class: AsciiType

Is it the ?'s that you are complaining about or is it something else?

If it is the ?'s have you got a mismatch between the character
encoding in your shell and UTF-8?


Re: Best Approaches for Developer Integration

2011-02-08 Thread Stephen Connolly
On 8 February 2011 06:40, Paul Brown  wrote:
>
> On Feb 7, 2011, at 10:28 PM, Paul Querna wrote:
>> So, I guess this is coming down to:
>>  1) Has anyone built any easy to install packages of Cassandra?
>
> I didn't find it necessary.  I implemented a simple embedding wrapper for 
> Cassandra so that it could be started as part of a web application lifecycle 
> (Spring-managed).  Developers just start up a personal copy of the service as 
> part of "mvn -Pembedded-cassandra jetty:run" being none the wiser about the 
> Cassandra underneath unless they care to be.
>

Mojo's Cassandra Maven Plugin makes this even easier

mvn cassandra:start jetty:run

If you don't want to modify your pom.xml to switch either jetty or
cassandra off of port 8080 then you'll end up with

mvn cassandra:start jetty:run -Dcassandra.jmxPort=_

Note: the plugin has not been released yet... 48hr left on the release vote

>>  2) Can anyone explain their experience with getting non-Cassandra
>> developers up and running in a development environment? What worked?
>> What was hard?
>
> I've had technically savvy non-developer resources perfectly happy to work 
> with the system via Ruby, PHP, or even the Cassandra CLI.  "Just do mvn 
> -Pembedded-cassandra jetty:run" was too much in that case, but "Here are some 
> useful libraries and here's where the prod/staging clusters are" was fine.
>
> -- Paul B


Re: row keys

2011-02-05 Thread Stephen Connolly
you really need to know how you will be pulling the data back out again. you
could use the object id as the row key, timestamp as the column name and
long/lat as the value... that would allow you to query by object id and get
the time sorted location trace... but if you have a lot of frequent readings
for each object, that would be a poor model because very large rows can
impact performance... in that case you might use the object id combined with
the timestamp rounded to the nearest hour (say) to keep the row size
lower...

but if you are more interested in tracking multiple objects per time, you
might use the timestamp as row key, object id as column name, etc...

with cassandra you need to know what queries you will want to make and
design for that
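
To make the bucketed row key idea concrete, here is a minimal Java sketch.
Everything in it is an assumption for illustration: the class name
LocationRowKey, the ":" separator, and the choice to truncate the timestamp to
the start of its hour rather than round to the nearest one.

```java
// Hypothetical helper: builds a row key of the form "<objectId>:<hourBucket>"
// so that one row only ever holds one hour of readings for one object.
class LocationRowKey {

    static final long HOUR_MILLIS = 60L * 60L * 1000L;

    // timestampMillis is milliseconds since the epoch. The bucket is the
    // number of whole hours since the epoch (floor, not nearest).
    static String rowKey(String objectId, long timestampMillis) {
        long hourBucket = Math.floorDiv(timestampMillis, HOUR_MILLIS);
        return objectId + ":" + hourBucket;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("truck-7", 0L));         // truck-7:0
        System.out.println(rowKey("truck-7", 3_599_999L)); // truck-7:0
        System.out.println(rowKey("truck-7", 3_600_000L)); // truck-7:1
    }
}
```

Within each such row the full timestamp would still be the column name, so a
time-sorted trace for an object means reading the rows for the hours of
interest.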

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 5 Feb 2011 18:17, "Sean Ochoa"  wrote:


Re: cassandra 0.6.11 binary package problem

2011-02-04 Thread Stephen Connolly
That's because of an issue I found in the ANT scripts while doing the
maven-ant-tasks switch on 0.7.0.

Any jar in build will be bundled... (so ivy goes into the bin dist...
when I did the m-a-t version eric was wondering why i was including
m-a-t in the bin dist, and I said I was being symmetric with the ivy
version... he said it was a failed experiment that had been left
in...)

For 0.7.x there should just be the one jar.

For the 0.6.x dists if you have forgotten to run ant realclean, then
there could be earlier versions present

-Stephen

On 3 February 2011 14:36, Jonathan Ellis  wrote:
> Well, that's odd. :)
>
> Do any of the other tar.gz balls contain multiple jars?
>
> On Thu, Feb 3, 2011 at 6:06 AM, Jean-Yves LEBLEU  wrote:
>> Hi all,
>>
>> Just for info, in apache-cassandra-0.6.11-bin.tar.gz there are both
>> apache-cassandra-0.6.10.jar  and apache-cassandra-0.6.11.jar in the
>> lib directory.
>>
>> Causing troubles to my upgrade scripts which use this file to get
>> installed version and check if upgrade needed . :(
>>
>> Thanks for the good job.
>> Jean-Yves
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: How do I get 0.7.1?

2011-02-02 Thread Stephen Connolly
the take #2 vote was canceled due to a couple of issues... take #3 has not
been called yet

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 2 Feb 2011 23:29, "Sal Fuentes"  wrote:


Re: Can a same key exists for two rows in two different column families without clashing ?

2011-02-02 Thread Stephen Connolly
On 2 February 2011 10:03, Ertio Lew  wrote:
> Can a same key exists for two rows in two different column families without
> clashing ?  Other words, does the same algorithm needs to enforced for
> generating keys for different column families or can different
> algorithms(for generating keys) be enforced on column family basis?
>
> I have tried out that they can, but I wanted to know if there may be any
> problems associated with this.
>
> Thanks.
> Ertio Lew
>

it is a bad analogy for many reasons but if you replace "row key" with
"primary key" and "column family" with "table" then you might get an
answer.

a better analogy is to think of the following.

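import java.util.HashMap;
import java.util.Map;

// String keys and byte[] values are illustrative choices.
public class Keyspace {

  // row key -> (column name -> value)
  public final Map<String, Map<String, byte[]>> columnFamily1 = new HashMap<>();

  public final Map<String, Map<String, byte[]>> columnFamily2 = new HashMap<>();

  // row key -> (super column name -> (column name -> value))
  public final Map<String, Map<String, Map<String, byte[]>>> superColumnFamily3 = new HashMap<>();

}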

(still not quite correct, but mostly so for our purposes);

you are asking given

Keyspace keyspace;
String key1 = makeKeyAlg1();
keyspace.columnFamily1.put(key1,...);

String key2 = makeKeyAlg2();
keyspace.columnFamily2.put(key2,...);

when key1.equals(key2)

then is there a problem?

They are two separate maps... why would there be?
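
The analogy can be run as a small self-contained sketch. This is a toy model
of the map-of-maps picture, not Cassandra code; the class name
SameKeyTwoFamilies, the helper methods and the sample key are made up, and
String values are used instead of bytes for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the analogy: each column family is its own map of
// row key -> (column name -> value).
class SameKeyTwoFamilies {

    static Map<String, Map<String, String>> newColumnFamily() {
        return new HashMap<>();
    }

    // Writes one column into the row identified by rowKey, creating the row
    // on first use.
    static void putColumn(Map<String, Map<String, String>> columnFamily,
                          String rowKey, String column, String value) {
        columnFamily.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> columnFamily1 = newColumnFamily();
        Map<String, Map<String, String>> columnFamily2 = newColumnFamily();

        String key = "user42"; // the same row key is used in both families
        putColumn(columnFamily1, key, "name", "alice");
        putColumn(columnFamily2, key, "lastLogin", "2011-02-02");

        System.out.println(columnFamily1.get(key)); // {name=alice}
        System.out.println(columnFamily2.get(key)); // {lastLogin=2011-02-02}
    }
}
```

Each map only ever sees its own rows, so how the key was generated for one
column family has no bearing on the other.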

-Stephen


Re: Upgrading from 0.6 to 0.7.0

2011-01-21 Thread Stephen Connolly
the maven shade plugin might be able to help somewhat... if I get some spare
cycles I'll have a look at knocking up a thrift proxy that either makes 0.7
appear as 0.6 or vice versa

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 21 Jan 2011 22:00, "Anthony Molinaro" 
wrote:


Re: Upgrading from 0.6 to 0.7.0

2011-01-19 Thread Stephen Connolly
an alternative might be a thrift proxy service... mapping the old thrift api
onto the new.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 20 Jan 2011 05:11, "Jonathan Ellis"  wrote:


Re: should "nodetool repair " run periodic to keep consistency?

2011-01-19 Thread Stephen Connolly
On 19 January 2011 12:15, Donal Zang  wrote:
> Just to ensure.
> So this should be done manually by the cluster operators?

you could use crontab to automate it according to a schedule

>
> Thanks!
>
> --
>
>
>
>


Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread Stephen Connolly
I would imagine it to be somewhat easy to implement this via a thrift
wrapper so that each tenant is connecting to the proxy thrift server that
masks the fact that there are multiple tenants... or is that how people are
thinking about this?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 18 Jan 2011 21:20, "Aaron Morton"  wrote:
> As everyone says, it's not issues with the Keyspace directly as they are
just a container. It's the CF's in the keyspace, but let's just say keyspace
cause it's easier.
>
> As things stand, if you allow point and click creation for keyspaces you
will hand over control of the memory requirements to the users. This will be
a bad thing. E.g. Lots of cf's will get created and you will run out of
memory, or cf's will get created with huge Memtable settings and you will
run out of memory, or caches will get set huge and you get the picture. One
badly behaving keyspace or column family can take down a node / cluster.
>
> IMHO currently the best way to share a Cassandra cluster is through some
sort of application layer that uses as static keyspace. Others have a better
understanding of the internals and may have ideas about how this could
change in the future.
>
> Aaron
>
> On 19/01/2011, at 9:07 AM, Ed Anuff  wrote:
>
>> Hi Jeremy, thanks, I was really coming at it from the question of whether
keyspaces were a functional basis for multitenancy in Cassandra. I think the
MT issues discussed on the wiki page are the , but I'd like to get a better
understanding of the core issue of keyspaces and then try to get that onto
the page as maybe the first section.
>>
>> Ed
>>
>> On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna <
jeremy.hanna1...@gmail.com> wrote:
>> Feel free to use that wiki page or another wiki page to collaborate on
more pressing multi tenant issues. The wiki is editable by all. The
MultiTenant page was meant as a launching point for tracking progress on
things we could think of wrt MT.
>>
>> Obviously the memtable problem is the largest concern at this point. If
you have any ideas wrt that and want to collaborate on how to address that,
perhaps even in a way that would get accepted in core cassandra, feel free
to propose solutions in a jira ticket or on the list.
>>
>> A caveat to getting things into core cassandra - make sure anything you
do is considerate of single-tenant cassandra. If possible, make things
pluggable and optional. The round robin request scheduler is an example. The
functionality is there but you have to enable it. If it can't be made
pluggable/optional, you can get good feedback from the community about
proposed solutions in core Cassandra (like for the memtable issue in
particular).
>>
>> Anyway, just wanted to chime in with 2 cents about that page (since I
created it and was helping maintain it before getting pulled off onto other
projects).
>>
>> On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:
>>
>> > Hi Indika, I've done a lot of work using the keyspace-per-tenant model,
and I'm seeing big problems with memory consumption, even though it's
certainly the cleanest way to implement it. Luckily, before I used the
keyspace-per-tenant approach, I'd implemented my system using a single
keyspace approach and can still revert to that. The rest of the stuff
for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue
is a big concern at the moment.
>> >
>> > Ed
>> >
>> > On Tue, Jan 18, 2011 at 9:40 AM, indika kumara 
wrote:
>> > Hi Aaron,
>> >
>> > I read some articles about Cassandra, and now understand a little
bit about the trade-offs.
>> >
>> > I feel the goal should be to optimize memory as well as performance. I
have to consider the number of column families, the columns per family,
the number of rows, the memtable’s threshold, and so on. I also have to
consider how to maximize resource sharing among tenants. However, I feel
that a keyspace should be configurable based on the tenant’s class
(e.g. replication factor). From what I have read, the issue is
not the number of keyspaces, but the number of CFs, the number of
rows in a CF, the number of columns, the size of the data in a column, and
so on. Am I correct? I would appreciate your opinion.
>> >
>> > What would be the suitable approach? A keyspace per tenant (there would
be a limit on the tenants per Cassandra cluster) or one keyspace for all
tenants?
>> >
>> > I would still love to expose Cassandra ‘as-is’ to each tenant,
virtually, yet with acceptable memory consumption and performance.
>> >
>> > Thanks,
>> >
>> > Indika
>> >
>> >
>>
>>


Re: quorum calculation seems to depend on previous selected nodes

2011-01-18 Thread Stephen Connolly
On 18 January 2011 07:15, Samuel Benz  wrote:
> On 01/17/2011 09:28 PM, Jonathan Ellis wrote:
>> On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz  wrote:
>>>>> Case1:
>>>>> If 'TEST' was previously stored on Node1, Node2, Node3 -> The update will
>>>>> succeed.
>>>>>
>>>>> Case2:
>>>>> If 'TEST' was previously stored on Node2, Node3, Node4 -> The update will
>>>>> not work.
>>>>
>>>> If you have RF=2 then it will be stored on 2 nodes, not 3.  I think
>>>> this is the source of the confusion.
>>>>
>>>
>>> I checked the existence of the row on the different servers with
>>> sstablekeys after flushing. So I saw three copies of every key in the
>>> cluster.
>>
>> If you want to be guaranteed to be able to read with two nodes down
>> and RF=3, you have to read at CL.ONE, since if the two nodes that are
>> down are replicas of the data you are reading (as in the 2nd case
>> here) Cassandra will be unable to achieve quorum (quorum of 3 is 2
>> live nodes).
>>
>
> Now it seems clear to me. Thanks!
>
> I was confused by the fact that: "live nodes" != "replica live nodes"
>
> Correct me if I'm wrong, but even in a cluster with 1000 nodes and RF=3,
> if I shut down the wrong two nodes, I have the same problem as in my
> mini cluster.

Correct
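
A minimal Python sketch of the point above, for illustration only. It is not Cassandra code; the function names and node names are made up. It shows that QUORUM is counted against the replicas of the row being read, not against the live nodes in the cluster as a whole.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_reach_quorum(replicas: set, down_nodes: set) -> bool:
    """True if enough of the row's own replicas are still alive.

    Only replicas of the row count; other live nodes do not help.
    """
    live_replicas = replicas - down_nodes
    return len(live_replicas) >= quorum(len(replicas))


# Hypothetical row stored on three nodes (RF=3) of an arbitrarily large cluster.
row_replicas = {"node2", "node3", "node4"}

# Two nodes down, but only one of them is a replica of the row.
print(can_reach_quorum(row_replicas, {"node1", "node2"}))  # True

# The "wrong" two nodes down: both are replicas of the row.
print(can_reach_quorum(row_replicas, {"node3", "node4"}))  # False
```

The cluster size never appears in the calculation, which is why a 1000-node cluster behaves the same as the mini cluster for this row.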

>
>
> --
> Sam
>


Re: Super CF or two CFs?

2011-01-17 Thread Stephen Connolly
On 17 January 2011 22:36, Steven Mac  wrote:
> Sure, consider stock data, where the stock symbol is the row key. The stock
> data consists of a rather stable part and a very volatile part, both of
> which would be a super column. The stable super column would contain
> subcolumns such as company name, address, and some annual or quarterly data.
> The volatile super column would contain periodic stock data, such as current
> price, last trade times, volumes, buyers, sellers, etc.
>
> The volatile super columns would be updated every few minutes, many rows at
> once using a single batch_mutate. The data would be read using a get on a
> single row key, returning both supercolumns and all subcolumns.
>
> The data could also be split over two column families, one for the stable
> part and one for the volatile part. The updates would be the same, while a
> read would require two get operations.

I'm not seeing why you need to use supercolumns for this at all.

Standard columns would seem just fine in this case (as long as you
have good naming for your columns)
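
One possible reading of "good naming", as a minimal Python sketch. The "stable:" and "volatile:" prefixes, the sample values, and the column_slice helper are all assumptions made for illustration; nothing here is a Cassandra API. The row is modelled as a flat map of column name to value, and a name-range slice picks out one group.

```python
# Hypothetical row for one stock symbol, using plain (non-super) columns.
# The prefixes are an assumed naming convention, not something Cassandra prescribes.
row = {
    "stable:address": "1 Example Street",
    "stable:company_name": "Example Corp",
    "volatile:last_trade_time": "2011-01-17T22:30:00Z",
    "volatile:price": "12.34",
    "volatile:volume": "100000",
}


def column_slice(columns: dict, start: str, finish: str) -> dict:
    """Return columns with start <= name < finish, in name order.

    This only mimics a column slice over sorted column names.
    """
    return {name: columns[name] for name in sorted(columns) if start <= name < finish}


# ';' is the character after ':', so this range covers every "volatile:" column.
volatile = column_slice(row, "volatile:", "volatile;")
print(list(volatile))

# Reading the object as a whole is still a single row read.
print(len(row))
```

With this layout the frequent updates touch only the "volatile:" columns, while a whole-object read stays a single get on the row key.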

And you probably only need one column family... but people more expert
than me could advise better...

I guess the question I have is why you feel the solution should
involve supercolumns

-Stephen

>
> Regards, Steven.
>
> 
> Date: Mon, 17 Jan 2011 12:20:46 -0800
> Subject: Re: Super CF or two CFs?
> From: davevi...@gmail.com
> To: user@cassandra.apache.org
>
> can you give an example of the data and how you'd access it?
> what would your expected columns (and/or supercolumns) be?
>
> Dave Viner
> On Mon, Jan 17, 2011 at 11:05 AM, Steven Mac  wrote:
>
> How can I best map an object containing two maps, one of which is updated
> very frequently and the other only occasionally?
>
> a) As one super CF, with each map in a separate supercolumn and the map
> entries being the subcolumns?
> b) As two CFs, one for each map.
>
> I'd like to discuss the why behind a choice, in order to learn about the
> impact of a design choice on performance, SStable size/disk usage,
> compactions, etc.
>
> Regards, Steven.
>
> PS: Objects will always be read as a whole.
>


Re: Cassandra-Maven-Plugin

2011-01-17 Thread Stephen Connolly
https://issues.apache.org/jira/browse/CASSANDRA-1997

On 16 January 2011 19:59, Stephen Connolly
 wrote:
> it will be an attachment to an as yet unraised jira. look out for it
> tomorrow/tuesday
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
>
> On 16 Jan 2011 17:52, "Hellmut Adolphs"  wrote:
>


Re: Cassandra-Maven-Plugin

2011-01-16 Thread Stephen Connolly
it will be an attachment to an as yet unraised jira. look out for it
tomorrow/tuesday

- Stephen

On 16 Jan 2011 17:52, "Hellmut Adolphs"  wrote:


Cassandra-Maven-Plugin

2011-01-14 Thread Stephen Connolly
OK,

I nearly have the Cassandra-Maven-Plugin ready.

It has the following goals:
  run: launches Cassandra in the foreground and blocks until you press
^C at which point Maven terminates. Use-case: Running integration
tests from your IDE. Live development from your IDE.

  start: launches Cassandra in the background. Cassandra will be torn
down when Maven ends or if the stop goal is called. Use-case: Running
integration tests from Maven. Live development from your IDE with e.g.
jetty

  clean: Clears out the Cassandra database directory in
${basedir}/target/cassandra. Use-case: Resetting the dataset.

  load: Runs the cassandra-cli with a file as input.  Use-case:
Creating Keyspaces & pre-populating the dataset

  stop: Shuts down the background Cassandra instance started by start.
Use-case: Running integration tests from Maven.

So for example, if you are developing a web application using Maven
you would use a command like:

mvn cassandra:clean cassandra:start cassandra:load jetty:run

which would start up cassandra with a clean dataset and then start up
jetty (which presumably connects via a client library to cassandra).

Similarly, you can use cassandra-maven-plugin, jetty-maven-plugin,
maven-failsafe-plugin and selenium-maven-plugin to run web integration
tests as part of your build.

So I have some questions:

1. Is there a standard file extension for the scripts that get passed
to cassandra-cli?

2. Is there any other obvious goal I have missed out on?

There is a small bit of tidy-up left and then I just have to add some
integration tests and the site documentation.  Once I have all that in
place I will raise a JIRA with the full source code against CASSANDRA
and hopefully a friendly committer will pick it up and commit it into
the tree. While waiting for a committer testers will be welcome.

If it gets accepted I will then see about getting it released and
published on central.

Expect to see the JIRA sometime Monday or Tuesday.

-Stephen


Re: how to do a get_range_slices where all keys start with same string

2011-01-12 Thread Stephen Connolly
or set the end key to "com.googlf"
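
A minimal Python sketch of where "com.googlf" comes from, for illustration only. The helper name prefix_end_key is made up, and the sketch assumes keys compare as plain ASCII strings, as they would under an order-preserving partitioner. Incrementing the last character of the prefix gives the smallest key that sorts after every key with that prefix.

```python
def prefix_end_key(prefix: str) -> str:
    """Smallest key sorting after every key that starts with `prefix`.

    Assumes a non-empty prefix whose last character can be incremented.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


end_key = prefix_end_key("com.google")
print(end_key)  # com.googlf

# Hypothetical row keys, sorted the way an order-preserving partitioner would.
keys = sorted(["com.apple", "com.google", "com.google.mail", "com.googlf", "org.apache"])

# Half-open range: start inclusive, end exclusive.
matches = [k for k in keys if "com.google" <= k < end_key]
print(matches)  # ['com.google', 'com.google.mail']
```

If the server treats the end key as inclusive, a row keyed exactly "com.googlf" would also come back, so the client may still need to drop a final non-matching key.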

On 12 January 2011 02:49, Aaron Morton  wrote:

> If you were using OPP and get_range_slices then set the start_key to be
> "com.google" and the end_key to be "". Get it in slices of say 1,000 (use the
> last key read as the next start_key) and when you see the first key that
> does not start with com.google stop making calls.
>
> If you move the data from rows to columns, you can use the same approach.
>
> Aaron
>
>
> On 12 Jan, 2011,at 03:25 PM, Roshan Dawrani 
> wrote:
>
> On Wed, Jan 12, 2011 at 7:41 AM, Koert Kuipers <
> koert.kuip...@diamondnotch.com> wrote:
>
>> Ok I see get_range_slice is really only useful for paging with RP..
>>
>> So if I were using OPP (which I am not) and I wanted all keys starting
>> with "com.google", what should my start_key and end_key be?
>>
>
> I think you can't. It's the columns that are sorted, and not the rows (if you
> are not using OPP). With your "com.google." data arranged in columns
> instead of rows, you should be able to specify start_col, end_col to filter
> it.
>
>
>
>


Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-01-10 Thread Stephen Connolly
On 10 January 2011 19:44, Oleg Tsvinev  wrote:
> Hi,
> http://cassandra.apache.org/download/ shows that
> "The latest stable release of Apache Cassandra is 0.7.0 (released on
> 2011-01-09). If you're just starting out, download this one."
> However, I don't see this version in  Riptano public repository. The latest
> there is still  0.7.0-rc4.
> Is there any chance to get it from there?
> Thank you,
>   Oleg
>

Just on the off-chance that you are talking about the Riptano public
Maven repository: the Cassandra artifacts for 0.7.0-rc4 are
available from the Maven Central repository.

-Stephen


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
already doing that

- Stephen

On 6 Jan 2011 21:09, "Ran Tavory"  wrote:


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
fine by me

- Stephen

On 6 Jan 2011 21:40, "Jonathan Ellis"  wrote:
> We're planning to clean out contrib:
> https://issues.apache.org/jira/browse/CASSANDRA-1805
>
> Maybe tools?
>
> On Thu, Jan 6, 2011 at 2:43 PM, Stephen Connolly
>  wrote:
>> I nearly have one ready...
>>
>> my plan is to have it added to contrib... if the cassandra devs agree
>>
>> -stephen
>>
>> - Stephen
>>
>>
>> On 6 Jan 2011 19:38, "B. Todd Burruss"  wrote:
>>> has anyone created a maven plugin, like cargo for tomcat, for automating
>>> starting/stopping a cassandra instance?
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
testers welcome

- Stephen

On 6 Jan 2011 20:45, "B. Todd Burruss"  wrote:
> would u like some testers? we were about to write one.
>
> On 01/06/2011 12:43 PM, Stephen Connolly wrote:
>>
>> I nearly have one ready...
>>
>> my plan is to have it added to contrib... if the cassandra devs agree
>>
>> -stephen
>>
>> - Stephen
>>
>>
>> On 6 Jan 2011 19:38, "B. Todd Burruss" wrote:
>> > has anyone created a maven plugin, like cargo for tomcat, for
>> automating
>> > starting/stopping a cassandra instance?


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
capistrano is a different use case. the maven plugin is for integration
testing, not live deployment

- Stephen

On 6 Jan 2011 21:32, "shimi"  wrote:


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
I nearly have one ready...

my plan is to have it added to contrib... if the cassandra devs agree

-stephen

- Stephen

On 6 Jan 2011 19:38, "B. Todd Burruss"  wrote:
> has anyone created a maven plugin, like cargo for tomcat, for automating
> starting/stopping a cassandra instance?


[ANN] Cassandra 0.7.0-rc4 available from Maven Central Repository

2011-01-06 Thread Stephen Connolly
Hi All,

Cassandra 0.7.0-rc4 is now available from the Maven Central Repository.

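Apache Maven
------------

<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-all</artifactId>
  <version>0.7.0-rc4</version>
</dependency>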

Apache Ivy
----------

<dependency org="org.apache.cassandra" name="cassandra-all" rev="0.7.0-rc4"/>

Groovy Grape
------------

@Grapes(
    @Grab(group='org.apache.cassandra', module='cassandra-all', version='0.7.0-rc4')
)

Apache Buildr
-------------

'org.apache.cassandra:cassandra-all:jar:0.7.0-rc4'


Enjoy!

-Stephen

P.S.
Hopefully as projects dependent on cassandra move to central the Maven
experience for all will considerably improve. Having looked at the
crazy hacks that people have used to work around artifacts not being
in central I can only say that it would not have resulted in a good
Maven experience.

P.P.S.
AFAIK the plan is to split cassandra into multiple artifacts for the
different use cases, this will result in clients being able to depend
on a much smaller jar with a reduced dependency tree. The
cassandra-all jar is just a stop-gap until the cassandra build is
refactored to produce the required artifacts.