Re: CSV Import is taking huge time

2014-07-23 Thread Akshay Ballarpure
Thanks, Jack, for the quick reply. I didn't understand your question completely.
I am very new to Cassandra. I just installed a single-node cluster:

[root@CSL-simulation bin]# ./nodetool -host 10.59.18.206 -p 7199 status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  55.65 MB  256     100.0%  1159cda0-6a8c-423d-9a20-cdedd4db9907  rack1

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting




From: "Jack Krupansky"
To:
Date: 07/23/2014 06:39 PM
Subject: Re: CSV Import is taking huge time



Is it compute bound or I/O bound?
 
What does your cluster look like?
 
-- Jack Krupansky
 
From: Akshay Ballarpure 
Sent: Wednesday, July 23, 2014 5:00 AM
To: user@cassandra.apache.org 
Subject: CSV Import is taking huge time
 
Hello, 
I am using the COPY command in Cassandra to import a CSV file into the DB. The import
is taking a huge amount of time; any suggestions to improve it?

id,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z 
100,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 
101,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 
 
-- 
-- 

There are ~50K lines in this file; its size is ~5 MB.
  
I created the table as follows:

create table csldata4 (id int PRIMARY KEY, a int, b int, c int, d int, e int,
  f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int,
  q int, r int, s int, t int, u int, v int, w int, x int, y int, z int);

Copy command:

COPY csldata4 (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n,
  o, p, q, r, s, t, u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
  
The issue is that the import takes a long time:

cqlsh:mykeyspace> COPY csldata (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n,
  o, p, q, r, s, t, u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
66215 rows imported in 1 minute and 31.044 seconds.
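A quick check of the arithmetic behind that timing, using only the figures from the cqlsh output above:

```python
# Figures taken from the cqlsh output above.
rows = 66215
seconds = 60 + 31.044  # "1 minute and 31.044 seconds"

rows_per_second = rows / seconds
ms_per_row = seconds / rows * 1000

print(f"{rows_per_second:.0f} rows/s, {ms_per_row:.3f} ms per row")
# prints: 727 rows/s, 1.375 ms per row
```

If each row is inserted only after the previous insert has been acknowledged, that ~1.4 ms per row is essentially one request round trip, so throughput is bounded by latency rather than by disk or CPU.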


 
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you



Cassandra on AWS suggestions for data safety

2014-07-23 Thread Hao Cheng
Hello,

Based on what I've read in the archives here and in the documentation from
DataStax and the Cassandra community, EBS volumes, even provisioned-IOPS volumes
on EBS-optimized instances, are not recommended due to inconsistent
performance. I can deal with that, but I was hoping for some
recommendations from the community regarding solutions for data safety.

I have a few ideas in mind:

1. Instance store for the database, then Cassandra snapshots (via
nodetool) stored on an EBS provisioned-IOPS volume attached to the
instance. That volume would keep the DB safe in case of instance
downtime, and I would set up regular snapshotting of the EBS volume for
data safety (pushed to S3 and eventually Glacier).

2. Instance store used as a bcache write-through cache for attached EBS
volumes. The attached volumes persist all writes and are again snapshotted
regularly.

3. A backup system, either manual via rsync or through something
like Priam, to push backups of the data on ephemeral storage directly to S3.
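Option 1 involves two mechanical steps: taking a named snapshot with nodetool, then locating the snapshot files to copy to the EBS volume or S3. A minimal sketch of those steps, assuming the usual layout in which each table's snapshot lands in a `snapshots/<tag>` subdirectory of the table's data directory (exact table directory names may differ between versions, which the directory walk tolerates). The helper names, keyspace and tag are hypothetical, and the upload itself is left out.

```python
import os
import subprocess


def snapshot_command(tag, keyspace):
    """argv for `nodetool snapshot`, naming the snapshot with -t so it can be found later."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def take_snapshot(tag, keyspace):
    """Run nodetool on this node; raises CalledProcessError if nodetool fails."""
    subprocess.run(snapshot_command(tag, keyspace), check=True)


def find_snapshot_dirs(data_dir, tag):
    """Return every .../snapshots/<tag> directory below data_dir, sorted.

    Assumes snapshots live in a `snapshots/<tag>` subdirectory of each table
    directory; keyspace and table directory names are not checked.
    """
    found = []
    for root, dirs, _files in os.walk(data_dir):
        if os.path.basename(root) == "snapshots" and tag in dirs:
            found.append(os.path.join(root, tag))
    return sorted(found)
```

On a node, `take_snapshot("nightly", "my_keyspace")` followed by copying each path returned by `find_snapshot_dirs(data_dir, "nightly")` would cover the first half of option 1; `data_dir` should match `data_file_directories` in cassandra.yaml.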

From where I'm sitting, #2 seems the easiest to set up, but could
potentially cause problems if the EBS volume backing writes sees a spike in
latency, driving up write times even if read times would remain fairly
consistent.

Do any of you all have recommendations or suggestions for a system like
this?

Thanks in advance!

--Bryan


What is C*?

2014-07-23 Thread jcllings
I keep seeing references to C*.

I assume that C* == Cassandra?  IMHO it's not a good abbreviation to use, what with C,
C++, and C#.  A language called C* can't be far behind, assuming it doesn't
already exist.
;-)

Jim C.





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Wed, Jul 23, 2014 at 1:18 PM, DuyHai Doan  wrote:

> Why is that? In the worst case, CL.ANY will write hints for replicas that are
> down. It would be extraordinarily unlucky to have all replicas down at the
> same time.
>

Hints are not writes for the purposes of consistency or durability, so your
write hasn't actually succeeded. Most people don't have applications which
need a database to potentially persist a write.

In addition, the implementation details of Hinted Handoff can make ANY a
meaningful contributor to cascading failure mode when nodes are actually
hard down, because instead of failing with an UnavailableException (which
gives your app a chance to back off), you write hints. There is some
throttling in terms of how many hints can be "in flight" at once, but ones
over the threshold are dropped on the floor. I've seen nodes with more
hints data than actual data, and which were completely unable to ever
deliver and purge these hints, though they uselessly compacted them for
weeks on end. In most configs, you will end up discarding some subset of
these hints in the course of your cascading failure, but you will probably
not know which ones. You will also discard 100% of hints after three hours
in the default config. You might be happier to just get an exception at the
start of the incident, back off your application access a bit, and fix the
small subset of affected nodes?
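The three-hour figure corresponds to the default `max_hint_window_in_ms` of 10800000 in cassandra.yaml: once a replica has been down for longer than that, coordinators stop writing new hints for it, so writes it misses after that point must be recovered by repair rather than by hinted handoff. A simplified model of that one rule; it ignores hint throttling, hint expiry and delivery, and the function names are illustrative:

```python
# Default max_hint_window_in_ms in cassandra.yaml: 3 hours.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def hours(h):
    """Convert hours to milliseconds."""
    return int(h * 60 * 60 * 1000)


def coordinator_writes_hint(replica_downtime_ms, max_hint_window_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Simplified model: a hint is written for a dead replica only while the
    replica's downtime has not exceeded the hint window."""
    return replica_downtime_ms <= max_hint_window_ms


print(coordinator_writes_hint(hours(2)), coordinator_writes_hint(hours(4)))
# prints: True False
```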

In the future when hints are not handled via Column Families, ANY probably
gets a lot less risky in terms of overload-with-undelivered-hints, but
probably still doesn't actually provide what I consider worthwhile benefit.
It is of course possible that I have just never had or heard of a case for
which it was appropriate or necessary.

tl;dr - CL.ANY creates more risk of cases where you will write a bunch of
hints, and cases where you write a bunch of hints are almost never the
solution to any actual problem, because hints are not writes. If you really
really need "extreme" availability and can't do it via increasing RF, maybe
you might want to consider using CL.ANY. But probably not.

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread graham sanderson
I was being a little tongue in cheek!

On Jul 23, 2014, at 3:20 PM, Jack Krupansky  wrote:

> Granted, for “normal” apps it is unlikely to be appropriate but...
>  
> From an old post by Jonathan:
> ---
> Extreme write availability
>  
> For applications that want Cassandra to accept writes even when all the 
> normal replicas are down (so even ConsistencyLevel.ONE cannot be satisfied), 
> Cassandra provides ConsistencyLevel.ANY. ConsistencyLevel.ANY guarantees that 
> the write is durable and will be readable once an appropriate replica target 
> becomes available and receives the hint replay.
> ---
> See:
> http://www.datastax.com/dev/blog/understanding-hinted-handoff
>  
> I can think of a couple of use cases: sensor data where the devices are 
> streaming frequently, so losing a reading is not a big deal because another 
> reading is coming soon anyway, and a Twitter firehose where you are after a 
> robust sample rather than absolute consistency. Minimizing network latency 
> may be a bigger deal than whether immediate queries can see the data.
>  
> And as the description notes, hinted handoff will eventually propagate the 
> data (unless it times out and drops the hint.)
>  
> -- Jack Krupansky
>  
> From: Robert Coli
> Sent: Wednesday, July 23, 2014 1:15 PM
> To: user@cassandra.apache.org
> Cc: Kevin Burton
> Subject: Re: All writes fail with ONE consistency level when adding second 
> node to cluster?
>  
> On Tue, Jul 22, 2014 at 7:46 PM, Andrew  wrote:
>  
> ONE means write to one replica (in addition to the original).  If you want to 
> write to any of them, use ANY.  Is that the right understanding?
>  
>  
> This has come up a few times, so let me be unambiguous about when to use 
> CL.ANY :
>  
> NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.
>  
> IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.
>  
> ;D
>  
> =Rob
>  





Re: CSV Import is taking huge time

2014-07-23 Thread Tyler Hobbs
See https://issues.apache.org/jira/browse/CASSANDRA-7405.

Currently cqlsh's COPY FROM just uses a single-threaded for-loop with
synchronous inserts.
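Since the cost is one synchronous round trip per row, the usual client-side workaround is to keep several inserts in flight at once. A minimal sketch of that idea with a thread pool; it is not how cqlsh implements COPY, and `insert_fn` is a hypothetical stand-in for a blocking driver call. With the DataStax Python driver it might wrap `session.execute` on a prepared INSERT statement, and the driver also ships `cassandra.concurrent.execute_concurrent_with_args` for this pattern; neither is used or tested here.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def parse_rows(csv_text):
    """Parse CSV text with a header row into tuples of ints (id, a, ..., z)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line, as WITH HEADER=TRUE does
    # int() tolerates stray spaces such as those at the ends of the sample lines
    return [tuple(int(value) for value in row) for row in reader if row]


def import_concurrently(rows, insert_fn, workers=32):
    """Call insert_fn(row) for every row, keeping up to `workers` calls in flight.

    insert_fn is a hypothetical blocking insert (one row per call). With
    workers=1 this degenerates to a one-at-a-time loop.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator waits for every call and re-raises the
        # first exception raised by insert_fn, if any.
        for _ in pool.map(insert_fn, rows):
            pass
    return len(rows)
```

The worker count is a tuning knob, not a recommendation: more requests in flight hide more latency, but too many can overload a single node.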


On Wed, Jul 23, 2014 at 8:09 AM, Jack Krupansky 
wrote:

>   Is it compute bound or I/O bound?
>
> What does your cluster look like?
>
> -- Jack Krupansky
>
>  *From:* Akshay Ballarpure 
> *Sent:* Wednesday, July 23, 2014 5:00 AM
> *To:* user@cassandra.apache.org
> *Subject:* CSV Import is taking huge time
>
> Hello,
> I am trying copy command in Cassandra to import CSV file in to DB, Import
> is taking huge time, any suggestion to improve it?
>
> id,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
> 100,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
> 101,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
> 
> --
> --
>
> there are ~ 50 K lines in this file , size is ~ 5 MB.
>
> I have created table as per below:
>
> create table csldata4 ( id int PRIMARY KEY,a int , b int, c int, d int, e
> int, f int,
> g int, h int,i int, j int, k int, l int,m int, n
> int, o int, p int, q int, r int, s
> int, t int, u int, v int, w int, x int, y int , z int);
> Copy Command:
>
> COPY csldata4 (id , a , b , c , d , e , f , g , h , i , j , k , l , m , n
> , o , p , q , r , s , t , u , v , w , x , y , z ) FROM 'csldata1.csv' WITH
> HEADER=TRUE;
>
> Issue here is it's taking huge time to import
>
> cqlsh:mykeyspace> COPY csldata (id , a , b , c , d , e , f , g , h , i , j
> , k , l , m , n , o , p , q , r , s , t , u , v , w , x , y , z ) FROM
> 'csldata1.csv' WITH HEADER=TRUE;
> 66215 rows imported in *1 minute and 31.044 seconds*.
>
>



-- 
Tyler Hobbs
DataStax 


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Jack Krupansky
Granted, for “normal” apps it is unlikely to be appropriate but...

From an old post by Jonathan:
---
Extreme write availability

For applications that want Cassandra to accept writes even when all the normal 
replicas are down (so even ConsistencyLevel.ONE cannot be satisfied), Cassandra 
provides ConsistencyLevel.ANY. ConsistencyLevel.ANY guarantees that the write 
is durable and will be readable once an appropriate replica target becomes 
available and receives the hint replay.
---
See:
http://www.datastax.com/dev/blog/understanding-hinted-handoff

I can think of a couple of use cases: sensor data where the devices are 
streaming frequently, so losing a reading is not a big deal because another 
reading is coming soon anyway, and a Twitter firehose where you are after a 
robust sample rather than absolute consistency. Minimizing network latency may 
be a bigger deal than whether immediate queries can see the data.

And as the description notes, hinted handoff will eventually propagate the data 
(unless it times out and drops the hint.)

-- Jack Krupansky

From: Robert Coli 
Sent: Wednesday, July 23, 2014 1:15 PM
To: user@cassandra.apache.org 
Cc: Kevin Burton 
Subject: Re: All writes fail with ONE consistency level when adding second node to cluster?

On Tue, Jul 22, 2014 at 7:46 PM, Andrew  wrote:

ONE means write to one replica (in addition to the original).  If you want to 
write to any of them, use ANY.  Is that the right understanding?


This has come up a few times, so let me be unambiguous about when to use CL.ANY:

NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.

IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.

;D

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread DuyHai Doan
Why is that? In the worst case, CL.ANY will write hints for replicas that are
down. It would be extraordinarily unlucky to have all replicas down at the
same time.


On Wed, Jul 23, 2014 at 9:26 PM, Robert Coli  wrote:

> On Wed, Jul 23, 2014 at 12:01 PM, graham sanderson 
> wrote:
>
>> Hey now; it is GREAT for a 100% write only use case ;-)
>>
>
> A well WORN [1] path in databases, for sure.
>
> =Rob
> [1] Write Once Read Never
>


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jack Krupansky
Out of curiosity, did you look at or utilize DataStax’s free online training?

See:
http://www.datastax.com/what-we-offer/products-services/training/virtual-training

Any feedback? Any suggestions as to what needs it does or doesn’t fulfill?

-- Jack Krupansky

From: Nicholas Okunew 
Sent: Wednesday, July 23, 2014 8:29 AM
To: user@cassandra.apache.org 
Subject: Re: Why is the cassandra documentation such poor quality?

I think the problem is a little deeper than that. I've been working with 
cassandra for about 7 months now - it was very challenging to find out any real 
information about using cassandra, and even harder to get clear information on 
operating it. There's a truckload of reading you have to do, and no one place 
you can find it.  

Searching google largely turns up datastax blog posts and DSE docs, almost all 
of it out of date (not to say the docs aren't up to date, but the search 
results often result in old docs, and old blog posts).

Deeper searching usually results in a link to JIRA. No offense to anyone 
involved, but when your first experience of trying to learn an open source tool 
is the realisation that all the information you need is simply spread across 
~7000 jira tickets, it doesn't make the knowledge feel very accessible.

As an example, finding a Java driver with a useful abstraction was non-trivial 
- it appeared on the surface that there wasn't really one, that you had to 
write everything yourself on top of CQL. Now I know (as everyone else on this list 
knows) that DataStax provides one. At the time I never found a simple page that 
just pointed me in the right direction and showed a basic usage example.

Another example is that there is constant confusion about nomenclature on this 
list, because naming has changed over time. If you don't know you're reading 
old information, or what the significant changes are between 0.whatever, 
1.whatever and 2.whatever, it's very hard to know whether you're even googling 
for the right thing. Dynamic columns are a great example of this. I think the 
fact that it keeps coming up on this list is a strong indicator that the 
information is not available in a 'sufficient' way.

Another way of putting it is, when I started trying to learn about cassandra, 
pretty much every piece of consumable information I was able to find was out of 
date, but it wasn't always obvious that this was the case.

Having said all that, everything I've seen on this list points to prompt, 
useful and friendly assistance, even for questions that are frequently asked. I 
have no stake either way in what the rules on who can contribute are, but I can 
definitely say I would have very much enjoyed a much softer landing when trying 
to learn cassandra, from the basics all the way through to the detail of ops.





On 23 July 2014 21:55, Jason Wee  wrote:

  I agree with the people here who are already sharing their ways to access 
documentation. If you are a beginner, you had better spend time searching for 
documentation (e.g. using Google) or hours reading. Then start asking specific 
questions. Coming here to kpkb (complain loudly) about the poor quality of the 
documentation just does not cut it.

  If you find that documentation is outdated, you can email the people in charge 
and tell them what is wrong and what you think would improve it. Some 
documentation is left there so that we can read and understand the history of 
where it came from, and some people may still use old versions of Cassandra.



  On Wed, Jul 23, 2014 at 7:49 PM, Jack Krupansky  
wrote:

And the simplest and easiest thing to do is simply email this list when you 
see something wrong or missing in the DataStax Cassandra doc, or for anything 
that is not adequately documented anywhere. I work with the doc people there, so I can
make sure they see corrections and improvements. And simply sharing knowledge 
on this list is always a big step forward.

-- Jack Krupansky

From: spa...@gmail.com 
Sent: Wednesday, July 23, 2014 4:25 AM
To: user@cassandra.apache.org 
Subject: Re: Why is the cassandra documentation such poor quality?

I would like to help out with the documentation of C*. How do I start?




On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:

  Just a note:
  If you have suggestions on how to improve the documentation on the DataStax 
website, email them at d...@datastax.com. They appreciate proposals :)

  On 23.07.2014 at 09:10, Mark Reddy wrote:


Hi Kevin,

The difference here is that the Apache Cassandra site is maintained by 
the community whereas the DataStax site is maintained by paid employees with a 
vested interest in producing documentation. 

With DataStax having some comprehensive docs, I guess the desire for 
people to maintain the Apache site has dwindled. However, if you are interested 
in contributing to it and bringing it back up to standard, you can; such is the 
freedom of open source. 


Mark



On Wed, Jul 23, 2014 at 2:54 AM, Kevin Bu

Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Wed, Jul 23, 2014 at 12:01 PM, graham sanderson  wrote:

> Hey now; it is GREAT for a 100% write only use case ;-)
>

A well WORN [1] path in databases, for sure.

=Rob
[1] Write Once Read Never


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Robert Coli
On Wed, Jul 23, 2014 at 11:44 AM, Kevin Burton  wrote:

> I have a lot of experience in distributed systems, understand the space
> well, but I just can't find documentation on how cassandra does things from
> a high level perspective.
>

My belief as to the reason why you are unable to find design documents
which explicate the design principles behind Apache Cassandra is that
neither exist.

I continue to be a firm believer that a documented set of design principles
we all agree are "Cassandric" would be very helpful in cases where we need
to evaluate whether proposed patches are "Cassandric" enough to merge.

> My plan is to just use the source code for reference.  Not everyone has
> that option though…
>

This, and people who have done the same and combined it with following JIRA
for regressions/bugfixes, are the path to the most accurate understanding
of how any given release of Cassandra works.

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread graham sanderson
Hey now; it is GREAT for a 100% write only use case ;-)

On Jul 23, 2014, at 12:15 PM, Robert Coli  wrote:

> On Tue, Jul 22, 2014 at 7:46 PM, Andrew  wrote:
> ONE means write to one replica (in addition to the original).  If you want to 
> write to any of them, use ANY.  Is that the right understanding?
> 
> This has come up a few times, so let me be unambiguous about when to use 
> CL.ANY :
> 
> NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.
> 
> IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.
> 
> ;D
> 
> =Rob
> 





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Kevin Burton
Interesting.. it was unclear what it does… ONE sounds right to me so I was
curious what was up with ANY.  We just set it to ANY so that we could track
down what was causing this bug.


On Wed, Jul 23, 2014 at 10:15 AM, Robert Coli  wrote:

> On Tue, Jul 22, 2014 at 7:46 PM, Andrew  wrote:
>
>> ONE means write to one replica (in addition to the original).  If you
>> want to write to any of them, use ANY.  Is that the right understanding?
>>
>
> This has come up a few times, so let me be unambiguous about when to use
> CL.ANY :
>
> NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.
>
> IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.
>
> ;D
>
> =Rob
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Kevin Burton
This is a great post… and really does a great job of summarizing the
problems I have with the cassandra documentation.

I realize that Datastax is trying to step in and fix the problem.. but
there are now a lot of parties involved (JIRA tickets, blog posts, datastax
doc, apache doc, etc) … it's all over the place.

Some coordination is needed.

My suggestions would be:

- take the apache wiki and documentation offline or put it in legacy… it's
NOT helping Cassandra or Datastax new users when people stumble on old
documentation giving them bad advice for commands that no longer exist.

- take the datastax documentation and perhaps have Apache link to it or
have Datastax contribute the documentation to Apache.  It's not helping
Datastax when their potential customers look at Cassandra and see poor and
inconsistent documentation.

- take some of the blog posts / JIRA tickets, and document the outcome
somewhere.  We can't have documentation hidden.

- make sure Google returns the right documentation for queries.

- make sure the documentation states which version of Cassandra it applies
to…

- consider using the MySQL documentation as an example.  They've done a
GREAT job maintaining their documentation over the years. IMO.

… I think part of the problem is that the cassandra community has a lot of
people who already know the ins and outs… so they don't see how difficult
the problem is from a new user perspective.

I have a lot of experience in distributed systems, understand the space
well, but I just can't find documentation on how cassandra does things from
a high level perspective.

My plan is to just use the source code for reference.  Not everyone has
that option though…

Kevin


On Wed, Jul 23, 2014 at 5:29 AM, Nicholas Okunew  wrote:

> I think the problem is a little deeper than that. I've been working with
> cassandra for about 7 months now - it was very challenging to find out any
> real information about using cassandra, and even harder to get clear
> information on operating it. There's a truckload of reading you have to do,
> and no one place you can find it.
>
> Searching google largely turns up datastax blog posts and DSE docs, almost
> all of it out of date (not to say the docs aren't up to date, but the
> search results often result in old docs, and old blog posts).
>
> Deeper searching usually results in a link to JIRA. No offense to anyone
> involved, but when your first experience of trying to learn an open source
> tool is the realisation that all the information you need is simply spread
> across ~7000 jira tickets, it doesn't make the knowledge feel very
> accessible.
>
> As an example, finding a java driver with a useful abstraction was
> non-trivial - it appeared on the surface that there wasn't really one, that
> you had to write everything yourself on top of CQL. Now I know (as everyone else
> on this list knows) that DataStax provides one. At the time I never found a
> simple page that just pointed me in the right direction and showed a basic usage
> example.
>
> Another example is that there is constant confusion about nomenclature on
> this list, because naming has changed over time. If you don't know you're
> reading old information, or what the significant changes are between
> 0.whatever, 1.whatever and 2.whatever, it's very hard to know whether you're
> even googling for the right thing. Dynamic columns are a great example of
> this. I think the fact that it keeps coming up on this list is a strong
> indicator that the information is not available in a 'sufficient' way.
>
> Another way of putting it is, when I started trying to learn about
> cassandra, pretty much every piece of consumable information I was able to
> find was out of date, but it wasn't always obvious that this was the case.
>
> Having said all that, everything I've seen on this list points to prompt,
> useful and friendly assistance, even for questions that are frequently
> asked. I have no stake either way in what the rules on who can contribute
> are, but I can definitely say I would have very much enjoyed a much softer
> landing when trying to learn cassandra, from the basics all the way through
> to the detail of ops.
>
>




Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Kevin Burton
On Wed, Jul 23, 2014 at 4:55 AM, Jason Wee  wrote:

> I agree to the people here already sharing their ways to access
> documentation. If you are starter, you should better spend time to search
> for documentation (like using google) or hours to read. Then start ask
> specific question. Coming here kpkb about poor quality of documentation
> just does not cut it.
>
>
You clearly didn't look at my post history.

Further, you CAN'T just spend time searching google and reading, because
the documentation is all over the place, inaccurate, etc.





Re: cluster rebalancing…

2014-07-23 Thread Robert Coli
On Tue, Jul 22, 2014 at 7:03 PM, Kevin Burton  wrote:

> So , shouldn't it be easy to rebalance a cluster?
>
> I'm not super excited to type out 200 commands to move around individual
> tokens.
>

That's why vnodes exist? Before vnodes, the only sane option was to double
your cluster size...
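Without vnodes each node owns one token, and the cluster is balanced only if the tokens are evenly spaced around the ring. A minimal sketch of why doubling was the sane option, assuming Murmur3Partitioner's token range of -2**63 to 2**63 - 1 and a single data center: placing one new node at the midpoint of every existing range halves every range while leaving all existing tokens where they are, so no existing node needs a `nodetool move`. The helper names are illustrative. With vnodes (num_tokens, 256 in the nodetool output earlier in this archive) each node instead takes many random tokens, so ownership evens out statistically.

```python
# Murmur3Partitioner token range: -2**63 .. 2**63 - 1.
RING_MIN = -(2 ** 63)
RING_SIZE = 2 ** 64


def balanced_tokens(n):
    """One evenly spaced token per node (the classic initial_token calculation)."""
    return [RING_MIN + (i * RING_SIZE) // n for i in range(n)]


def ownership(tokens):
    """Fraction of the ring owned by each token: a token owns (previous token, token]."""
    ts = sorted(tokens)
    if len(ts) == 1:
        return [1.0]
    # ts[i - 1] for i == 0 is the last token, which handles wrap-around.
    return [((t - ts[i - 1]) % RING_SIZE) / RING_SIZE for i, t in enumerate(ts)]


def double_cluster(tokens):
    """Add one new token at the midpoint of every existing range.

    Existing tokens are untouched, so no existing node has to move.
    """
    ts = sorted(tokens)
    new = []
    for i, t in enumerate(ts):
        nxt = ts[(i + 1) % len(ts)]
        span = (nxt - t) % RING_SIZE or RING_SIZE  # a single-token ring spans everything
        # Wrap the midpoint back into [RING_MIN, RING_MIN + RING_SIZE).
        new.append((t + span // 2 - RING_MIN) % RING_SIZE + RING_MIN)
    return sorted(ts + new)
```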

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Tue, Jul 22, 2014 at 7:46 PM, Andrew  wrote:

> ONE means write to one replica (in addition to the original).  If you want
> to write to any of them, use ANY.  Is that the right understanding?
>

This has come up a few times, so let me be unambiguous about when to use
CL.ANY :

NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.

IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.

;D

=Rob
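For reference on the question being answered: a write at ONE does not mean one replica in addition to an original copy. The coordinator sends the write to every replica for the key and reports success once the number of acknowledgements required by the consistency level has arrived; if fewer replicas are alive than that, it fails up front with an unavailable error. A simplified, single-data-center sketch of that check; LOCAL_* and EACH_* levels are left out, and level names are plain strings here rather than a driver enum:

```python
def required_acks(cl, rf):
    """Replica acknowledgements a write at `cl` waits for (single data center).

    ANY is handled separately in write_can_succeed because a hint can satisfy it.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return levels[cl]


def write_can_succeed(cl, rf, live_replicas):
    """Whether enough replicas are up for the coordinator to accept the write."""
    if not 0 <= live_replicas <= rf:
        raise ValueError("live_replicas must be between 0 and rf")
    if cl == "ANY":
        # Per the description of ANY quoted in this thread, a hint stored by
        # the coordinator is enough, even with every replica down.
        return True
    return live_replicas >= required_acks(cl, rf)
```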


Re: Cassandra select results differs

2014-07-23 Thread Russell Bradberry
Try running at CL QUORUM and see if the problem goes away; if it does, then it 
might be a consistency issue.

Also, what version of C*, how many nodes, what is your RF, and what CL do you 
normally read at?
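The reasoning behind trying QUORUM: a read is guaranteed to include at least one replica that acknowledged the latest successful write when the replicas consulted by the read plus those required by the write exceed RF (R + W > RF). cqlsh reads at ONE by default; if writes were also at ONE, a read can land on a replica that has not yet received the data, which would fit rows appearing and disappearing between queries. A small sketch of that rule; the RF values in the examples are illustrative, not taken from the question:

```python
def quorum(rf):
    """Replicas in a quorum for replication factor rf."""
    return rf // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, rf):
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > rf


print(reads_see_latest_write(1, 1, 3), reads_see_latest_write(quorum(3), quorum(3), 3))
# prints: False True
```

With RF = 2 a quorum is both replicas, so QUORUM reads fail if either replica is down.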



On July 23, 2014 at 12:55:32 PM, Batranut Bogdan (batra...@yahoo.com) wrote:

I have cron jobs that repair every week. node 1 - monday , node 2 tuesday .


On Wednesday, July 23, 2014 7:52 PM, Russell Bradberry  
wrote:


sounds like you may need to run a repair



On July 23, 2014 at 12:50:23 PM, Batranut Bogdan (batra...@yahoo.com) wrote:
Hello all,

I have a CF 


CREATE TABLE cf (
  a text,
  b int,
  c int,
  d int,
  e int,
  PRIMARY KEY (a)
)  WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
 

there are 3 rows there 

when I do select * from cf;

I get random number of rows in results... like 0 , 2, or 3 rows. 

What is the problem?






Re: Cassandra select results differs

2014-07-23 Thread Batranut Bogdan
I have cron jobs that run repair every week: node 1 on Monday, node 2 on Tuesday.
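One way to check that schedule: the commonly cited guideline is that every node should be repaired more often than the table's gc_grace_seconds, so that deletions reach all replicas before their tombstones can be purged. With gc_grace_seconds=864000 (10 days) in the table definition in this thread, a weekly repair per node fits. The sketch below only checks that arithmetic; it says nothing about whether a read at a low consistency level sees every replica's data. The names are illustrative.

```python
DAY = 24 * 60 * 60
GC_GRACE_SECONDS = 864000  # from the CREATE TABLE in this thread


def repair_interval_ok(interval_seconds, gc_grace_seconds=GC_GRACE_SECONDS):
    """True when each node is repaired more often than gc_grace_seconds."""
    return interval_seconds < gc_grace_seconds


weekly = 7 * DAY
print(GC_GRACE_SECONDS / DAY, weekly, repair_interval_ok(weekly))
# prints: 10.0 604800 True
```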


On Wednesday, July 23, 2014 7:52 PM, Russell Bradberry  
wrote:
 


sounds like you may need to run a repair



On July 23, 2014 at 12:50:23 PM, Batranut Bogdan (batra...@yahoo.com) wrote:
Hello all,

I have a CF 


CREATE TABLE cf (
  a text,
  b int,
  c int,
  d int,
  e int,
  PRIMARY KEY (a)
)  WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
 

there are 3 rows there 

when I do select * from cf;

I get random number of rows in results... like 0 , 2, or 3 rows. 

What is the problem?

Re: Cassandra select results differs

2014-07-23 Thread Russell Bradberry
sounds like you may need to run a repair



On July 23, 2014 at 12:50:23 PM, Batranut Bogdan (batra...@yahoo.com) wrote:

Hello all,

I have a CF 


CREATE TABLE cf (
  a text,
  b int,
  c int,
  d int,
  e int,
  PRIMARY KEY (a)
)  WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
 

there are 3 rows there 

when I do select * from cf;

I get random number of rows in results... like 0 , 2, or 3 rows. 

What is the problem?




Cassandra select results differs

2014-07-23 Thread Batranut Bogdan
Hello all,

I have a CF 


CREATE TABLE cf (
  a text,
  b int,
  c int,
  d int,
  e int,
  PRIMARY KEY (a)
)  WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
 

There are 3 rows in the table.

When I run select * from cf;

I get a random number of rows in the results... like 0, 2, or 3 rows.

What is the problem?

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jeremy Jongsma
My experience is similar to Nicholas'. Basic usage was easy to get a handle
on, but the advanced tuning/tweaking info is scattered EVERYWHERE around
the web, mostly on personal blogs. It feels like it took way too long to
become confident enough in my understanding of Cassandra to trust our
deployment configuration in production.

Without this mailing list I would still be on the fence.


On Wed, Jul 23, 2014 at 8:20 AM, Peter Lin  wrote:

> @benedict - you're right that I haven't requested permission to edit.
> You're also right that I've given up on getting edit permission to
> the cassandra wiki. I've struggled with "how" to manage
> open source projects, so I totally get it. Managing projects is a thankless
> job most of the time. Pleasing everyone is totally impossible. Apache isn't
> alone in this. I've submitted stuff to google's open source projects in the
> past and had it go into a black hole. We all struggle with managing open
> source projects.
>
> I am committed to contributing to the Cassandra community, but just not through
> the wiki. There's lots of different ways to contribute. The jira tickets
> I've submitted have gotten good responses generally. It does take several
> days depending on how busy the committers are, but that's normal for all
> projects.
>
>
>
> On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> Requesting a change is very different to requesting permission to edit
>> (which, I note, still hasn't been made); we do our best to promote
>> community engagement, so granting a privilege request has a different
>> mental category to a random edit request, which is much more likely to be
>> forgotten by any particular committer in the process of attending to their
>> more pressing work.
>>
>> The relationship between committers and the community is debated at
>> length in all projects, often by vocal individuals such as yourselves who
>> are unhappy in some way with how the project is being run. However it is
>> very hard to please everyone - most of the time we can't even please all
>> the committers, and that is a much smaller and more homogenous group.
>>
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin  wrote:
>>
>>>
>>> I sent a request to add a link to my .Net driver for cassandra to the wiki
>>> over 5 weeks ago and got no response at all.
>>>
>>> I sent another request way back in 2013 and got zero response. Again, I
>>> totally understand people are busy and I'm just as guilty as everyone else
>>> of letting requests slip by. It's the reality of contributing to open
>>> source as a hobby. If I wasn't serious about contributing to cassandra
>>> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>>>
>>> Perhaps the real cause is that some committers can't "empathise" with
>>> others in the community?
>>>
>>>
>>> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
 All requests I've seen in the past year to edit the wiki (admittedly
 only 2-3) have been answered promptly with editing privileges. Personally I
 don't have a major preference either way for policy - there are positives
 and negatives to each approach - but, like I said, raise it on the dev list
 and see if anybody else does.

 However I must admit I cannot empathise with your characterisation of
 requesting permission as 'begging', or a 'slap in the face', or that it is
 even particularly onerous. It is a slight psychological barrier, but in my
 personal experience when a psychological barrier as low as this prevents me
 from taking action, it's usually because I don't have as much desire to
 contribute as I thought I did.




 On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:

>
> I've submitted requests to edit the wiki in the past and nothing ever
> got done.
>
> Having been an apache committer and contributor over the years, I can
> totally understand that people are busy. I also understand that "most"
> developers find writing docs tedious.
>
> I'd rather not harass the committers about wiki edits, since I didn't
> like it when it happened to me in the past. That's why many apache projects
> keep their wikis open. Honestly, as much as I find writing docs
> challenging and tedious, it's critical and important. For my other open
> source projects, I force myself to write docs.
>
> my point is, the wiki should be open and the barrier should be
> removed. Having to "beg/ask" to edit the wiki feels like a slap in the face
> to me, but maybe I'm alone in this. Then again, I've heard the same
> sentiment from other people about cassandra's wiki. The thing is, they just
> chalk it up to "cassandra committers don't give a crap about docs". I do my
> best to defend the committers and point out some

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jake Luciani
I'll note that historically the wiki was open to all, and due to massive
amounts of spam it was put on lockdown by the ASF.

If there is a better platform the community feels would make it simpler to
provide community based documentation then we should consider it.
The ASF also has a Confluence wiki, which might be simpler for users to
contribute to? (at least it has captchas)

-Jake



On Wed, Jul 23, 2014 at 9:20 AM, Peter Lin  wrote:

> @benedict - you're right that I haven't requested permission to edit.
> You're also right that I've given up on getting edit permission to
> the cassandra wiki. I've struggled with "how" to manage
> open source projects, so I totally get it. Managing projects is a thankless
> job most of the time. Pleasing everyone is totally impossible. Apache isn't
> alone in this. I've submitted stuff to google's open source projects in the
> past and had it go into a black hole. We all struggle with managing open
> source projects.
>
> I am committed to contributing to the Cassandra community, but just not through
> the wiki. There's lots of different ways to contribute. The jira tickets
> I've submitted have gotten good responses generally. It does take several
> days depending on how busy the committers are, but that's normal for all
> projects.
>
>
>
> On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> Requesting a change is very different to requesting permission to edit
>> (which, I note, still hasn't been made); we do our best to promote
>> community engagement, so granting a privilege request has a different
>> mental category to a random edit request, which is much more likely to be
>> forgotten by any particular committer in the process of attending to their
>> more pressing work.
>>
>> The relationship between committers and the community is debated at
>> length in all projects, often by vocal individuals such as yourselves who
>> are unhappy in some way with how the project is being run. However it is
>> very hard to please everyone - most of the time we can't even please all
>> the committers, and that is a much smaller and more homogenous group.
>>
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin  wrote:
>>
>>>
>>> I sent a request to add a link to my .Net driver for cassandra to the wiki
>>> over 5 weeks ago and got no response at all.
>>>
>>> I sent another request way back in 2013 and got zero response. Again, I
>>> totally understand people are busy and I'm just as guilty as everyone else
>>> of letting requests slip by. It's the reality of contributing to open
>>> source as a hobby. If I wasn't serious about contributing to cassandra
>>> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>>>
>>> Perhaps the real cause is that some committers can't "empathise" with
>>> others in the community?
>>>
>>>
>>> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
 All requests I've seen in the past year to edit the wiki (admittedly
 only 2-3) have been answered promptly with editing privileges. Personally I
 don't have a major preference either way for policy - there are positives
 and negatives to each approach - but, like I said, raise it on the dev list
 and see if anybody else does.

 However I must admit I cannot empathise with your characterisation of
 requesting permission as 'begging', or a 'slap in the face', or that it is
 even particularly onerous. It is a slight psychological barrier, but in my
 personal experience when a psychological barrier as low as this prevents me
 from taking action, it's usually because I don't have as much desire to
 contribute as I thought I did.




 On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:

>
> I've submitted requests to edit the wiki in the past and nothing ever
> got done.
>
> Having been an apache committer and contributor over the years, I can
> totally understand that people are busy. I also understand that "most"
> developers find writing docs tedious.
>
> I'd rather not harass the committers about wiki edits, since I didn't
> like it when it happened to me in the past. That's why many apache projects
> keep their wikis open. Honestly, as much as I find writing docs
> challenging and tedious, it's critical and important. For my other open
> source projects, I force myself to write docs.
>
> my point is, the wiki should be open and the barrier should be
> removed. Having to "beg/ask" to edit the wiki feels like a slap in the face
> to me, but maybe I'm alone in this. Then again, I've heard the same
> sentiment from other people about cassandra's wiki. The thing is, they just
> chalk it up to "cassandra committers don't give a crap about docs". I do my
> best to defend the committers and point out some are

Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Jack Krupansky
Besides the obviously confusing error message, this particular case could 
simply be that the hash value of the primary key belonged to the other node 
that wasn’t up, so even though one node was up, it didn’t own that particular 
hash value or token, so CL=ONE could not succeed.

What was RF set to for this two node cluster?
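A sketch of Jack's point, assuming SimpleStrategy-style placement, one token per node (a real cluster may use vnodes), and a toy MD5-based token instead of Cassandra's default Murmur3Partitioner. With RF=1 on two nodes, whether a CL=ONE write succeeds depends on which node owns the key's token; with RF=2 every key has a replica on the surviving node. The names token, replicas_for and can_write_at_one are hypothetical.

```python
import bisect
import hashlib


def token(key):
    # Toy token: first 8 bytes of an MD5 digest as an unsigned integer.
    # (Cassandra 2.x defaults to Murmur3Partitioner; MD5 keeps this stdlib-only.)
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def replicas_for(key, ring, rf):
    """SimpleStrategy-style placement on a ring of (token, node) pairs sorted by
    token: the first node at or after the key's token owns it, and the next
    distinct nodes clockwise hold the remaining rf - 1 replicas."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)  # wrap past the last token
    wanted = min(rf, len({node for _, node in ring}))
    owners = []
    while len(owners) < wanted:
        node = ring[i][1]
        if node not in owners:
            owners.append(node)
        i = (i + 1) % len(ring)
    return owners


def can_write_at_one(key, ring, rf, live_nodes):
    """CL=ONE needs a live replica for *this* key; other live nodes don't help."""
    return any(node in live_nodes for node in replicas_for(key, ring, rf))


if __name__ == "__main__":
    ring = [(2**62, "node1"), (2**63, "node2")]  # hypothetical two-node ring
    for key in ["100", "101", "102"]:
        print(key, replicas_for(key, ring, 1), can_write_at_one(key, ring, 1, {"node1"}))
```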

-- Jack Krupansky

From: Andrew 
Sent: Wednesday, July 23, 2014 1:02 AM
To: graham sanderson ; user@cassandra.apache.org 
Cc: Kevin Burton 
Subject: Re: All writes fail with ONE consistency level when adding second node 
to cluster?

I looked into this; ONE means it must be written to one replica—i.e., a node 
the data is supposed to be written to.  ANY means a hinted handoff will 
“count”.  So as long as it writes to any node on the cluster—even one that it’s 
not supposed to be on—it will be a success.  Good to know.
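Andrew's distinction, reduced to a hypothetical helper: under ONE only an acknowledgement from a replica that owns the token counts, while under ANY a hint stored for a down replica is also enough. This is a simplification for illustration, not driver or server code.

```python
def write_succeeds(cl, replica_acks, hint_stored):
    """Toy model of the ONE vs ANY distinction.

    replica_acks: number of token-owning replicas that acknowledged the write.
    hint_stored:  whether the coordinator stored a hint for a down replica.
    """
    if cl == "ANY":
        # ANY: a stored hint counts as success even if no owning replica acked.
        return replica_acks >= 1 or hint_stored
    if cl == "ONE":
        # ONE: at least one owning replica must ack; hints do not count.
        return replica_acks >= 1
    raise ValueError("only ONE and ANY are modelled here: %r" % cl)


if __name__ == "__main__":
    # Every owning replica is down, but the coordinator could store a hint.
    print(write_succeeds("ONE", 0, True), write_succeeds("ANY", 0, True))
```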

Andrew


On July 22, 2014 at 8:13:57 PM, graham sanderson (gra...@vast.com) wrote:

  Incorrect, ONE does not refer to the number of “other” nodes, it just refers
to the number of nodes, so ONE under normal circumstances would only require
one node to acknowledge the write.

  The confusing error message you are getting is related to 
https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are correct in 
that normally that error message would make no sense.

  I don’t have much experience adding/removing nodes, but I think what is 
happening is that your new node is in the middle of taking over ownership of a
token range - while that happens C* is trying to write to both the old owner
(your original node), AND (hence the 2 not 1 in the error message) the new
owner (the new node) so that once the bootstrapping of the new node is
complete, it is immediately safe to delete the (no longer owned) data from the
old node. For whatever reason the write to the new node is timing out, causing 
the exception, and the error message is exposing the “2” which happens to be 
how many C* thinks it is waiting for at the time (i.e. how many it should be 
waiting for based on the consistency level (1) plus this extra node).
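Graham's explanation as arithmetic, a sketch assuming a single datacenter, RF=1 (Kevin started from one node) and one joining node. The hypothetical total_block_for adds the pending (bootstrapping) endpoints to what the consistency level alone requires; it mirrors graham's reading of the error message, not Cassandra's source, and details may vary by version.

```python
def block_for(cl, rf):
    """Acks required by the consistency level alone (toy table, single DC)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if cl in fixed:
        return fixed[cl]
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: %r" % cl)


def total_block_for(cl, rf, pending_endpoints):
    """While a node bootstraps into a range, writes also go to that pending
    node, and its ack is counted on top of what the CL itself requires."""
    return block_for(cl, rf) + pending_endpoints


if __name__ == "__main__":
    # ONE at RF=1 with one node joining: the "2 replica were required" case.
    print(total_block_for("ONE", 1, 1))
```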


  On Jul 22, 2014, at 9:46 PM, Andrew  wrote:


ONE means write to one replica (in addition to the original).  If you want 
to write to any of them, use ANY.  Is that the right understanding?

http://www.datastax.com/docs/1.0/dml/data_consistency

Andrew


On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

  I'm super confused by this.. and disturbed that this was my failure 
scenario :-( 


  I had one cassandra node for the alpha of my app… and now we're moving 
into beta… which means three replicas.


  So I added the second node… but my app immediately broke with:


  ""Cassandra timeout during write query at consistency ONE (2 replica were 
required but only 1 acknowledged the write)""


  … but that makes no sense… if I'm at ONE and I have one acknowledged 
write, why does it matter that the second one hasn't ack'd yet…


  ?


  --


  Founder/CEO Spinn3r.com

  Location: San Francisco, CA

  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile 



--


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Dave Brosius
 

We had a massive spam problem before we locked down the wiki, so
unfortunately that was the choice we had to make. But as stated we can
add you to the contributors list.

What is your Wiki user name? 

On 2014-07-23 07:33, Peter Lin wrote: 

> I've tried to contribute docs to Cassandra wiki in the past, but there's an 
> obstacle.
> 
> Currently wiki.apache.org/cassandra [1] is locked down, so only committers can
> edit it. I really wish that wasn't the case, since it wastes time. The
> committers are busy writing code. Having to email a committer and ask them to
> update it feels silly to me and kind of goes against openness. Back when I
> was active with JMeter, we decided to leave it open so that anyone can edit 
> the docs.
> 
> I can't be the only one that wants to help make the docs better, but get 
> frustrated with the wiki being closed.
> 
> On Wed, Jul 23, 2014 at 4:25 AM,  wrote:
> 
> I would like to help out with the documentation of C*. How do I start? 
> 
> On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:
> 
> Just a note: 
> If you have suggestions on how to improve the documentation on the DataStax website,
> send them an email at d...@datastax.com. They appreciate proposals :)
> 
> Am 23.07.2014 um 09:10 schrieb Mark Reddy : 
> 
> Hi Kevin, 
> The difference here is that the Apache Cassandra site is maintained by the 
> community whereas the DataStax site is maintained by paid employees with a 
> vested interest in producing documentation. 
> 
> With DataStax having some comprehensive docs, I guess the desire for people 
> to maintain the Apache site has dwindled. However, if you are interested in 
> contributing to it and bringing it back up to standard, you can; such is the
> freedom of open source.
> 
> Mark 
> 
> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton  wrote:
> 
> This document: 
> 
> https://wiki.apache.org/cassandra/Operations [2] 
> 
> … for example, is extremely outdated… it certainly does NOT reflect 2.x
> releases. It mentions commands that have long since been removed/deprecated.
> 
> Instead of giving bad documentation, maybe remove this and mark it as 
> obsolete. 
> The DataStax documentation… is … acceptable I guess. My main criticism there
> is that a lot of it is in their blog.
> 
> Kevin
> 
> -- 
> 
> Founder/CEO Spinn3r.com [3] 
> Location: SAN FRANCISCO, CA 
> blog: http://burtonator.wordpress.com [4] 
> … or check out my Google+ profile [5] 
> [3]

 -- 
http://spawgi.wordpress.com [6]
 We can do it and do it better. 

Links:
--
[1] http://wiki.apache.org/cassandra
[2] https://wiki.apache.org/cassandra/Operations
[3] http://spinn3r.com/
[4] http://burtonator.wordpress.com/
[5] https://plus.google.com/102718274791889610666/posts
[6] http://spawgi.wordpress.com


RE: EXTERNAL: Re: Running Cassandra Server in an OSGi container

2014-07-23 Thread Rodgers, Hugh
Yes, the application includes the C* server and client.

From: Robert Stupp [mailto:sn...@snazy.de]
Sent: Wednesday, July 23, 2014 12:19 AM
To: user@cassandra.apache.org
Subject: Re: EXTERNAL: Re: Running Cassandra Server in an OSGi container

You mean "unzip and run" of an application using C* ?

Am 23.07.2014 um 00:34 schrieb Rodgers, Hugh 
mailto:hugh.rodg...@lmco.com>>:


What got our team on the path of trying to embed C* was the wiki page 
http://wiki.apache.org/cassandra/Embedding which implies this can be done. Also 
WSO2 Carbon and Achilles have both embedded C* (not in an OSGi container 
though, and Carbon is with an older C* version).

We want an "unzip and run" system and do not expect the user to have to
do much, if any, C* configuration.

From: Robert Stupp [mailto:sn...@snazy.de]
Sent: Tuesday, July 22, 2014 1:19 PM
To: user@cassandra.apache.org
Subject: EXTERNAL: Re: Running Cassandra Server in an OSGi container

What's your reason for doing this?

There are unit test integrations using the C* daemon. A related bug that prevented
proper shutdown has been closed for C* 2.1-rc1: 
https://issues.apache.org/jira/browse/CASSANDRA-5635
It's perfectly fine to embed C* for unit tests.

But I'd definitely not recommend using C* within a container in a real
production environment.
Not just because of the few System.exit calls in CassandraDaemon, but also because of
the other places where System.exit is called for very good reasons. These
reasons include system/node failure scenarios (for example disk failures).

C* is designed to run in its own JVM process using dedicated hardware resources 
on multiple servers using commodity hardware without any virtualization or any 
shared storage. And it just works great with that.

There are good reasons to move computation near to the data - but that's always 
a separate OS process on C* nodes. Examples are Hadoop and Spark.

Am 22.07.2014 um 21:45 schrieb Rodgers, Hugh 
mailto:hugh.rodg...@lmco.com>>:



Hello -

I have a use case where I need to run the Cassandra Server as an OSGi bundle. I 
have been able to embed all of the Cassandra dependencies in an OSGi bundle and 
run it in a Karaf container, but I am not happy with the approach I have thus far.

Since CassandraDaemon has System.exit() calls in it, if these execute it will 
bring down my entire OSGi container rather than just the bundle Cassandra is 
running in. I hacked up a copy of CassandraDaemon enough to get it to run in 
the bundle with no System.exit() calls, but the Cassandra StorageService is not 
"aware" of it, i.e., I cannot call the StorageService.registerDaemon(...) 
method because my copy of CassandraDaemon does not extend Apache's. Hence I am
getting exceptions when I shut down my container or restart the bundle
because the StorageService and my CassandraDaemon are not "linked".

I am considering trying to extend Apache's CassandraDaemon and override its 
setup() method with a SecurityManager that disables System.exit() calls. This 
too sounds "hacky".

Does anyone have any better suggestions? Or know of an existing open source 
project that has successfully embedded CassandraServer in an OSGi bundle?

I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift).

Thanks -

Hugh



Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
@benedict - you're right that I haven't requested permission to edit.
You're also right that I've given up on getting edit permission to
the cassandra wiki. I've struggled with "how" to manage
open source projects, so I totally get it. Managing projects is a thankless
job most of the time. Pleasing everyone is totally impossible. Apache isn't
alone in this. I've submitted stuff to google's open source projects in the
past and had it go into a black hole. We all struggle with managing open
source projects.

I am committed to contributing to the Cassandra community, but just not through
the wiki. There's lots of different ways to contribute. The jira tickets
I've submitted have gotten good responses generally. It does take several
days depending on how busy the committers are, but that's normal for all
projects.



On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> Requesting a change is very different to requesting permission to edit
> (which, I note, still hasn't been made); we do our best to promote
> community engagement, so granting a privilege request has a different
> mental category to a random edit request, which is much more likely to be
> forgotten by any particular committer in the process of attending to their
> more pressing work.
>
> The relationship between committers and the community is debated at length
> in all projects, often by vocal individuals such as yourselves who are
> unhappy in some way with how the project is being run. However it is very
> hard to please everyone - most of the time we can't even please all the
> committers, and that is a much smaller and more homogenous group.
>
>
>
>
>
> On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin  wrote:
>
>>
>> I sent a request to add a link to my .Net driver for cassandra to the wiki
>> over 5 weeks ago and got no response at all.
>>
>> I sent another request way back in 2013 and got zero response. Again, I
>> totally understand people are busy and I'm just as guilty as everyone else
>> of letting requests slip by. It's the reality of contributing to open
>> source as a hobby. If I wasn't serious about contributing to cassandra
>> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>>
>> Perhaps the real cause is that some committers can't "empathise" with
>> others in the community?
>>
>>
>> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
>> belliottsm...@datastax.com> wrote:
>>
>>> All requests I've seen in the past year to edit the wiki (admittedly
>>> only 2-3) have been answered promptly with editing privileges. Personally I
>>> don't have a major preference either way for policy - there are positives
>>> and negatives to each approach - but, like I said, raise it on the dev list
>>> and see if anybody else does.
>>>
>>> However I must admit I cannot empathise with your characterisation of
>>> requesting permission as 'begging', or a 'slap in the face', or that it is
>>> even particularly onerous. It is a slight psychological barrier, but in my
>>> personal experience when a psychological barrier as low as this prevents me
>>> from taking action, it's usually because I don't have as much desire to
>>> contribute as I thought I did.
>>>
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:
>>>

 I've submitted requests to edit the wiki in the past and nothing ever
 got done.

 Having been an apache committer and contributor over the years, I can
 totally understand that people are busy. I also understand that "most"
 developers find writing docs tedious.

 I'd rather not harass the committers about wiki edits, since I didn't
 like it when it happened to me in the past. That's why many apache projects
 keep their wikis open. Honestly, as much as I find writing docs
 challenging and tedious, it's critical and important. For my other open
 source projects, I force myself to write docs.

 my point is, the wiki should be open and the barrier should be removed.
 Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
 but maybe I'm alone in this. Then again, I've heard the same sentiment from
 other people about cassandra's wiki. The thing is, they just chalk it up to
 "cassandra committers don't give a crap about docs". I do my best to defend
 the committers and point out some are volunteers, but it does give the
 public a negative impression. I know the committers care about docs, but
 they don't always have time to do it.

 I know that given a choice between coding or writing docs, 90% of the
 time I'll choose coding. What I've decided instead is to document stuff on
 one of my blogs.  If someone gets lucky, maybe google will return the
 result. I keep asking myself "what's the point of closing a wiki?"



 On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
 belliottsm...@datastax.com> wrote:
>>

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Ben Hood
On Wed, Jul 23, 2014 at 1:30 PM, Peter Lin  wrote:
>
> I sent a request to add a link to my .Net driver for cassandra to the wiki
> over 5 weeks ago and got no response at all.
>

TL;DR There is something wrong with Cassandra information sharing, but I am
partly to blame.

My experience has not been too dissimilar to Peter's. I've raised JIRA
tickets that haven't been answered and requests to the wiki that haven't
been answered. But I do appreciate that everybody is busy and things do
slip through the cracks sometimes, as they do with things I'm responsible
for.

That said, I have received help from the community via this mailing list, for
which I am grateful. For example, when I was trying to get an update to the
Apache wiki, Brady from DataStax helped me out with the Planet Cassandra
wiki.

In defence of the people trying to manage Cassandra, some of the
limitations are down to the structure of Apache projects. I don't want to
start an issue tracker flame war, but I feel that a lot of oversight has
gone missing in Apache's JIRA installation. Last week I raised a ticket
that was (incorrectly) marked as a duplicate, due to an oversight in the
affected version field. The reason for this is the unhelpful way the JIRA
page was rendered - I can completely empathize with the person who closed
it.

Because of things like this, I always think four times over before raising
a ticket in JIRA. This is not the right approach either, and I am
partly to blame. Just this week, a member of the community took the time to
do the right thing when raising an issue, and they also went to the effort of
supplying a patch for the issue I had chosen not to put into JIRA.

So I think there is something wrong with the way the community is sharing
its common knowledge, but I don't have a better suggestion. How are you
going to get a bunch of disparate people together to write cohesive
documentation that is in sync with the current release?

Furthermore, as a maintainer of a CQL driver, I have little if any contact
with any of the other driver maintainers. They must be out
there, but I don't know of a forum for them. It might not be needed, but
sometimes I feel like it should. So I'm just as guilty as everybody else of
not creating this community.


Re: CSV Import is taking huge time

2014-07-23 Thread Jack Krupansky
Is it compute bound or I/O bound?

What does your cluster look like?

-- Jack Krupansky

From: Akshay Ballarpure 
Sent: Wednesday, July 23, 2014 5:00 AM
To: user@cassandra.apache.org 
Subject: CSV Import is taking huge time

Hello, 
I am using the COPY command in Cassandra to import a CSV file into the DB. The import is
taking a huge amount of time; any suggestions to improve it?

id,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z 
100,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 
101,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 
 
-- 
-- 

There are ~50K lines in this file; its size is ~5 MB.
  
I have created table as per below: 

create table csldata4 (id int PRIMARY KEY, a int, b int, c int, d int, e int,
f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int,
q int, r int, s int, t int, u int, v int, w int, x int, y int, z int);
Copy Command: 

COPY csldata4 (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s,
t, u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
  
The issue here is that it's taking a huge amount of time to import:

cqlsh:mykeyspace> COPY csldata (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o,
p, q, r, s, t, u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
66215 rows imported in 1 minute and 31.044 seconds. 
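For scale, the reported run works out to 66215 / 91.044 ≈ 727 rows per second. One common alternative to cqlsh COPY (an assumption here, not something suggested in the thread) is to load the file through a client driver with many inserts in flight. The sketch below assumes the DataStax Python driver (cassandra-driver) is installed, a node at 127.0.0.1, the keyspace mykeyspace from the cqlsh prompt, and an arbitrary concurrency of 50; parse_rows, rows_per_second and load are hypothetical helpers. Whether it helps at all depends on Jack's question of whether the import is compute bound or I/O bound.

```python
import csv
import io
import time

# Column order from the csldata4 table definition above.
COLUMNS = ["id"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]


def parse_rows(text):
    """Parse CSV text with a header row into tuples of ints in COLUMNS order.
    Header names are stripped, so trailing spaces in the file do not matter."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    positions = [header.index(col) for col in COLUMNS]
    return [tuple(int(fields[p]) for p in positions) for fields in reader if fields]


def rows_per_second(rows, seconds):
    return rows / seconds


def load(rows, contact_points=("127.0.0.1",), keyspace="mykeyspace", concurrency=50):
    """Insert rows with many requests in flight instead of one at a time."""
    # Imported here so the helpers above work without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(list(contact_points))
    try:
        session = cluster.connect(keyspace)
        # Prepare once, so the INSERT is not re-parsed for every row.
        insert = session.prepare(
            "INSERT INTO csldata4 (%s) VALUES (%s)"
            % (", ".join(COLUMNS), ", ".join(["?"] * len(COLUMNS)))
        )
        start = time.time()
        execute_concurrent_with_args(session, insert, rows, concurrency=concurrency)
        return time.time() - start
    finally:
        cluster.shutdown()
```

Usage would be along the lines of rows = parse_rows(open("csldata1.csv").read()) followed by load(rows), then comparing rows_per_second(len(rows), elapsed) with the COPY figure.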


Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com



Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Benedict Elliott Smith
Requesting a change is very different to requesting permission to edit
(which, I note, still hasn't been made); we do our best to promote
community engagement, so granting a privilege request has a different
mental category to a random edit request, which is much more likely to be
forgotten by any particular committer in the process of attending to their
more pressing work.

The relationship between committers and the community is debated at length
in all projects, often by vocal individuals such as yourselves who are
unhappy in some way with how the project is being run. However it is very
hard to please everyone - most of the time we can't even please all the
committers, and that is a much smaller and more homogenous group.





On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin  wrote:

>
> I sent a request to add a link to my .Net driver for cassandra to the wiki
> over 5 weeks ago and got no response at all.
>
> I sent another request way back in 2013 and got zero response. Again, I
> totally understand people are busy and I'm just as guilty as everyone else
> of letting requests slip by. It's the reality of contributing to open
> source as a hobby. If I wasn't serious about contributing to cassandra
> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>
> Perhaps the real cause is that some committers can't "empathise" with
> others in the community?
>
>
> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> All requests I've seen in the past year to edit the wiki (admittedly only
>> 2-3) have been answered promptly with editing privileges. Personally I
>> don't have a major preference either way for policy - there are positives
>> and negatives to each approach - but, like I said, raise it on the dev list
>> and see if anybody else does.
>>
>> However I must admit I cannot empathise with your characterisation of
>> requesting permission as 'begging', or a 'slap in the face', or that it is
>> even particularly onerous. It is a slight psychological barrier, but in my
>> personal experience when a psychological barrier as low as this prevents me
>> from taking action, it's usually because I don't have as much desire to
>> contribute as I thought I did.
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:
>>
>>>
>>> I've submitted requests to edit the wiki in the past and nothing ever
>>> got done.
>>>
>>> Having been an apache committer and contributor over the years, I can
>>> totally understand that people are busy. I also understand that "most"
>>> developers find writing docs tedious.
>>>
>>> I'd rather not harass the committers about wiki edits, since I didn't
>>> like it when it happened to me in the past. That's why many apache projects
>>> keep their wikis open. Honestly, as much as I find writing docs
>>> challenging and tedious, it's critical and important. For my other open
>>> source projects, I force myself to write docs.
>>>
>>> My point is, the wiki should be open and the barrier should be removed.
>>> Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
>>> but maybe I'm alone in this. Then again, I've heard the same sentiment from
>>> other people about cassandra's wiki. The thing is, they just chalk it up to
>>> "cassandra committers don't give a crap about docs". I do my best to defend
>>> the committers and point out some are volunteers, but it does give the
>>> public a negative impression. I know the committers care about docs, but
>>> they don't always have time to do it.
>>>
>>> I know that given a choice between coding or writing docs, 90% of the
>>> time I'll choose coding. What I've decided instead is to document stuff on
>>> one of my blogs.  If someone gets lucky, maybe google will return the
>>> result. I keep asking myself "what's the point of closing a wiki?"
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
 It only takes a moment to ask to be added as a wiki contributor; if you
 email the dev list or ask on irc, somebody with privileges will ordinarily
 add you within a day. It may be a psychological barrier, but it isn't
 really a practical one. Still, if you feel the policy is incorrect, raise
 this on the dev list also.


 On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:

>
> I've tried to contribute docs to Cassandra wiki in the past, but
> there's an obstacle.
>
> Currently wiki.apache.org/cassandra is locked down, so only committers
> can edit it. I really wish that wasn't the case, since it wastes time. The
> committers are busy writing code. Having to email a committer and ask them
> to update it feels silly to me and kind of goes against openness. Back when
> I was active with JMeter, we decided to leave it open so that anyone can
> edit the docs.
>
> I can't be the only one that wants to help make the docs be

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
it supports CQL, but it's through thrift. I don't currently support native
protocol, since that was evolving rapidly last year when I made the port.

I state clearly on the nectar-client wiki on google code that it supports CQL3
via thrift. I've pretty much given up on the cassandra wiki. Using my blog to
share knowledge about Cassandra is quicker, easier and immediate. The downside
is that it doesn't come up in google searches, so it's really just for myself
and my friends.



On Wed, Jul 23, 2014 at 8:52 AM, Jack Krupansky 
wrote:

>   I do recall seeing your announcement of your driver, but I think it got
> lost in the discussion of whether it supported CQL. If you say it supports
> CQL and native protocol, I’m sure it will get very prompt attention.
>
> -- Jack Krupansky
>
>  *From:* Peter Lin 
> *Sent:* Wednesday, July 23, 2014 8:30 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Why is the cassandra documentation such poor quality?
>
>
> I sent a request to add a link to my .Net driver for cassandra to the wiki
> over 5 weeks ago and got no response at all.
>
> I sent another request way back in 2013 and got zero response. Again, I
> totally understand people are busy and I'm just as guilty as everyone else
> of letting requests slip by. It's the reality of contributing to open
> source as a hobby. If I wasn't serious about contributing to cassandra
> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>
> Perhaps the real cause is that some committers can't "empathise" with
> others in the community?
>
>
> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> All requests I've seen in the past year to edit the wiki (admittedly only
>> 2-3) have been answered promptly with editing privileges. Personally I
>> don't have a major preference either way for policy - there are positives
>> and negatives to each approach - but, like I said, raise it on the dev list
>> and see if anybody else does.
>>
>> However I must admit I cannot empathise with your characterisation of
>> requesting permission as 'begging', or a 'slap in the face', or that it is
>> even particularly onerous. It is a slight psychological barrier, but in my
>> personal experience when a psychological barrier as low as this prevents me
>> from taking action, it's usually because I don't have as much desire to
>> contribute as I thought I did.
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:
>>
>>>
>>> I've submitted requests to edit the wiki in the past and nothing ever
>>> got done.
>>>
>>> Having been an apache committer and contributor over the years, I can
>>> totally understand that people are busy. I also understand that "most"
>>> developers find writing docs tedious.
>>>
>>> I'd rather not harass the committers about wiki edits, since I didn't
>>> like it when it happened to me in the past. That's why many apache projects
>>> keep their wikis open. Honestly, as much as I find writing docs
>>> challenging and tedious, it's critical and important. For my other open
>>> source projects, I force myself to write docs.
>>>
>>> My point is, the wiki should be open and the barrier should be removed.
>>> Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
>>> but maybe I'm alone in this. Then again, I've heard the same sentiment from
>>> other people about cassandra's wiki. The thing is, they just chalk it up to
>>> "cassandra committers don't give a crap about docs". I do my best to defend
>>> the committers and point out some are volunteers, but it does give the
>>> public a negative impression. I know the committers care about docs, but
>>> they don't always have time to do it.
>>>
>>> I know that given a choice between coding or writing docs, 90% of the
>>> time I'll choose coding. What I've decided instead is to document stuff on
>>> one of my blogs.  If someone gets lucky, maybe google will return the
>>> result. I keep asking myself "what's the point of closing a wiki?"
>>>
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
  It only takes a moment to ask to be added as a wiki contributor; if
 you email the dev list or ask on irc, somebody with privileges will
 ordinarily add you within a day. It may be a psychological barrier, but it
 isn't really a practical one. Still, if you feel the policy is incorrect,
 raise this on the dev list also.


 On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:

>
> I've tried to contribute docs to Cassandra wiki in the past, but
> there's an obstacle.
>
> Currently wiki.apache.org/cassandra is locked down, so only committers
> can edit it. I really wish that wasn't the case, since it wastes time. The
> committers are busy writing code. Having to email a committer and ask them
> to update it feels silly to me and kind of goes against openness. Back when I
> was active with J

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jack Krupansky
I do recall seeing your announcement of your driver, but I think it got lost in 
the discussion of whether it supported CQL. If you say it supports CQL and 
native protocol, I’m sure it will get very prompt attention.

-- Jack Krupansky

From: Peter Lin 
Sent: Wednesday, July 23, 2014 8:30 AM
To: user@cassandra.apache.org 
Subject: Re: Why is the cassandra documentation such poor quality?


I sent a request to add a link to my .Net driver for cassandra to the wiki over 5 
weeks ago and got no response at all.


I sent another request way back in 2013 and got zero response. Again, I totally 
understand people are busy and I'm just as guilty as everyone else of letting 
requests slip by. It's the reality of contributing to open source as a hobby. 
If I wasn't serious about contributing to cassandra community, I wouldn't have 
spent 2.5 months porting Hector to C# manually.


Perhaps the real cause is that some committers can't "empathise" with others in 
the community? 




On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith 
 wrote:

  All requests I've seen in the past year to edit the wiki (admittedly only 
2-3) have been answered promptly with editing privileges. Personally I don't 
have a major preference either way for policy - there are positives and 
negatives to each approach - but, like I said, raise it on the dev list and see 
if anybody else does. 

  However I must admit I cannot empathise with your characterisation of 
requesting permission as 'begging', or a 'slap in the face', or that it is even 
particularly onerous. It is a slight psychological barrier, but in my personal 
experience when a psychological barrier as low as this prevents me from taking 
action, it's usually because I don't have as much desire to contribute as I 
thought I did. 





  On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:


I've submitted requests to edit the wiki in the past and nothing ever got 
done.


Having been an apache committer and contributor over the years, I can 
totally understand that people are busy. I also understand that "most" 
developers find writing docs tedious.


I'd rather not harass the committers about wiki edits, since I didn't like 
it when it happened to me in the past. That's why many apache projects keep 
their wikis open. Honestly, as much as I find writing docs challenging and 
tedious, it's critical and important. For my other open source projects, I 
force myself to write docs.


My point is, the wiki should be open and the barrier should be removed. 
Having to "beg/ask" to edit the wiki feels like a slap in the face to me, but 
maybe I'm alone in this. Then again, I've heard the same sentiment from other 
people about cassandra's wiki. The thing is, they just chalk it up to 
"cassandra committers don't give a crap about docs". I do my best to defend the 
committers and point out some are volunteers, but it does give the public a 
negative impression. I know the committers care about docs, but they don't 
always have time to do it.


I know that given a choice between coding or writing docs, 90% of the time 
I'll choose coding. What I've decided instead is to document stuff on one of my 
blogs.  If someone gets lucky, maybe google will return the result. I keep 
asking myself "what's the point of closing a wiki?"


 



On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith 
 wrote:

  It only takes a moment to ask to be added as a wiki contributor; if you 
email the dev list or ask on irc, somebody with privileges will ordinarily add 
you within a day. It may be a psychological barrier, but it isn't really a 
practical one. Still, if you feel the policy is incorrect, raise this on the 
dev list also.



  On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:


I've tried to contribute docs to Cassandra wiki in the past, but 
there's an obstacle.


Currently wiki.apache.org/cassandra is locked down, so only committers 
can edit it. I really wish that wasn't the case, since it wastes time. The 
committers are busy writing code. Having to email a committer and ask them to 
update it feels silly to me and kind of goes against openness. Back when I was 
active with JMeter, we decided to leave it open so that anyone can edit the 
docs.


I can't be the only one that wants to help make the docs better, but 
get frustrated with the wiki being closed.





On Wed, Jul 23, 2014 at 4:25 AM,  wrote:

  I would like to help out with the documentation of C*. How do I start?




  On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:

Just a note:
If you have suggestions on how to improve documentation on the 
datastax website, write them an email to d...@datastax.com. They appreciate 
proposals :)

On 23.07.2014 at 09:10, Mark Reddy wrote:


  Hi Kevin,

  The difference here is that the Apache Cassandra site is 
maintained by the community whereas the DataS

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
I sent a request to add a link to my .Net driver for cassandra to the wiki
over 5 weeks ago and got no response at all.

I sent another request way back in 2013 and got zero response. Again, I
totally understand people are busy and I'm just as guilty as everyone else
of letting requests slip by. It's the reality of contributing to open
source as a hobby. If I wasn't serious about contributing to cassandra
community, I wouldn't have spent 2.5 months porting Hector to C# manually.

Perhaps the real cause is that some committers can't "empathise" with
others in the community?


On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> All requests I've seen in the past year to edit the wiki (admittedly only
> 2-3) have been answered promptly with editing privileges. Personally I
> don't have a major preference either way for policy - there are positives
> and negatives to each approach - but, like I said, raise it on the dev list
> and see if anybody else does.
>
> However I must admit I cannot empathise with your characterisation of
> requesting permission as 'begging', or a 'slap in the face', or that it is
> even particularly onerous. It is a slight psychological barrier, but in my
> personal experience when a psychological barrier as low as this prevents me
> from taking action, it's usually because I don't have as much desire to
> contribute as I thought I did.
>
>
>
>
> On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:
>
>>
>> I've submitted requests to edit the wiki in the past and nothing ever got
>> done.
>>
>> Having been an apache committer and contributor over the years, I can
>> totally understand that people are busy. I also understand that "most"
>> developers find writing docs tedious.
>>
>> I'd rather not harass the committers about wiki edits, since I didn't
>> like it when it happened to me in the past. That's why many apache projects
>> keep their wikis open. Honestly, as much as I find writing docs
>> challenging and tedious, it's critical and important. For my other open
>> source projects, I force myself to write docs.
>>
>> My point is, the wiki should be open and the barrier should be removed.
>> Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
>> but maybe I'm alone in this. Then again, I've heard the same sentiment from
>> other people about cassandra's wiki. The thing is, they just chalk it up to
>> "cassandra committers don't give a crap about docs". I do my best to defend
>> the committers and point out some are volunteers, but it does give the
>> public a negative impression. I know the committers care about docs, but
>> they don't always have time to do it.
>>
>> I know that given a choice between coding or writing docs, 90% of the
>> time I'll choose coding. What I've decided instead is to document stuff on
>> one of my blogs.  If someone gets lucky, maybe google will return the
>> result. I keep asking myself "what's the point of closing a wiki?"
>>
>>
>>
>> On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
>> belliottsm...@datastax.com> wrote:
>>
>>> It only takes a moment to ask to be added as a wiki contributor; if you
>>> email the dev list or ask on irc, somebody with privileges will ordinarily
>>> add you within a day. It may be a psychological barrier, but it isn't
>>> really a practical one. Still, if you feel the policy is incorrect, raise
>>> this on the dev list also.
>>>
>>>
>>> On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:
>>>

 I've tried to contribute docs to Cassandra wiki in the past, but
 there's an obstacle.

 Currently wiki.apache.org/cassandra is locked down, so only committers
 can edit it. I really wish that wasn't the case, since it wastes time. The
 committers are busy writing code. Having to email a committer and ask them to
 update it feels silly to me and kind of goes against openness. Back when I
 was active with JMeter, we decided to leave it open so that anyone can edit
 the docs.

 I can't be the only one that wants to help make the docs better, but
 get frustrated with the wiki being closed.



 On Wed, Jul 23, 2014 at 4:25 AM,  wrote:

> I would like to help out with the documentation of C*. How do I start?
>
>
> On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:
>
>> Just a note:
>> If you have suggestions on how to improve documentation on the datastax
>> website, write them an email to d...@datastax.com. They appreciate
>> proposals :)
>>
>> On 23.07.2014 at 09:10, Mark Reddy wrote:
>>
>> Hi Kevin,
>>
>> The difference here is that the Apache Cassandra site is maintained
>> by the community whereas the DataStax site is maintained by paid employees
>> with a vested interest in producing documentation.
>>
>> With DataStax having some comprehensive docs, I guess the desire for
>> people to maintain the Apache site h

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Nicholas Okunew
I think the problem is a little deeper than that. I've been working with
cassandra for about 7 months now - it was very challenging to find out any
real information about using cassandra, and even harder to get clear
information on operating it. There's a truckload of reading you have to do,
and no one place you can find it.

Searching google largely turns up datastax blog posts and DSE docs, almost
all of it out of date (not to say the docs aren't up to date, but the
search results often lead to old docs and old blog posts).

Deeper searching usually results in a link to JIRA. No offense to anyone
involved, but when your first experience of trying to learn an open source
tool is the realisation that all the information you need is simply spread
across ~7000 jira tickets, it doesn't make the knowledge feel very
accessible.

As an example, finding a java driver with a useful abstraction was
non-trivial - it appeared on the surface that there wasn't really one, that
you had to write everything yourself on top of CQL. Now I know (as everyone
else on this list does) that datastax provides one. At the time I never found
a simple page that just pointed me in the right direction and showed a basic
usage example.

Another example is that there is constant confusion about nomenclature on
this list, because naming has changed over time. If you don't know you're
reading old information, or what the significant changes are between
0.whatever, 1.whatever and 2.whatever, it's very hard to know whether you're
even googling for the right thing. Dynamic columns are a great example of
this. I think the fact that it keeps coming up on this list is a strong
indicator that the information is not available in a 'sufficient' way.

Another way of putting it is, when I started trying to learn about
cassandra, pretty much every piece of consumable information I was able to
find was out of date, but it wasn't always obvious that this was the case.

Having said all that, everything I've seen on this list points to prompt,
useful and friendly assistance, even for questions that are frequently
asked. I have no stake either way in what the rules on who can contribute
are, but I can definitely say I would have very much enjoyed a much softer
landing when trying to learn cassandra, from the basics all the way through
to the detail of ops.




On 23 July 2014 21:55, Jason Wee  wrote:

> I agree with the people here already sharing their ways to access
> documentation. If you are a starter, you had better spend time searching
> for documentation (like using google) or hours reading. Then start asking
> specific questions. Coming here to kpkb (complain) about the poor quality
> of documentation just does not cut it.
>
> If you find documentation is outdated, you can email the people in
> charge and tell them what is wrong and what you think will improve it. Some
> documentation is left there so that we can read it and understand the
> history of where things came from, and because some people may still use
> old versions of cassandra.
>
>
> On Wed, Jul 23, 2014 at 7:49 PM, Jack Krupansky 
> wrote:
>
>>   And the simplest and easiest thing to do is simply email this list
>> when you see something wrong or missing in the DataStax Cassandra doc, or
>> for anything that is not adequately covered anywhere. I work with the doc people
>> there, so I can make sure they see corrections and improvements. And simply
>> sharing knowledge on this list is always a big step forward.
>>
>> -- Jack Krupansky
>>
>>  *From:* spa...@gmail.com
>> *Sent:* Wednesday, July 23, 2014 4:25 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Why is the cassandra documentation such poor quality?
>>
>>  I would like to help out with the documentation of C*. How do I start?
>>
>>
>> On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:
>>
>>>  Just a note:
>>> If you have suggestions on how to improve documentation on the datastax
>>> website, write them an email to d...@datastax.com. They appreciate
>>> proposals :)
>>>
>>>  On 23.07.2014 at 09:10, Mark Reddy wrote:
>>>
>>>  Hi Kevin,
>>>
>>> The difference here is that the Apache Cassandra site is maintained by
>>> the community whereas the DataStax site is maintained by paid employees
>>> with a vested interest in producing documentation.
>>>
>>> With DataStax having some comprehensive docs, I guess the desire for
>>> people to maintain the Apache site has dwindled. However, if you are
>>> interested in contributing to it and bringing it back up to standard you
>>> can, such is the freedom of open source.
>>>
>>>
>>> Mark
>>>
>>>
>>> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton 
>>> wrote:
>>>
  This document:

 https://wiki.apache.org/cassandra/Operations

 … for example, is extremely outdated… it certainly does NOT reflect 2.x
 releases. It mentions commands that have long since been removed/deprecated.

 Instead of giving bad documentation, maybe remove this and mark it as
 obsolete.

 The datastax documentation… is … acceptable 

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Benedict Elliott Smith
All requests I've seen in the past year to edit the wiki (admittedly only
2-3) have been answered promptly with editing privileges. Personally I
don't have a major preference either way for policy - there are positives
and negatives to each approach - but, like I said, raise it on the dev list
and see if anybody else does.

However I must admit I cannot empathise with your characterisation of
requesting permission as 'begging', or a 'slap in the face', or that it is
even particularly onerous. It is a slight psychological barrier, but in my
personal experience when a psychological barrier as low as this prevents me
from taking action, it's usually because I don't have as much desire to
contribute as I thought I did.




On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:

>
> I've submitted requests to edit the wiki in the past and nothing ever got
> done.
>
> Having been an apache committer and contributor over the years, I can
> totally understand that people are busy. I also understand that "most"
> developers find writing docs tedious.
>
> I'd rather not harass the committers about wiki edits, since I didn't like
> it when it happened to me in the past. That's why many apache projects keep
> their wikis open. Honestly, as much as I find writing docs challenging and
> tedious, it's critical and important. For my other open source projects, I
> force myself to write docs.
>
> My point is, the wiki should be open and the barrier should be removed.
> Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
> but maybe I'm alone in this. Then again, I've heard the same sentiment from
> other people about cassandra's wiki. The thing is, they just chalk it up to
> "cassandra committers don't give a crap about docs". I do my best to defend
> the committers and point out some are volunteers, but it does give the
> public a negative impression. I know the committers care about docs, but
> they don't always have time to do it.
>
> I know that given a choice between coding or writing docs, 90% of the time
> I'll choose coding. What I've decided instead is to document stuff on one
> of my blogs.  If someone gets lucky, maybe google will return the result. I
> keep asking myself "what's the point of closing a wiki?"
>
>
>
> On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> It only takes a moment to ask to be added as a wiki contributor; if you
>> email the dev list or ask on irc, somebody with privileges will ordinarily
>> add you within a day. It may be a psychological barrier, but it isn't
>> really a practical one. Still, if you feel the policy is incorrect, raise
>> this on the dev list also.
>>
>>
>> On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:
>>
>>>
>>> I've tried to contribute docs to Cassandra wiki in the past, but there's
>>> an obstacle.
>>>
>>> Currently wiki.apache.org/cassandra is locked down, so only committers
>>> can edit it. I really wish that wasn't the case, since it wastes time. The
>>> committers are busy writing code. Having to email a committer and ask them to
>>> update it feels silly to me and kind of goes against openness. Back when I
>>> was active with JMeter, we decided to leave it open so that anyone can edit
>>> the docs.
>>>
>>> I can't be the only one that wants to help make the docs better, but get
>>> frustrated with the wiki being closed.
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 4:25 AM,  wrote:
>>>
 I would like to help out with the documentation of C*. How do I start?


 On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:

> Just a note:
> If you have suggestions on how to improve documentation on the datastax
> website, write them an email to d...@datastax.com. They appreciate
> proposals :)
>
> On 23.07.2014 at 09:10, Mark Reddy wrote:
>
> Hi Kevin,
>
> The difference here is that the Apache Cassandra site is maintained by
> the community whereas the DataStax site is maintained by paid employees
> with a vested interest in producing documentation.
>
> With DataStax having some comprehensive docs, I guess the desire for
> people to maintain the Apache site has dwindled. However, if you are
> interested in contributing to it and bringing it back up to standard you
> can, such is the freedom of open source.
>
>
> Mark
>
>
> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton 
> wrote:
>
>> This document:
>>
>> https://wiki.apache.org/cassandra/Operations
>>
>> … for example, is extremely outdated… it certainly does NOT reflect 2.x
>> releases. It mentions commands that have long since been removed/deprecated.
>>
>> Instead of giving bad documentation, maybe remove this and mark it as
>> obsolete.
>>
>> The datastax documentation… is … acceptable I guess.  My main
>> criticism there is that a lot of it is in their blog.
>>
>> Kevin
>>
>> --
>>

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jason Wee
I agree with the people here already sharing their ways to access
documentation. If you are a starter, you had better spend time searching
for documentation (like using google) or hours reading. Then start asking
specific questions. Coming here to kpkb (complain) about the poor quality
of documentation just does not cut it.

If you find documentation is outdated, you can email the people in
charge and tell them what is wrong and what you think will improve it. Some
documentation is left there so that we can read it and understand the
history of where things came from, and because some people may still use
old versions of cassandra.


On Wed, Jul 23, 2014 at 7:49 PM, Jack Krupansky 
wrote:

>   And the simplest and easiest thing to do is simply email this list when
> you see something wrong or missing in the DataStax Cassandra doc, or for
> anything that is not adequately covered anywhere. I work with the doc people there,
> so I can make sure they see corrections and improvements. And simply
> sharing knowledge on this list is always a big step forward.
>
> -- Jack Krupansky
>
>  *From:* spa...@gmail.com
> *Sent:* Wednesday, July 23, 2014 4:25 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Why is the cassandra documentation such poor quality?
>
>  I would like to help out with the documentation of C*. How do I start?
>
>
> On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:
>
>>  Just a note:
>> If you have suggestions on how to improve documentation on the datastax
>> website, write them an email to d...@datastax.com. They appreciate
>> proposals :)
>>
>>  On 23.07.2014 at 09:10, Mark Reddy wrote:
>>
>>  Hi Kevin,
>>
>> The difference here is that the Apache Cassandra site is maintained by
>> the community whereas the DataStax site is maintained by paid employees
>> with a vested interest in producing documentation.
>>
>> With DataStax having some comprehensive docs, I guess the desire for
>> people to maintain the Apache site has dwindled. However, if you are
>> interested in contributing to it and bringing it back up to standard you
>> can, such is the freedom of open source.
>>
>>
>> Mark
>>
>>
>> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton  wrote:
>>
>>>  This document:
>>>
>>> https://wiki.apache.org/cassandra/Operations
>>>
>>> … for example, is extremely outdated… it certainly does NOT reflect 2.x
>>> releases. It mentions commands that have long since been removed/deprecated.
>>>
>>> Instead of giving bad documentation, maybe remove this and mark it as
>>> obsolete.
>>>
>>> The datastax documentation… is … acceptable I guess.  My main criticism
>>> there is that a lot of it is in their blog.
>>>
>>> Kevin
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com 
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>  
>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> http://spawgi.wordpress.com
> We can do it and do it better.
>


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
I've submitted requests to edit the wiki in the past and nothing ever got
done.

Having been an apache committer and contributor over the years, I can
totally understand that people are busy. I also understand that "most"
developers find writing docs tedious.

I'd rather not harass the committers about wiki edits, since I didn't like
it when it happened to me in the past. That's why many apache projects keep
their wikis open. Honestly, as much as I find writing docs challenging and
tedious, it's critical and important. For my other open source projects, I
force myself to write docs.

My point is, the wiki should be open and the barrier should be removed.
Having to "beg/ask" to edit the wiki feels like a slap in the face to me,
but maybe I'm alone in this. Then again, I've heard the same sentiment from
other people about cassandra's wiki. The thing is, they just chalk it up to
"cassandra committers don't give a crap about docs". I do my best to defend
the committers and point out some are volunteers, but it does give the
public a negative impression. I know the committers care about docs, but
they don't always have time to do it.

I know that given a choice between coding or writing docs, 90% of the time
I'll choose coding. What I've decided instead is to document stuff on one
of my blogs.  If someone gets lucky, maybe google will return the result. I
keep asking myself "what's the point of closing a wiki?"



On Wed, Jul 23, 2014 at 7:40 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> It only takes a moment to ask to be added as a wiki contributor; if you
> email the dev list or ask on irc, somebody with privileges will ordinarily
> add you within a day. It may be a psychological barrier, but it isn't
> really a practical one. Still, if you feel the policy is incorrect, raise
> this on the dev list also.


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jack Krupansky
And the simplest and easiest thing to do is simply to email this list when you
see something wrong or missing in the DataStax Cassandra doc, or anything that
is not adequately documented anywhere. I work with the doc people there, so I
can make sure they see corrections and improvements. And simply sharing
knowledge on this list is always a big step forward.

-- Jack Krupansky

From: spa...@gmail.com 
Sent: Wednesday, July 23, 2014 4:25 AM
To: user@cassandra.apache.org 
Subject: Re: Why is the cassandra documentation such poor quality?

I would like to help out with the documentation of C*. How do I start?

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Benedict Elliott Smith
It only takes a moment to ask to be added as a wiki contributor; if you
email the dev list or ask on irc, somebody with privileges will ordinarily
add you within a day. It may be a psychological barrier, but it isn't
really a practical one. Still, if you feel the policy is incorrect, raise
this on the dev list also.


On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin  wrote:

>
> I've tried to contribute docs to the Cassandra wiki in the past, but there's
> an obstacle.
>
> Currently wiki.apache.org/cassandra is locked down, so only committers can
> edit it. I really wish that wasn't the case, since it wastes time. The
> committers are busy writing code. Having to email a committer and ask them to
> update it feels silly to me and kind of goes against openness. Back when I
> was active with JMeter, we decided to leave it open so that anyone can edit
> the docs.
>
> I can't be the only one that wants to help make the docs better, but get
> frustrated with the wiki being closed.


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
I've tried to contribute docs to the Cassandra wiki in the past, but there's an
obstacle.

Currently wiki.apache.org/cassandra is locked down, so only committers can
edit it. I really wish that wasn't the case, since it wastes time. The
committers are busy writing code. Having to email a committer and ask them to
update it feels silly to me and kind of goes against openness. Back when I
was active with JMeter, we decided to leave it open so that anyone can edit
the docs.

I can't be the only one that wants to help make the docs better, but get
frustrated with the wiki being closed.



On Wed, Jul 23, 2014 at 4:25 AM,  wrote:

> I would like to help out with the documentation of C*. How do I start?


Re: horizontal query scaling issues follow on

2014-07-23 Thread Benedict Elliott Smith
>
> if you find that adding nodes causes performance to degrade I would
> suspect that you are querying data in one CQL statement that is spread over
> multiple partitions


This is exactly what is happening. The better way to query multiple
partitions is to simply despatch multiple queries (asynchronously), so that
the driver can route them directly to the owning node. With an IN query a
node owning one of the partitions is contacted, and this node then forwards
any requests it cannot service to their owning nodes, waits for their
response, and then returns the combined result to you, resulting in greater
work cluster-wide, and (more importantly here) greater latency for each
query which will reduce throughput when you are not at the maximum capacity
of the cluster.

Note that you will not see linear improvement in performance until you are
maxing out the throughput of the cluster.
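The alternative described above (one single-partition query per key, dispatched concurrently, instead of one IN query) can be sketched in Python as follows. This is a minimal sketch under assumptions: `fetch_one` is a hypothetical stand-in for executing a single-key query through a driver, and a thread pool stands in for the driver's own asynchronous API (for example `Session.execute_async` in the DataStax Python driver), which you would normally prefer in real code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_latest_many(
    keys: Iterable[str],
    fetch_one: Callable[[str], Optional[str]],
    max_in_flight: int = 32,
) -> Dict[str, Optional[str]]:
    """Look up many partition keys with one single-partition query per key.

    fetch_one is a hypothetical callable that runs something like
      SELECT col_value FROM foo WHERE key=? AND col_name='LATEST'
    for one key and returns the value (or None if the row is missing).
    """
    keys = list(keys)
    # Threads stand in for the driver's async API; max_in_flight caps how
    # many queries run at the same time.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Executor.map yields results in input order, so they line up with keys.
        values = list(pool.map(fetch_one, keys))
    return dict(zip(keys, values))


if __name__ == "__main__":
    # Hypothetical in-memory table standing in for foo.
    latest = {"Type1:1109dccb-169b-40ef-b7f8-d072f04d8139": "state3",
              "Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d": "state2"}
    print(fetch_latest_many(latest.keys(), latest.get))
```

Each request touches exactly one partition, so a driver can route it straight to a node that owns that partition; whether this beats the IN query on a particular cluster is something to measure rather than assume.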


On Wed, Jul 23, 2014 at 11:48 AM, Diane Griffith 
wrote:

> I posted the query wrong: I gave the query for 1 key rather than the large
> batch of ids I was testing with.
>
> What it was using for large batch was IN, so
>
> Select * from foo where key IN  and col_name='LATEST'
>
> So after breaking it down and reading as much as I can with regard to our
>
> - schema, dynamic wide rows (but should not equal more columns per row
> than what documentation warned about)
> - general configuration and recommended settings
>
> Out of that I then read up on the anti patterns and the Select IN was
> mentioned.  It sounds like it could impact the numbers.  So for our query
> test pattern and simple test cluster that yes there was throughput increase
> of 1 Node to 2 Nodes and potentially can explain why things decrease going
> from 2 Nodes to 4 Nodes.  Does that seem the likely culprit?
>
> Is there an alternative for batching or selecting a large key set in a
> clustered environment?
>
> Thanks,
> Diane

Re: Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
On Wed, Jul 23, 2014 at 12:07 PM, Ben Hood <0x6e6...@gmail.com> wrote:
> Or have I just been looking at the wrong version of the spec all along?

So it turns out that this is a case of PEBCAK: v2 of the protocol is
formulated thusly:

4.2.5.4. Prepared

  The result to a PREPARE message. The rest of the body of a Prepared result is:

  where:
-  is [short bytes] representing the prepared query ID.
-  is defined exactly as for a Rows RESULT (See section
4.2.5.2; you
  can however assume that the Has_more_pages flag is always off) and
  is the specification for the variable bound in this prepare statement.
-  is defined exactly as  but correspond to the
  metadata for the resultSet that execute this query will yield. Note that
   may be empty (have the No_metadata flag and 0
columns, See
  section 4.2.5.2) and will be for any query that is not a Select. There is
  in fact never a guarantee that this will non-empty so client
should protect
  themselves accordingly. The presence of this information is an
  optimization that allows to later execute the statement that has been
  prepared without requesting the metadata (Skip_metadata flag in EXECUTE).
  Clients can safely discard this metadata if they do not want to take
  advantage of that optimization.

  Note that prepared query ID return is global to the node on which the query
  has been prepared. It can be used on any connection to that node and this
  until the node is restarted (after which the query must be reprepared).
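The layout quoted above, where a client runs the same metadata-parsing routine twice, can be sketched in Python. This is a minimal decoder for the body of a Prepared result (the bytes after the result kind), written under assumptions: the flag masks (Global_tables_spec 0x0001, Has_more_pages 0x0002, No_metadata 0x0004) and the [option] encoding for custom and collection types are taken from section 4.2.5.2 of the v2 spec as I understand it and should be checked against it, and the names `Reader`, `read_metadata` and `parse_prepared` are hypothetical, not gocql's API.

```python
import struct
from typing import List, Optional, Tuple

# Metadata flag masks (v2 spec, section 4.2.5.2).
GLOBAL_TABLES_SPEC = 0x0001
HAS_MORE_PAGES = 0x0002
NO_METADATA = 0x0004

# [option] type ids that carry extra data; all others are bare ids.
CUSTOM, LIST, MAP, SET = 0x0000, 0x0020, 0x0021, 0x0022

Column = Tuple[str, str, str, tuple]  # (keyspace, table, name, type option)


class Reader:
    """Cursor over a frame body using the protocol's big-endian notations."""

    def __init__(self, buf: bytes) -> None:
        self.buf, self.pos = buf, 0

    def take(self, n: int) -> bytes:
        if n < 0 or self.pos + n > len(self.buf):
            raise ValueError("truncated frame body")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def int_(self) -> int:  # [int]: 4-byte signed
        return struct.unpack(">i", self.take(4))[0]

    def short(self) -> int:  # [short]: 2-byte unsigned
        return struct.unpack(">H", self.take(2))[0]

    def string(self) -> str:  # [string]: [short] length + UTF-8 bytes
        return self.take(self.short()).decode("utf-8")

    def short_bytes(self) -> bytes:  # [short bytes]: [short] length + bytes
        return self.take(self.short())

    def bytes_(self) -> Optional[bytes]:  # [bytes]: [int] length, < 0 is null
        n = self.int_()
        return None if n < 0 else self.take(n)

    def option(self) -> tuple:  # [option]: [short] id + id-specific value
        type_id = self.short()
        if type_id == CUSTOM:
            return (type_id, self.string())
        if type_id in (LIST, SET):
            return (type_id, self.option())
        if type_id == MAP:
            return (type_id, (self.option(), self.option()))
        return (type_id, None)


def read_metadata(r: Reader) -> List[Column]:
    """Parse one metadata block as laid out for a Rows result."""
    flags = r.int_()
    count = r.int_()
    if flags & HAS_MORE_PAGES:
        r.bytes_()  # paging state; not needed here
    if flags & NO_METADATA:
        return []  # e.g. result metadata of a non-SELECT statement
    shared = (r.string(), r.string()) if flags & GLOBAL_TABLES_SPEC else None
    columns: List[Column] = []
    for _ in range(count):
        ks, table = shared if shared is not None else (r.string(), r.string())
        name = r.string()
        columns.append((ks, table, name, r.option()))
    return columns


def parse_prepared(body: bytes) -> Tuple[bytes, List[Column], List[Column]]:
    """Return (query_id, bind_metadata, result_metadata)."""
    r = Reader(body)
    query_id = r.short_bytes()
    bind_columns = read_metadata(r)    # variables bound by the statement
    result_columns = read_metadata(r)  # columns EXECUTE will return
    return query_id, bind_columns, result_columns
```

Because both blocks share one parser, the only protocol-version difference a driver has to track here is whether the second block is present at all.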

Sorry for the noise on the list.


Re: Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
On Wed, Jul 23, 2014 at 11:14 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> But I was wondering if we were doing something wrong by not returning
> the result meta data from the PREPARE result (if it does indeed
> exist).

Looking into this a bit further, it looks like the client driver needs
to deserialize 2 blocks of meta data from the PREPARE result, i.e. invoke
whatever routine parses the meta data twice to get the result column meta
data.

So if this is correct, then the correct frame definition of the RESULT
of a PREPARE message should look like this:

The result to a PREPARE message. The rest of the body of a Prepared result is:

  where:
-  is [short bytes] representing the prepared query ID.
-  is defined exactly as for a Rows RESULT (See section
4.2.5.2) - this represents the type information for the query
arguments
-  is defined exactly as for a Rows RESULT (See section
4.2.5.2) - this represents the type information for the query result
columns

Does this make sense to anybody?

Or have I just been looking at the wrong version of the spec all along?


Should PREPARE QUERY return metadata for the query result?

2014-07-23 Thread Ben Hood
Hi all,

I'm looking at the specification of statement preparation (section
4.2.5.4 of the CQL protocol) and I'm wondering whether the metadata
result of the PREPARE query only returns column information for the
query arguments, and not for the columns of the actual query result.

The background is that we're changing gocql to expose this column
information to applications in a more useful way and we've noticed
that we only get the query argument meta data back from the server.

In practice, this might not be an issue, since the RESULT ROWS
contains the necessary column metadata.

But I was wondering if we were doing something wrong by not returning
the result meta data from the PREPARE result (if it does indeed
exist).

Can anybody shed any light on this?

Thanks,

Ben


Re: horizontal query scaling issues follow on

2014-07-23 Thread Diane Griffith
I posted the query wrong: I gave the query for 1 key rather than the large
batch of ids I was testing with.

What it was using for large batch was IN, so

Select * from foo where key IN  and col_name='LATEST'

So after breaking it down and reading as much as I can with regard to our

- schema: dynamic wide rows (but no more columns per row than the
documentation warns about)
- general configuration and recommended settings

Out of that, I then read up on the anti-patterns, and the SELECT with IN was
mentioned. It sounds like it could impact the numbers: for our query test
pattern and simple test cluster, it could explain why throughput increased
going from 1 node to 2 nodes but decreased going from 2 nodes to 4 nodes.
Does that seem the likely culprit?

Is there an alternative for batching or selecting a large key set in a
clustered environment?

Thanks,
Diane



On Fri, Jul 18, 2014 at 2:43 PM, Diane Griffith 
wrote:

> Okay, here are the data samples.
>
> Column family schema again:
> CREATE TABLE IF NOT EXISTS foo (key text, col_name text, col_value text,
> PRIMARY KEY(key, col_name));
>
> CQL write:
>
> INSERT INTO foo (key, col_name, col_value) VALUES
> ('Type1:1109dccb-169b-40ef-b7f8-d072f04d8139',
> 'HISTORY:2011-04-20T09:19:13.072-0400',
> '{"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state1","timestamp":1303305553072,"eventId":40902,"executionId":31082}');
>
> CQL read:
>
> SELECT col_value FROM foo WHERE
> key='Type1:1109dccb-169b-40ef-b7f8-d072f04d8139' AND col_name='LATEST';
>
> Read result from the above query:
>
> {"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state3","timestamp":1303446284614,"eventId":7688,"executionId":40847}
>
>
> CQL snippet example of select * from foo limit 8:
>
> key | col_name | col_value
> Type1:1109dccb-169b-40ef-b7f8-d072f04d8139 | HISTORY:2011-04-20T09:19:13.072-0400 | {"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state1","timestamp":1303305553072,"eventId":40902,"executionId":31082}
> Type1:1109dccb-169b-40ef-b7f8-d072f04d8139 | HISTORY:2011-04-20T13:47:33.512-0400 | {"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state2","timestamp":1303321653512,"eventId":32660,"executionId":33510}
> Type1:1109dccb-169b-40ef-b7f8-d072f04d8139 | HISTORY:2011-04-22T00:24:44.614-0400 | {"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state3","timestamp":1303446284614,"eventId":7688,"executionId":40847}
> Type1:1109dccb-169b-40ef-b7f8-d072f04d8139 | LATEST | {"key":"1109dccb-169b-40ef-b7f8-d072f04d8139","keyType":"Type1","state":"state3","timestamp":1303446284614,"eventId":7688,"executionId":40847}
> Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d | HISTORY:2010-08-26T03:45:43.366-0400 | {"key":"e876d44d-246f-40c5-b5a3-4d0eb31db00d","keyType":"Type2","state":"state1","timestamp":1282808743366,"eventId":2,"executionId":6214}
> Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d | HISTORY:2010-08-26T04:58:46.810-0400 | {"key":"e876d44d-246f-40c5-b5a3-4d0eb31db00d","keyType":"Type2","state":"state2","timestamp":1282813126810,"eventId":48575,"executionId":22318}
> Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d | HISTORY:2010-08-27T22:39:51.036-0400 | {"key":"e876d44d-246f-40c5-b5a3-4d0eb31db00d","keyType":"Type2","state":"state2","timestamp":1282963191036,"eventId":21960,"executionId":5067}
> Type2:e876d44d-246f-40c5-b5a3-4d0eb31db00d | LATEST | {"key":"e876d44d-246f-40c5-b5a3-4d0eb31db00d","keyType":"Type2","state":"state2","timestamp":1282963191036,"eventId":21960,"executionId":5067}
>
>
> For that select * example, given how I defined the primary key to support
> dynamic wide rows, my understanding is that it really equates to 2
> physical rows with 4 cells each. So I should have 18 million physical
> rows, but given the number of entries I inserted for each key, a
> select count(*) from foo reports 72 million rows (once I add a limit large
> enough to let it scan all rows).
>
> Does anything seem like it is hurting our chances to horizontally scale
> with the data/schema?
>
> Thanks,
> Diane
>
>
> On Fri, Jul 18, 2014 at 6:46 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> How many columns are you inserting/querying per key? Could we see some
>> example CQL statements for the insert/read workload?
>>
>> If you are maxing out at 10 clients, something fishy is going on. In
>> general, though, if you find that adding nodes causes performance to
>> degrade I would suspect that you are querying data in one CQL statement
>> that is spread over multiple partitions, and so extra work needs to be done

Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-23 Thread Ben Hood
On Wed, Jul 23, 2014 at 1:53 AM, Robert Coli  wrote:
> On Tue, Jul 22, 2014 at 1:53 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> Indeed, reading up on the issue (and discussing it with folks) there are a
> number of mitigating factors, most significantly driver workarounds use of
> TimeUUIDs, which made this issue less common than reversed comparators use
> cases are. I still consider it a serious issue due to the nature of the
> regression, but it is fair to say not as serious as my initial reaction.

Just to highlight Karl's contribution: if you look back over the
discussion we had at gocql about this particular issue, we did actually
discuss that we should probably look at the server code to see if it was
doing anything unexpected. But we stopped short of doing that. To Karl's
credit, he did actually look at the code, and spotted the bug.


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-23 Thread Ben Hood
On Wed, Jul 23, 2014 at 1:53 AM, Robert Coli  wrote:
> On Tue, Jul 22, 2014 at 1:53 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> In this particular case, the answer to "why not" involves the idea that one
> needs to be able to test with a driver in order to expose it, and currently
> (as I understand it) only distributed tests use a driver.
>
> I believe that operators expect there to be a robust representative test
> schema that can be created on version X.Y.Z and be accessed on version
> X+1.y.0 which would exercise this core code and increase confidence that
> tables created in major version X will always be usable without exception in
> X+1.

With gocql we currently run our integration test suite on Travis
against 1.2.18 and 2.0.9 (*), but in each case we install the server
from a clean slate. Theoretically we could do a migration, but the
clean slate makes things easier for us. One could argue, however, that
verifying server migration is beyond the scope of the integration test
suite for a client driver.

(*) We've looked at including 2.1rc3, but there is an acknowledged
server-side bug that causes one of our tests to fail, so we do not
have mainline coverage for 2.1-rcx yet.


CSV Import is taking huge time

2014-07-23 Thread Akshay Ballarpure
Hello,
I am trying the COPY command in cqlsh to import a CSV file into the DB, but
the import is taking a huge amount of time. Any suggestions to improve it?

id,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
100,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
101,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26

--
--

There are ~50K lines in this file; its size is ~5 MB.

I created the table as follows:

create table csldata4 (id int PRIMARY KEY, a int, b int, c int, d int, e int,
f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int,
q int, r int, s int, t int, u int, v int, w int, x int, y int, z int);
Copy Command:

COPY csldata4 (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t,
u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
 
The issue here is that it takes a huge amount of time to import:

cqlsh:mykeyspace> COPY csldata (id, a, b, c, d, e, f, g, h, i, j, k, l, m, n,
o, p, q, r, s, t, u, v, w, x, y, z) FROM 'csldata1.csv' WITH HEADER=TRUE;
66215 rows imported in 1 minute and 31.044 seconds.
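For reference, that run works out to roughly 66215 / 91.044 s ≈ 727 rows per second. One approach often suggested for faster loads is to write the rows from your own client with several requests in flight rather than one at a time. The Python sketch below only shows the CSV parsing and the concurrent write loop; `insert_row` is a hypothetical callable that would bind one tuple to a prepared INSERT into csldata4 (27 bind markers) and execute it through a driver, and no particular driver API is assumed.

```python
import csv
import io
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple

# id followed by columns a..z, matching the csldata4 table definition.
COLUMNS = ["id"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]


def read_rows(lines: Iterable[str]) -> Iterator[Tuple[int, ...]]:
    """Parse the CSV (with a header line) into tuples of ints in COLUMNS order."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input
    if [h.strip() for h in header] != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    for record in reader:
        if not record:
            continue  # skip blank lines
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
        yield tuple(int(v) for v in record)


def load(
    lines: Iterable[str],
    insert_row: Callable[[Tuple[int, ...]], None],
    workers: int = 16,
) -> Tuple[int, float]:
    """Insert every row using `workers` concurrent calls; return (rows, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits all rows up front; acceptable for a ~5 MB file.
        count = sum(1 for _ in pool.map(insert_row, read_rows(lines)))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    values = ",".join(str(i) for i in range(1, 27))
    sample = io.StringIO(",".join(COLUMNS) + "\n100," + values + "\n")
    rows, secs = load(sample, lambda row: None)  # no-op writer for the demo
    print(f"{rows} rows in {secs:.3f}s")
```

The number of workers is a knob to tune against the cluster, not a recommendation; on a single node, pushing it too high may just move the bottleneck to the server.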


Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com





Re: Case Study from Migrating from RDBMS to Cassandra

2014-07-23 Thread DuyHai Doan
"How did they migrate from RDBMS to Cassandra?"

-> There is no one-size-fits-all answer. It all depends on the application's
features.


"What are the things to consider?"
-> Design the data model with a query-first approach. The choice of
how/when/what to denormalize is crucial.

"How did they convert the data model, and what did the new data model look like?"
-> For a zero-downtime migration you'll probably need a double-run strategy:
write to both the old (RDBMS) and new (Cassandra) data stores at the same
time, but have reads hit only the RDBMS. Then launch a data migration script
to copy the existing data from the RDBMS to Cassandra, then switch reads to
Cassandra and cut all read/write access to the RDBMS.

"How did they load the data into Cassandra?"
-> With the data migration script mentioned above.

"Performance tests before and after migration, etc."
-> Mandatory.
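The double-run strategy described above can be sketched as a small repository wrapper. This is a minimal Python illustration under stated assumptions: `DictStore` is a hypothetical in-memory stand-in for both the RDBMS and Cassandra, and the two flags model the phases (write to both while reading from the old store, then read from the new store, then stop writing to the old one). A real migration also has to deal with failed writes and concurrent updates, which this sketch ignores.

```python
from typing import Dict, Optional


class DictStore:
    """Hypothetical in-memory stand-in for either data store."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class DualWriteRepository:
    def __init__(self, old: DictStore, new: DictStore) -> None:
        self.old, self.new = old, new
        self.read_from_new = False  # phase 1: reads still hit the RDBMS
        self.write_to_old = True    # phase 1: writes go to both stores

    def put(self, key: str, value: str) -> None:
        if self.write_to_old:
            self.old.put(key, value)
        self.new.put(key, value)

    def get(self, key: str) -> Optional[str]:
        store = self.new if self.read_from_new else self.old
        return store.get(key)


def backfill(old: DictStore, new: DictStore) -> int:
    """Copy rows that predate dual-writing; skip keys already in the new store.

    Skipping keeps values written during the double-run from being overwritten
    by stale copies. The check-then-write is not atomic, so this is only safe
    in this single-threaded sketch.
    """
    copied = 0
    for key, value in list(old.data.items()):
        if new.get(key) is None:
            new.put(key, value)
            copied += 1
    return copied
```

With Cassandra as the new store, an alternative to the existence check is to write backfilled rows `USING TIMESTAMP` set to each row's original modification time, so that Cassandra's last-write-wins reconciliation keeps the newer dual-written value; that assumes the source rows carry such a time.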




On Wed, Jul 23, 2014 at 8:40 AM, Mark Reddy  wrote:

> PlanetCassandra has a collection of migration use cases:
>
> http://planetcassandra.org/mysql-to-cassandra-migration/
> http://planetcassandra.org/oracle-to-cassandra-migration/
>
> If they don't contain the information you need I'm sure you could reach
> out to the companies involved or DataStax itself to get what you require.
>
>
> Mark
>
>
> On Wed, Jul 23, 2014 at 5:28 AM, Surbhi Gupta 
> wrote:
>
>> Thanks Shane. However, I am looking for a proof-of-concept kind of
>> document.
>> Does anybody have a complete end-to-end document which contains the
>> application overview,
>>
>> How did they migrate from RDBMS to Cassandra?
>> What are the things to consider?
>> How did they convert the data model, and what did the new data model look like?
>> How did they load the data into Cassandra?
>> Performance tests before and after migration, etc.
>>
>> Thanks
>> Surbhi
>>
>> On 23 July 2014 08:51, Shane Hansen  wrote:
>>
>>> There's lots of info on migrating from a relational database to
>>> Cassandra here:
>>> http://www.datastax.com/relational-database-to-nosql
>>>
>>>
>>>
>>> On Tue, Jul 22, 2014 at 7:45 PM, Surbhi Gupta 
>>> wrote:
>>>
 Hi,

 Does anybody have a case study on migrating from RDBMS to Cassandra?

 Thanks

>>>
>>>
>>
>


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread spawgi
I would like to help out with the documentation of C*. How do I start?


On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp  wrote:

> Just a note:
> If you have suggestions how to improve documentation on the datastax
> website, write them an email to d...@datastax.com. They appreciate
> proposals :)
>
> Am 23.07.2014 um 09:10 schrieb Mark Reddy :
>
> Hi Kevin,
>
> The difference here is that the Apache Cassandra site is maintained by the
> community whereas the DataStax site is maintained by paid employees with a
> vested interest in producing documentation.
>
> With DataStax having some comprehensive docs, I guess the desire for
> people to maintain the Apache site has dwindled. However, if you are
> interested in contributing to it and bringing it back up to standard you
> can; such is the freedom of open source.
>
>
> Mark
>
>
> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton  wrote:
>
>> This document:
>>
>> https://wiki.apache.org/cassandra/Operations
>>
>> … for example, is extremely outdated… it certainly does NOT reflect 2.x
>> releases. It mentions commands that have long since been removed/deprecated.
>>
>> Instead of giving bad documentation, maybe remove this and mark it as
>> obsolete.
>>
>> The DataStax documentation… is … acceptable I guess. My main criticism
>> there is that a lot of it is in their blog.
>>
>> Kevin
>>
>> --
>>
>> Founder/CEO Spinn3r.com 
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>
>


-- 
http://spawgi.wordpress.com
We can do it and do it better.


Re: When will a node's host ID change?

2014-07-23 Thread Mark Reddy
It won't. Cassandra stores a node's host ID in an endpoint-to-host-ID
mapping; each ID must be unique and cannot be changed after the fact.


Mark


On Mon, Jul 21, 2014 at 4:44 PM, John Sanda  wrote:

> Under what circumstances, if any, will a node's host ID change?
>
> - John
>


Re: ONE consistency required 2 writes? huh?

2014-07-23 Thread Olivier Michallat
Hi Kevin,

This message was likely generated by the Java driver, not by Cassandra.

I'll follow up to your other post on the driver's mailing list.

-- Olivier


On Wed, Jul 23, 2014 at 4:37 AM, Kevin Burton  wrote:

> Perhaps it's me but it seems this exception is wrong:
>
> "Cassandra timeout during write query at consistency ONE (2 replica were
> required but only 1 acknowledged the write)"
>
> .. but the documentation for ONE says:
>
> "A write must be written to the commit log and memory table of at least
> one replica node."
> … so… in my situation… 1 replica DID ack the write… so why am I getting an
> exception?
>
> Maybe I'm just not interpreting the exception correctly?
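Olivier's reply does not say why a write at ONE reported two required replicas, but it can help to pull the numbers out of the message and compare them with what ONE alone asks for (one acknowledgement, per the documentation Kevin quotes). WriteTimeoutMessage is a hypothetical helper, and its regular expression assumes the driver's wording is exactly as quoted above, which may differ between driver versions; if the driver's exception exposes these counts directly, reading them from it would be preferable to parsing text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the consistency level and the required/received
// acknowledgement counts from a write-timeout message worded like Kevin's.
final class WriteTimeoutMessage {
    private static final Pattern FORMAT = Pattern.compile(
        "consistency (\\w+) \\((\\d+) replicas? (?:was|were) required but only (\\d+) acknowledged");

    final String consistency;
    final int required;
    final int received;

    private WriteTimeoutMessage(String consistency, int required, int received) {
        this.consistency = consistency;
        this.required = required;
        this.received = received;
    }

    static WriteTimeoutMessage parse(String message) {
        Matcher m = FORMAT.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognised message: " + message);
        }
        return new WriteTimeoutMessage(
            m.group(1), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
    }

    // Acknowledgements implied by the level alone. Only fixed-count levels are
    // covered; QUORUM, ALL, etc. depend on the replication factor.
    static int nominalAcks(String consistency) {
        switch (consistency) {
            case "ONE":   return 1;
            case "TWO":   return 2;
            case "THREE": return 3;
            default:
                throw new IllegalArgumentException("depends on replication factor: " + consistency);
        }
    }

    // True when the reported requirement is larger than the level alone implies.
    boolean requirementExceedsLevel() {
        return required > nominalAcks(consistency);
    }
}
```

For Kevin's message, parse() yields consistency ONE, required 2 and received 1, and requirementExceedsLevel() returns true: the reported requirement is not explained by the consistency level alone, which is exactly the mismatch he is asking about.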


Re: EXTERNAL: Re: Running Cassandra Server in an OSGi container

2014-07-23 Thread Robert Stupp
You mean "unzip and run" for an application that uses C*?
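Robert's earlier reply (quoted below) recommends running C* in its own JVM process. One way to combine that with an "unzip and run" package is to ship an unpacked Cassandra distribution inside the application and start it as a child process. This is a minimal sketch under stated assumptions: ExternalCassandra is a hypothetical class, the Cassandra home directory and the console log file name are placeholders, and a Unix-like layout with bin/cassandra is assumed (the script name differs on Windows).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher that runs a bundled, unzipped Cassandra in its own JVM
// instead of embedding it. Assumes a Unix-like layout with bin/cassandra.
final class ExternalCassandra {
    private final Path home;   // where the Cassandra distribution was unpacked (placeholder)
    private Process process;   // the running child, or null if not started

    ExternalCassandra(String homeDir) {
        this.home = Paths.get(homeDir);
    }

    // "-f" keeps Cassandra in the foreground instead of daemonizing,
    // so the child stays attached to this parent process.
    List<String> command() {
        return Arrays.asList(home.resolve("bin").resolve("cassandra").toString(), "-f");
    }

    synchronized void start() throws IOException {
        if (process != null) {
            throw new IllegalStateException("Cassandra is already running");
        }
        ProcessBuilder pb = new ProcessBuilder(command());
        pb.directory(home.toFile());
        pb.redirectErrorStream(true);
        // Console output is appended to a log file (placeholder name) in the home dir.
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(
                home.resolve("cassandra-console.log").toFile()));
        process = pb.start();
    }

    // Requests termination of the child and waits for it to finish.
    synchronized void stop() throws InterruptedException {
        if (process == null) {
            return;
        }
        process.destroy();
        process.waitFor();
        process = null;
    }
}
```

Because Cassandra runs as a separate process here, a System.exit() inside it ends only that child and cannot take down the host JVM or container; the trade-off is that the application must manage a second process and talk to Cassandra over the network (for example, CQL on localhost).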

On 23.07.2014 at 00:34, Rodgers, Hugh wrote:

> What got our team on the path of trying to embed C* was the wiki page 
> http://wiki.apache.org/cassandra/Embedding which implies this can be done. 
> Also, WSO2 Carbon and Achilles have both embedded C* (not in an OSGi container, 
> though, and Carbon uses an older C* version).
>  
> We want an “unzip and run” system and do not expect the user to have 
> to do much, if any, C* configuration.
>  
> From: Robert Stupp [mailto:sn...@snazy.de] 
> Sent: Tuesday, July 22, 2014 1:19 PM
> To: user@cassandra.apache.org
> Subject: EXTERNAL: Re: Running Cassandra Server in an OSGi container
>  
> What is your reason for doing this?
>  
> There are unit-test integrations that use the C* daemon. A related bug that 
> prevented proper shutdown has been closed for C* 2.1-rc1: 
> https://issues.apache.org/jira/browse/CASSANDRA-5635
> It's perfectly fine to embed C* for unit tests.
>  
> But I would definitely not recommend using C* within a container in a real 
> production environment.
> This is not just because of the few System.exit calls in CassandraDaemon, but 
> also because of the other places where System.exit is called for very good 
> reasons. These reasons include system/node failure scenarios (for example, disk failures).
>  
> C* is designed to run in its own JVM process using dedicated hardware 
> resources on multiple servers using commodity hardware without any 
> virtualization or any shared storage. And it just works great with that.
>  
> There are good reasons to move computation close to the data, but that is 
> always a separate OS process on the C* nodes. Examples are Hadoop and Spark.
>  
> On 22.07.2014 at 21:45, Rodgers, Hugh wrote:
> 
> 
> Hello –
>  
> I have a use case where I need to run the Cassandra Server as an OSGi bundle. 
> I have been able to embed all of the Cassandra dependencies in an OSGi bundle 
> and run it in a Karaf container, but I am not happy with the approach I have 
> thus far.
>  
> Since CassandraDaemon has System.exit() calls in it, if these execute they will 
> bring down my entire OSGi container rather than just the bundle Cassandra is 
> running in. I hacked up a copy of CassandraDaemon enough to get it to run in 
> the bundle with no System.exit() calls, but the Cassandra StorageService is 
> not “aware” of it, i.e., I cannot call the StorageService.registerDaemon(…) 
> method because my copy of CassandraDaemon does not extend Apache’s. Hence I 
> am getting exceptions when I shut down my container or restart the bundle, 
> because the StorageService and my CassandraDaemon are not “linked”.
>  
> I am considering extending Apache’s CassandraDaemon and overriding its 
> setup() method to install a SecurityManager that disables System.exit() calls. 
> This too sounds “hacky”.
>  
> Does anyone have any better suggestions? Or know of an existing open source 
> project that has successfully embedded CassandraServer in an OSGi bundle?
>  
> I am using Cassandra v2.0.7 and am currently using CQL (vs. Thrift).
>  
> Thanks –
>  
> Hugh
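Hugh's idea of vetoing System.exit() can be sketched with a SecurityManager subclass whose checkExit() always throws. NoExitSecurityManager is a hypothetical name; it would be installed with System.setSecurityManager(...) from the overridden setup() he describes. Two caveats apply: the thread that called exit receives a SecurityException instead, so code paths that exit on fatal errors such as disk failures (Robert's point above) keep running in an undefined state; and SecurityManager has been deprecated for removal since Java 17, with newer JDKs refusing to install one by default, so this only fits older runtimes such as those Cassandra 2.0 runs on.

```java
import java.security.Permission;

// Hypothetical sketch: a SecurityManager that vetoes System.exit()/Runtime.exit()
// and permits everything else. SecurityManager is deprecated for removal in
// Java 17+, hence the warning suppression.
@SuppressWarnings("removal")
class NoExitSecurityManager extends SecurityManager {

    // Runtime.exit() consults this before halting; throwing here aborts the
    // exit and surfaces as a SecurityException in the calling thread.
    @Override
    public void checkExit(int status) {
        throw new SecurityException("System.exit(" + status + ") blocked inside the container");
    }

    // Deliberately permissive: no other operation is restricted.
    @Override
    public void checkPermission(Permission perm) {
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
    }
}
```

Because checkPermission() is left permissive, installing this replaces any security policy the container already enforces; a more careful version would delegate every other check to the previously installed manager.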





Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Robert Stupp
Just a note:
If you have suggestions on how to improve the documentation on the DataStax website, 
send them an email at d...@datastax.com. They appreciate proposals :)

On 23.07.2014 at 09:10, Mark Reddy wrote:

> Hi Kevin,
> 
> The difference here is that the Apache Cassandra site is maintained by the 
> community whereas the DataStax site is maintained by paid employees with a 
> vested interest in producing documentation.
> 
> With DataStax having some comprehensive docs, I guess the desire for people 
> to maintain the Apache site has dwindled. However, if you are interested in 
> contributing to it and bringing it back up to standard, you can; this is the 
> freedom of open source. 
> 
> 
> Mark
> 
> 
> On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton  wrote:
> This document:
> 
> https://wiki.apache.org/cassandra/Operations
> 
> … for example, is extremely outdated… it certainly does NOT reflect 2.x 
> releases. It mentions commands that have long since been removed/deprecated.
> 
> Instead of giving bad documentation, maybe remove this and mark it as 
> obsolete.
> 
> The DataStax documentation… is… acceptable, I guess. My main criticism there 
> is that a lot of it is in their blog.
> 
> Kevin





Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Mark Reddy
Hi Kevin,

The difference here is that the Apache Cassandra site is maintained by the
community whereas the DataStax site is maintained by paid employees with a
vested interest in producing documentation.

With DataStax having some comprehensive docs, I guess the desire for people
to maintain the Apache site has dwindled. However, if you are interested in
contributing to it and bringing it back up to standard, you can; this is the
freedom of open source.


Mark


On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton  wrote:

> This document:
>
> https://wiki.apache.org/cassandra/Operations
>
> … for example, is extremely outdated… it certainly does NOT reflect 2.x
> releases. It mentions commands that have long since been removed/deprecated.
>
> Instead of giving bad documentation, maybe remove this and mark it as
> obsolete.
>
> The DataStax documentation… is… acceptable, I guess. My main criticism
> there is that a lot of it is in their blog.
>
> Kevin