Re: Node added, no performance boost -- are the tokens correct?

2011-04-01 Thread Eric Gilmore
The DS docs go with "should" regarding setting the initial token to zero.
It's not a must, but you get enough convenience out of never having to
move tokens on that node that I'm not sure why you wouldn't do it.

If anyone has a compelling reason not to do so, I'm happy to hear it :)

On Fri, Apr 1, 2011 at 10:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Fri, Apr 1, 2011 at 1:15 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
  Now, I moved the tokens. I still observe that read latency deteriorated
 with
  3 machines vs original one. Replication factor is 1, Cassandra version
 0.7.2
  (didn't have time to upgrade as I need results by this weekend).
 
  Read *latency* is fully expected to increase if you just add a node.
  *Throughput* should increase, unless you have a workload that manages
  to be more expensive on RPC than actual reads/writes.
 
  Latency would only be improved by additional nodes under some significant
 load.
 
  How are you benchmarking? Are you concurrently submitting requests to
  all nodes at the same time? Try using stress.py from the Cassandra
  tree as a comparison.
 
  If you're sending one request at a time, there is no expectation at
  all of a performance improvement - just a decrease in performance.
 
  --
  / Peter Schuller
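
A back-of-the-envelope sketch of the point above, assuming a closed-loop client in which each worker waits for its reply before sending the next request. The function name and the latency figures are made up for illustration; they are not measurements from this cluster.

```python
def throughput_per_second(concurrency, latency_seconds):
    """Requests completed per second by `concurrency` workers, each issuing
    one request at a time and waiting `latency_seconds` for the reply."""
    return concurrency / latency_seconds


if __name__ == "__main__":
    # Hypothetical numbers: a single node answers in 2 ms; with three nodes
    # an extra network hop raises per-request latency to 3 ms.
    print(throughput_per_second(1, 0.002))   # one request at a time, one node
    print(throughput_per_second(1, 0.003))   # one request at a time, three nodes
    print(throughput_per_second(30, 0.003))  # many concurrent requests
```

With a single outstanding request, throughput is just the inverse of latency, so extra nodes can only make the number worse; the benefit shows up only when enough requests are in flight to keep all nodes busy.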
 

 To be clear on this issue: it does not matter where the tokens start;
 it only matters that they are equally spaced around the token space.

 So for a 4-node cluster your tokens should either be
 1 * ((2^127) / 4) = 42535295865117307932921825928971026432
 2 * ((2^127) / 4) = 85070591730234615865843651857942052864
 3 * ((2^127) / 4) = 127605887595351923798765477786913079296
 4 * ((2^127) / 4) = 170141183460469231731687303715884105728

 or
 0 * ((2^127) / 4) = 0
 1 * ((2^127) / 4) = 42535295865117307932921825928971026432
 2 * ((2^127) / 4) = 85070591730234615865843651857942052864
 3 * ((2^127) / 4) = 127605887595351923798765477786913079296

 If you move one you have to move the rest because the distance between
 170141183460469231731687303715884105728 and 0 is 1
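
As a minimal sketch of the arithmetic above, evenly spaced tokens can be computed in Python, whose integers do not overflow. This is not the ctokens.py script mentioned elsewhere in the thread (that script is not shown); the function name and the `first` parameter are illustrative assumptions.

```python
# Evenly spaced tokens for the 0 .. 2**127 token space described above.
# `evenly_spaced_tokens` and `first` are illustrative names, not part of Cassandra.
TOKEN_SPACE = 2 ** 127


def evenly_spaced_tokens(node_count, first=0):
    """Return node_count tokens spaced TOKEN_SPACE // node_count apart.

    first=0 gives the list that starts at zero; first=1 gives the list
    that starts one step in and ends at 2**127.
    """
    step = TOKEN_SPACE // node_count
    return [i * step for i in range(first, first + node_count)]


if __name__ == "__main__":
    for token in evenly_spaced_tokens(4):
        print(token)
```

Calling `evenly_spaced_tokens(4, first=1)` reproduces the first list in the message, and the default call reproduces the second.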



Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Eric Gilmore
A script that I have says the following:

$ python ctokens.py
How many nodes are in your cluster? 2
node 0: 0
node 1: 85070591730234615865843651857942052864

The first token should be zero, for the reasons discussed here:
http://www.datastax.com/dev/tutorials/getting_started_0_7/configuring#initial-token-values

More details are available in
http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity

The DS docs have some weak areas, but these two pages have been pretty well
vetted over the past months :)



On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem potek...@bnl.gov wrote:

 I just configured a cluster of two nodes -- do these token values make
 sense?
 The reason I'm asking is that so far I don't see load balancing
 happening, judging from performance.

 Address         Status State   Load       Owns    Token
                                                   170141183460469231731687303715884105728
 130.199.185.194 Up     Normal  153.52 GB  50.00%  85070591730234615865843651857942052864
 130.199.185.193 Up     Normal  199.82 GB  50.00%  170141183460469231731687303715884105728


 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-added-no-performance-boost-are-the-tokens-correct-tp6228872p6228872.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: How to determine if repair need to be run

2011-03-31 Thread Eric Gilmore
Peter, I want to join everyone else thanking you for helping out so much
with this thread, and especially for pointing out the problems with the DS
docs on this topic.  We have some corrections posted today, and will keep
looking to improve the information.

On Thu, Mar 31, 2011 at 3:11 PM, Peter Schuller peter.schul...@infidyne.com
 wrote:

  Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me
 why it is so important to run a repair before the GCGraceSeconds kicks in.
 Does this mean a delete does not get replicated? In other words, when I
 delete something on a node, doesn't Cassandra set tombstones on its replica
 copies?

 Deletes are replicated, but deletes are special in that unlike actual
 data, you're wanting to *remove* something, but the information that
 says stuff is gone is information in and of itself. Clearly you
 don't want to forever and ever keep track of anything ever removed in
 the cluster, so this has to expire somehow. For that reason, there is
 a requirement that tombstones are replicated prior to their expiry.
 See:

  http://wiki.apache.org/cassandra/DistributedDeletes

  And technically, isn't repair only needed for cases where things weren't
 properly propagated in the cluster?  If all writes are written to the right
 replicas, and all deletes are written to all the replicas, and all nodes
 were available at all times, then everything should work as designed -
  without manual intervention, right?

 Yes, but you can assume that doesn't happen in real life for extended
 periods of time. It doesn't take a lot at all for a *few* writes not
 getting replicated (for example, just restarting a Cassandra node will
 cause some writes to be dropped - hinted handoff is not a guarantee,
 only an optimization).

 --
 / Peter Schuller
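
To make the tombstone argument above concrete, here is a toy model with two replicas. It is only a sketch under simplifying assumptions: the data structures, function names, and time units are invented for illustration and do not reflect Cassandra's actual storage or repair code.

```python
# Toy model of two replicas; not Cassandra's actual storage code.
# Each replica maps key -> (timestamp, value); value None marks a tombstone.

def repair(a, b):
    """Make both replicas agree, keeping the newest entry for each key."""
    for key in set(a) | set(b):
        newest = max((r[key] for r in (a, b) if key in r), key=lambda e: e[0])
        a[key] = b[key] = newest


def purge_tombstones(replica, now, gc_grace):
    """Drop tombstones older than gc_grace, as compaction eventually would."""
    for key in [k for k, (ts, value) in replica.items()
                if value is None and now - ts > gc_grace]:
        del replica[key]


def run(repair_before_expiry):
    gc_grace = 10
    a = {"k": (1, "v")}
    b = {"k": (1, "v")}
    a["k"] = (2, None)            # the delete at t=2 reaches replica a only
    if repair_before_expiry:
        repair(a, b)              # tombstone is copied to b in time
    for replica in (a, b):
        purge_tombstones(replica, now=20, gc_grace=gc_grace)
    repair(a, b)                  # a later repair
    return a.get("k"), b.get("k")
```

In this model `run(True)` leaves the key gone on both replicas, while `run(False)` brings the deleted value back on both, because the only record of the delete expired before replica b ever saw it.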



Re: Add node to balanced cluster?

2011-03-25 Thread Eric Gilmore
Ruslan, I'm not sure exactly what risks you are referring to -- can you be
more specific?

Do the CPU-intensive operations one at a time, including doing the cleanup
when it will not interfere with other operations, and I think you should be
fine, from my understanding.


   1. Start the new nodes in staggered fashion, allowing at least two
   minutes between each node startup for the gossip protocol to perform
   important inter-node communication. You can monitor the startup and data
   streaming process to its completion using *nodetool netstats*
   <http://www.datastax.com/docs/0.7/utilities/nodetool#nodetool-streams>.
   2. After the new nodes are fully bootstrapped, run nodetool move
   new_token on each existing node, one node at a time, where new_token
   is the value you calculated for the node. Only the first node in the ring,
   whose token value is zero, does not need to be moved.
   3. Run *nodetool cleanup*
   <http://www.datastax.com/docs/0.7/utilities/nodetool#nodetool-cleanup>
   on each of the previously existing nodes to remove the keys no longer
   belonging to those nodes. This operation is as disk-intensive as a major
   compaction, so run only one cleanup command at a time. Cleanup may be safely
   postponed for low-usage hours.
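
The move and cleanup steps above can be sketched as a small helper that only builds the list of commands, in order, for an operator to run one at a time. The helper name and the host names in the usage example are hypothetical, and the exact nodetool invocation should be checked against the installed version.

```python
def rebalance_commands(current_tokens, new_tokens):
    """Return nodetool commands in the order described above.

    current_tokens and new_tokens map each existing host to its token.
    Hosts whose token does not change (for example the node at token 0)
    are not moved. Nothing is executed; the caller runs the commands
    one at a time.
    """
    commands = []
    for host, token in new_tokens.items():
        if current_tokens.get(host) != token:
            commands.append(f"nodetool -h {host} move {token}")
    for host in current_tokens:
        commands.append(f"nodetool -h {host} cleanup")
    return commands


if __name__ == "__main__":
    # Hypothetical two-node ring growing to three nodes.
    step = 2 ** 127 // 3
    current = {"node-a": 0, "node-b": 2 ** 127 // 2}
    target = {"node-a": 0, "node-b": step}
    for command in rebalance_commands(current, target):
        print(command)
```

Keeping this as a plain list rather than running the commands automatically matches the advice to do the expensive operations one at a time and to postpone cleanup if needed.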





On Fri, Mar 25, 2011 at 10:39 AM, ruslan usifov ruslan.usi...@gmail.com wrote:



 2011/3/25 Eric Gilmore e...@datastax.com

 Also:
 http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity

 I have an idea of how to do that, but I am afraid that when I begin balancing the
 cluster with the new node, it will put a lot of stress on it. Maybe there are some
 strategies for how to do that?



Re: where to find the stress testing programs?

2011-03-16 Thread Eric Gilmore
There are both Python and Java stress testing tools.  I found the Java
version easier to use.  These directions (which echo the README for
stress.java) may help get you going:
http://www.datastax.com/docs/0.7/utilities/stress_java

On Tue, Mar 15, 2011 at 9:25 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 contrib is only in the source download of cassandra

 On Mar 15, 2011, at 11:23 AM, Jonathan Colby wrote:

  According to the Cassandra Wiki and OReilly book supposedly there is a
  contrib directory within the cassandra download containing the
  Python Stress Test script stress.py.  It's not in the binary tarball
  of 0.7.3.
 
  Anyone know where to find it?
 
  Anyone know of other, maybe better stress testing scripts?
 
  Jon




Re: Installation

2011-03-07 Thread Eric Gilmore
The DataStax packaged releases
<http://www.datastax.com/docs/0.7/configuration/packaged_releases> follow
standard practices for Linux-ish installation, so they might be a
good model to follow. For instance, the RHEL/CentOS package installs the
binaries (cassandra-cli, nodetool) in /usr/bin, configuration-related things
in /etc/cassandra/conf/, and start/stop scripts in /etc/init.d/.

The installation page
<http://www.datastax.com/dev/tutorials/getting_started_0.7/installing> in
our Getting Started guide describes the breakdown in binary files that
you can download from Apache <http://cassandra.apache.org/download/> and
unpack, but in the near future we may add a more detailed description of the
package releases.

On Mon, Mar 7, 2011 at 8:40 AM, Mark static.void@gmail.com wrote:

 Where do most people install Cassandra to? /var or /opt?

 Thanks



Re: cassandra-rack.properties or cassandra-topology.properties

2011-03-04 Thread Eric Gilmore
Apologies A. J. -- the reference to rack.properties is an error in the
DataStax docs.  We'll update it ASAP.

On Thu, Mar 3, 2011 at 10:56 AM, A J s5a...@gmail.com wrote:

 Yes, that has topology and not rack.

 conf/access.properties  conf/log4j-server.properties
 conf/cassandra-env.sh   conf/log4j-tools.properties
 conf/cassandra-topology.properties  conf/passwd.properties
 conf/cassandra.yaml conf/README.txt


 On Thu, Mar 3, 2011 at 1:28 PM, Jonathan Ellis jbel...@gmail.com wrote:
  Did you try ls conf/ ?
 
  On Thu, Mar 3, 2011 at 11:27 AM, A J s5a...@gmail.com wrote:
  In PropertyFileSnitch is cassandra-rack.properties or
  cassandra-topology.properties file used ?
 
  Little confused by the stmt:
  PropertyFileSnitch determines the location of nodes by referring to a
  user-defined description of the network details located in the
  property file cassandra-rack.properties. Your installation contains an
  example properties file for PropertyFileSnitch in
  $CASSANDRA_HOME/conf/cassandra-topology.properties.
 
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 



Re: Seed Nodes

2011-03-01 Thread Eric Gilmore
Yes, two per DC is a recommendation I've heard from Jonathan Ellis.  We put
that in yet more documentation at
<http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#seed-list>
(appreciate the citation Aaron :)

I had a recent conversation with a Cassandra expert who had me convinced
that, as long as a node online already had one seed in its list at startup,
you wouldn't really need to restart it -- at least right away -- after
adding a second seed to the config.  Rather than confuse the issue by trying
to remember/recreate the rationale for that, I'll ping him and see if he'll
comment here.

On Tue, Mar 1, 2011 at 12:04 PM, Aaron Morton aa...@thelastpickle.com wrote:

 AFAIK it's recommended to have two seed nodes per dc.

 Some info on seeds here 
 http://www.datastax.com/docs/0.7/operations/clustering

 You will need a restart.

 Aaron


 On 2/03/2011, at 6:08 AM, shan...@accenture.com wrote:

  How many seed nodes should I have for a cluster of 100 nodes, each with
 about 500 GB of data? Also, to add seed nodes, must I change the seed
 list on all existing nodes through the cassandra.yaml file? Will
 changes take effect without restarting the node?



 *Shan (Susie) Lu,  *Analyst**

 Accenture Technology Labs - Silicon Valley **

 cell  +1 425.749.2546

 email *shan...@accenture.com charles.nebol...@accenture.com*



 --
 This message is for the designated recipient only and may contain
 privileged, proprietary, or otherwise private information. If you have
 received it in error, please notify the sender immediately and delete the
 original. Any other use of the email by you is prohibited.




Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
The DataStax documentation offers some answers to those questions in
the Getting Started
<http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#adding-nodes-to-a-cassandra-cluster>
section and the Clustering
<http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity>
reference docs.

Autobootstrap should be true, but with the important caveat that
initial_token values should be specified.  Have a look at those docs, and
please give feedback on how helpful they are/aren't.

Regards,

Eric Gilmore


On Wed, Feb 23, 2011 at 11:15 AM, jeremy.truel...@barclayscapital.com wrote:

 What’s the best way to bring multiple seeds up, should only one of them
 have auto bootstrap set to true or should neither of them? Should they list
 themselves and the other seed in their seed section in the yaml config?

 ___



 This e-mail may contain information that is confidential, privileged or
 otherwise protected from disclosure. If you are not an intended recipient of
 this e-mail, do not duplicate or redistribute it by any means. Please delete
 it and any attachments and notify the sender that you have received it in
 error. Unless specifically indicated, this e-mail is not an offer to buy or
 sell or a solicitation to buy or sell any securities, investment products or
 other financial product or service, an official confirmation of any
 transaction, or an official statement of Barclays. Any views or opinions
 presented are solely those of the author and do not necessarily represent
 those of Barclays. This e-mail is subject to terms available at the
 following link: www.barcap.com/emaildisclaimer. By messaging with Barclays
 you consent to the foregoing.  Barclays Capital is the investment banking
 division of Barclays Bank PLC, a company registered in England (number
 1026167) with its registered office at 1 Churchill Place, London, E14 5HP.
 This email may relate to or be sent from other members of the Barclays
 Group.

 ___



Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
Well -- when you first bring a node into a ring, you will probably want to
stream data to it with auto_bootstrap: true.

If you want that node to be a seed, then add it to the seeds list AFTER it
has joined the ring.

I'd refer you to the Seed List and Autobootstrapping sections of the
Getting Started guide, which contain the following blurbs:

*There is no strict rule to determine which hosts need to be listed as
seeds, but all nodes in a cluster need the same seed list. For a production
deployment, DataStax recommends two seeds per data center.*

*An autobootstrapping node cannot have itself in the list of seeds nor can
it contain an initial_token
<http://www.datastax.com/docs/0.7/configuration/storage_configuration#initial-token>
already claimed by another node. To add new seeds, autobootstrap the nodes
first, and then configure them as seeds.*
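
The two constraints in the second blurb can be written down as a small pre-flight check. This is only a sketch: the function, its arguments, and the addresses in the usage example are hypothetical, and nothing here reads cassandra.yaml or contacts a cluster.

```python
def bootstrap_problems(node_address, seeds, initial_token, claimed_tokens):
    """Return reasons why a node should not start with auto_bootstrap: true.

    All arguments are plain values supplied by the caller: the node's own
    address, its configured seed list, its initial_token, and the set of
    tokens already held by other nodes in the ring.
    """
    problems = []
    if node_address in seeds:
        problems.append("node lists itself as a seed")
    if initial_token in claimed_tokens:
        problems.append("initial_token is already claimed by another node")
    return problems


if __name__ == "__main__":
    # Hypothetical addresses and tokens.
    print(bootstrap_problems("10.0.0.3", ["10.0.0.1", "10.0.0.2"],
                             2 ** 127 // 3, {0, 2 ** 127 // 2}))
    print(bootstrap_problems("10.0.0.1", ["10.0.0.1"], 0, {0}))
```

An empty list means neither constraint is violated; once the node has joined the ring it can then be added to the seed list on every node.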





On Wed, Feb 23, 2011 at 11:39 AM, jeremy.truel...@barclayscapital.com wrote:

 So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
 file.

 -Original Message-
 From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
 Sent: Wednesday, February 23, 2011 2:36 PM
 To: user@cassandra.apache.org
 Cc: Truelove, Jeremy: IT (NYK)
 Subject: Re: Multiple Seeds

 On Wed, Feb 23, 2011 at 2:30 PM,  jeremy.truel...@barclayscapital.com
 wrote:
  Yeah I set the tokens, I'm more asking if I start the first seed node
 with
  autobootstrap set to false the second seed should have it set to true as
  well as all the slave nodes correct? I didn't see this in the docs but I
 may
  have just missed it.
 
 
 
  From: Eric Gilmore [mailto:e...@datastax.com]
  Sent: Wednesday, February 23, 2011 2:24 PM
  To: user@cassandra.apache.org
  Subject: Re: Multiple Seeds
 
 
 
  The DataStax documentation offers some answers to those questions in the
  Getting Started section and the Clustering reference docs.
 
  Autobootstrap should be true, but with the important caveat that
  initial_token values should be specified.  Have a look at those docs, and
  please give feedback on how helpful they are/aren't.
 
  Regards,
 
  Eric Gilmore
 
  On Wed, Feb 23, 2011 at 11:15 AM, jeremy.truel...@barclayscapital.com
  wrote:
 
  What's the best way to bring multiple seeds up, should only one of them
 have
  auto bootstrap set to true or should neither of them? Should they list
  themselves and the other seed in their seed section in the yaml config?
 
 
 

 If a node is defined as a seed it will never auto bootstrap. After it
 has bootstrapped and has a system table you can set it as a seed in its
 yaml file if you wish.



Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
It sounds like one of your existing nodes already has the initial token
zero.  Did you set the initial token of the first node you brought online to
zero?

On Fri, Feb 18, 2011 at 12:35 PM, mcasandra mohitanch...@gmail.com wrote:


 I see following error. Is it because I have initial token defined? What
 token
 should I use as initial token?

  INFO 12:31:36,689 Finished hinted handoff of 0 rows to endpoint
 /172.16.208.12
  INFO 12:32:58,448 Joining: getting bootstrap token
 ERROR 12:32:58,451 Fatal error: Bootstraping to existing token 0 is not
 allowed (decommission/removetoken the old node first).
 Bad configuration; unable to start server

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041409.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
A Java program should work fine.  The Wiki and the DataStax documentation
use a python program for the same purpose:

http://www.datastax.com/docs/0.7/operations/clustering#calculating-tokens

On Fri, Feb 18, 2011 at 12:45 PM, mcasandra mohitanch...@gmail.com wrote:


 Yes, I had set the first node to token 0. I think I read that somewhere in the
 docs. What should I do? Should I write a Java program to calculate the hash
 for 3 nodes and distribute it across the 3 nodes?
 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041430.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
I'm not sure I can say exactly why, but I'm sure those numbers can't be
correct.  One node should be zero and the other values should be very long
numbers like 85070591730234615865843651857942052863.

We need another Java expert's opinion here, but it looks like your snippet
may have integer overflow
<http://www.mkyong.com/java/javas-silent-killer-buffer-overflow-careful/>
or integer overload going on.

On Fri, Feb 18, 2011 at 1:04 PM, mcasandra mohitanch...@gmail.com wrote:


 Thanks! This is what I got. Is this right?

 public class TokenCalc {
   public static void main(String... args) {
     int nodes = 3;
     for (int i = 1; i <= nodes; i++) {
       System.out.println((2 ^ 127) / nodes * i);
     }
   }
 }

 41
 82
 123
 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041471.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.
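
For what it's worth, the small numbers printed by the snippet above are consistent with `^` being bitwise XOR rather than exponentiation in Java. Python's `^` behaves the same way on these operands, so the effect can be reproduced there. A minimal sketch follows, with Python's `**` standing in for the arbitrary-precision power that Java would need `java.math.BigInteger` for; the variable names are illustrative.

```python
nodes = 3

# `^` is bitwise XOR: 2 ^ 127 == 125, so the results stay tiny.
xor_tokens = [(2 ^ 127) // nodes * i for i in range(1, nodes + 1)]
print(xor_tokens)  # [41, 82, 123]

# `**` is exponentiation, and Python integers do not overflow.
tokens = [i * (2 ** 127 // nodes) for i in range(nodes)]
for token in tokens:
    print(token)
```

The second list starts at zero and produces the very long, evenly spaced values expected for a three-node ring.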



Re: Internal error processing insert

2011-02-14 Thread Eric Gilmore
For now, I have committed a change in the misleading documentation,
substituting SimpleStrategy for NTS.

Sorry you ran into trouble due to that, mcasandra.

On Mon, Feb 14, 2011 at 4:28 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Will take a closer look at the code tonight. Perhaps we should return an
 error if you try to use NetworkTopologyStrategy and it cannot detect any DCs.

 Cheers
 Aaron


 On 15 Feb, 2011,at 01:22 PM, mcasandra mohitanch...@gmail.com wrote:


 That's what I thought might be happening, since network topology will try to
 find one node in the other data center. The message is a little confusing, though.

 [default@unknown] update keyspace twissandra
 placement_strategy='org.apache.cassandra.locator.SimpleStrategy';
 Syntax error at position 28: missing EOF at 'placement_strategy'
 [default@unknown] update keyspace twissandra with
 placement_strategy='org.apache.cassandra.locator.SimpleStrategy';
 5c487967-3899-11e0-993f-b7fa7ed61af9
 [default@unknown] use twissandra
 ... ;
 Authenticated to keyspace: twissandra
 [default@twissandra] set users['jsmith']['password']='ch@ngem3';
 Value inserted.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Internal-error-processing-insert-tp6025740p6025813.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.




Re: How does Bootstrapping work in 0.7 ??

2011-01-24 Thread Eric Gilmore
Thanks very much Patrick for the good words and suggestions.  Those are
important points about initial_token and nodetool move.

Definitely, keep us informed about any and all doc issues you have, and we
will do what we can to keep improving the docs.

Eric

On Sun, Jan 23, 2011 at 2:26 PM, Patrick de Torcy pdeto...@gmail.com wrote:

 Thanks, Eric and Peter, for all the replies ! It helped me a lot...

 I've read again the Getting Started page
 <http://www.datastax.com/docs/0.7/getting_started/index>...
 In fact I have particularly read the chapter : setting up a multi node
 cluster (as it was what I tried to do). It's written : The rest of this
 section uses a two-node example cluster. With two nodes, a new Cassandra
 cluster’s load is automatically balanced, and remains balanced if you
 increase to four nodes, then eight. So I thought that keeping the default
 parameter (i.e. blank) would have been enough. So I think you should add in
 your doc (and in the yaml comments) that if it's the first node of your
 cluster, you really should set initialToken to 0 if you want your further
 nodes balanced.

 You should also add that if your cluster is unbalanced, you'll have to
 run: nodetool host move newToken (it's not specified in this section). Some
 examples would be useful too. For people new to Cassandra, this part is very
 confusing (well, maybe I'm a little dumb too...)

 But even with these flaws, your documentation is the best I have read.
 And if I have other issues, don't worry, you'll be informed :-).

 Thanks again,

 Patrick


 On Thu, Jan 20, 2011 at 8:55 PM, Eric Gilmore e...@riptano.com wrote:

 Patrick, if you try adding capacity again from the beginning, I'd be
 curious to hear if the DataStax/Riptano docs
 <http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity>
 are helpful or not.

 Also, in the Getting Started page
 <http://www.datastax.com/docs/0.7/getting_started/index>,
 we note that it may be best to set initial_token to 0 on the very first node
 that you start.

 Regards,

 Eric Gilmore


 On Thu, Jan 20, 2011 at 11:05 AM, Peter Schuller 
 peter.schul...@infidyne.com wrote:

  Is it supposed to work that way, or have I missed something ?

 I don't see that you did anything wrong based on your description and
 based on my understanding how it works in 0.7 (not sure about 0.6),
 but hopefully someone else can address that part. What I can think of
 - did you inspect the log on the new node? Does it say anything about
 bootstrapping or streaming data from other nodes? Does 'nodetool ring'
 indicate it considers itself completely up and in the cluster already?

 Trying to determine whether the node is in fact considering itself
 done bootstrapping and joined to the ring, yet containing no data.

  I tried then to put values for initialToken for both nodes (stopping
 and
  restartings the servers), but it didn't change anything : I have the
 same
  token values...

 This is expected. Once the node has bootstrapped into the cluster and
 saved its token, it will no longer try to acquire a new one. Any
 initial token in the configuration is ignored; it is only the
 *initial* token, quite literally. Changing the token would require a
 'nodetool move' command.

 --
 / Peter Schuller




 --
 *Eric Gilmore
 *
 Consulting Technical Writer
 Riptano, Inc.
 Ph: 510 684 9786  (cell)







Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Eric Gilmore
Patrick, if you try adding capacity again from the beginning, I'd be curious
to hear if the DataStax/Riptano docs
<http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity>
are helpful or not.

Also, in the Getting Started page
<http://www.datastax.com/docs/0.7/getting_started/index>,
we note that it may be best to set initial_token to 0 on the very first node
that you start.

Regards,

Eric Gilmore

On Thu, Jan 20, 2011 at 11:05 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

  Is it supposed to work that way, or have I missed something ?

 I don't see that you did anything wrong based on your description and
 based on my understanding how it works in 0.7 (not sure about 0.6),
 but hopefully someone else can address that part. What I can think of
 - did you inspect the log on the new node? Does it say anything about
 bootstrapping or streaming data from other nodes? Does 'nodetool ring'
 indicate it considers itself completely up and in the cluster already?

 Trying to determine whether the node is in fact considering itself
 done bootstrapping and joined to the ring, yet containing no data.

  I tried then to put values for initialToken for both nodes (stopping and
  restartings the servers), but it didn't change anything : I have the same
  token values...

 This is expected. Once the node has bootstrapped into the cluster and
 saved its token, it will no longer try to acquire a new one. Any
 initial token in the configuration is ignored; it is only the
 *initial* token, quite literally. Changing the token would require a
 'nodetool move' command.

 --
 / Peter Schuller






Re: How does Bootstrapping work in 0.7 ??

2011-01-20 Thread Eric Gilmore
 Sorry, my comments were indeed a little short on elucidation.  :)

The cited doc suggests that setting initial_token to 0 on the first node
simplifies load balancing as you later expand the cluster . . . .  If this
is unset (the default), Cassandra picks a token number randomly.

A more complete explanation might look something like:

. . . it is recommended to set the initial token's value to zero.  This
simplifies load balancing as you later expand the cluster, since the node
starting at 0 will never need to be moved to a new token.  Also, if this is
unset (the default), Cassandra picks a token number randomly, which can lead
to hot spots in the ring.


On Thu, Jan 20, 2011 at 12:59 PM, Brandon Williams dri...@gmail.com wrote:

 On Thu, Jan 20, 2011 at 2:14 PM, Robert Coli rc...@digg.com wrote:

 On Thu, Jan 20, 2011 at 11:55 AM, Eric Gilmore e...@riptano.com wrote:
  Also, in the Getting Started page, we note that it may be best to set
  initial_token to 0 on the very first node that you start.

 Could you expand a bit on the reasons for and implications of this,
 for our collective elucidation? :)


 Because then the node never has to move.  Same would be true of 2**127, but
 zero is mnemonically easier. :)

 -Brandon






Re: If one seed node crash, how can I add one seed node?

2010-12-07 Thread Eric Gilmore
What would comprise a sane and reasonably balanced list?  Should there be a
certain proportion of seeds per total nodes?  Any other considerations
besides a) list must be identical on all nodes and b) you can't
auto-bootstrap a seed node?

I'm new to thinking about this setting, but it sounds like this discussion
may be approaching some best-practice guidelines.

On Tue, Dec 7, 2010 at 1:01 PM, Jonathan Ellis jbel...@gmail.com wrote:

 The gossip-to-seed each round is to prevent cluster partitions, so if
 you're following correct procedure and making every node's seed list
 identical, then any potential new nodes gossiping to one of the old seeds
 means it is still harmless for old nodes not to gossip to the new one until
 the next restart.


 On Tue, Dec 7, 2010 at 2:10 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Ryan,
 I've not checked with the code but the wiki docs for the Gossip Protocol
 say it makes use of the seed list.
 http://wiki.apache.org/cassandra/ArchitectureGossip

 During each gossip round a node will try to gossip to one seed node.

 Which made me think keeping the list sane and reasonably balanced was a
 good idea. Obviously would not matter too much on a small cluster though.

 Aaron


 On 08 Dec, 2010,at 07:16 AM, Ryan King r...@twitter.com wrote:

 Note that there's not really anything special about the seed node and it's
 all relative -- the cluster doesn't necessarily have to agree on who the
 seeds are.

 So, to bring up a new node to replace the old seed, just set the new
 node's seed to any existing node in the system. After that you can go back
 and make the setting consistent across the cluster.

 -ryan

 On Tue, Dec 7, 2010 at 7:01 AM, Nick Bailey n...@riptano.com wrote:

 Yes, cassandra only reads the configuration when it starts up. However
 seed nodes are only used when a node starts. After that they aren't needed.
 There should be no reason to restart your cluster after adding a seed node
 to  your cluster.



 On Tue, Dec 7, 2010 at 2:09 AM, aaron morton aa...@thelastpickle.com wrote:

 You will need to restart the nodes for them to pickup changes in
 cassandra.yaml


 Aaron


 On 7 Dec 2010, at 16:32, lei liu wrote:

 Thanks Nick.

 After I add the new node as seed node in the configuration for all of my
 nodes, do I need to restart all of my nodes?

 2010/12/7 Nick Bailey n...@riptano.com

 The node can be set as a seed node at any time. It does not need to be
 a seed node when it joins the cluster. You should remove it as a seed 
 node,
 set autobootstrap to true and let it join the cluster. Once it has joined
 the cluster you should add it as a seed node in the configuration for all 
 of
 your nodes.



 On Mon, Dec 6, 2010 at 9:59 AM, lei liu liulei...@gmail.com wrote:

 Thank you, Jonathan, for your reply.

 How can I bootstrap the node into the cluster? I know that if the node is a seed
 node, I can't set AutoBootstrap to true.

 2010/12/6 Jonathan Ellis jbel...@gmail.com

 set it as a seed _after_ bootstrapping it into the cluster.


 On Mon, Dec 6, 2010 at 5:01 AM, lei liu liulei...@gmail.com
 wrote:
  After one seed node crashed, I want to add one node as a seed node. I
  set auto_bootstrap to true, but the new node doesn't migrate data from
  other nodes.
 
  How can I add one new seed node and let the node migrate data
  from other nodes?
 
 
 
  Thanks,
 
  LiuLei
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com










 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



