[Cassandra Wiki] Update of "GettingStarted" by MakiWatanabe

Apache Wiki Thu, 23 Feb 2012 19:29:18 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "GettingStarted" page has been changed by MakiWatanabe:
http://wiki.apache.org/cassandra/GettingStarted?action=diff&rev1=61&rev2=62

Comment:
Simplify Step 1 and add more explicit configuration examples. Move build 
information to the other page.

  == Cassandra documentation from DataStax ==
  !DataStax's latest [[http://www.datastax.com/docs/1.0/index|Cassandra 
documentation]] covers topics from installation to troubleshooting.  
Documentation for older releases is also available.
- 
+  
  == Introduction ==
- This document aims to provide a few easy to follow steps to take the 
first-time user from installation, to an operational Cassandra cluster.
+ This document aims to provide a few easy-to-follow steps to take the first-time user from installation to running a single-node Cassandra instance, along with an overview of how to configure a multinode cluster.
- 
+ Cassandra is meant to run on a cluster of nodes, but will run equally well on 
a single machine. This is a handy way of getting familiar with the software 
while avoiding the complexities of a larger system.
+   
  == Step 0: Prerequisites and connection to the community ==
  Cassandra requires the most stable version of Java 1.6 you can deploy.  For 
Sun's jvm, this means at least u19; u21 is better.  Cassandra also runs on the 
IBM jvm, and should run on jrockit as well.
- 
+  
  The best way to ensure you always have up to date information on the project, 
releases, stability, bugs, and features is to subscribe to the users mailing 
list ([[mailto:user-subscr...@cassandra.apache.org|subscription required]]) and 
participate in the #cassandra channel on 
[[http://webchat.freenode.net/?channels=#cassandra|IRC]].
- 
+  
  <<Anchor(picking_a_version)>>
+ <<Anchor(download_a_kit)>>
+  
+ == Step 1: Download Cassandra Kit ==
+  
- 
- == Step 1: Picking a version ==
- At any given time, there are a number of different versions available to 
install:
- 
- === Stable releases ===
- Cassandra stable releases are well tested and reasonably free of serious 
problems, (or at least the problems are known and well documented). If you are 
setting up a production environment, a stable release is what you want.
- 
- Download links for the latest stable release can always be found on the 
[[http://cassandra.apache.org/download|website]].
+  * Download links for the latest stable release can always be found on the 
[[http://cassandra.apache.org/download|website]].
+  * Users of Debian or Debian-based derivatives can install the latest stable 
release in package form, see DebianPackaging for details.
+  * Users of RPM-based distributions can get packages from 
[[http://www.datastax.com/blog/announcing-rpms-cassandra|Datastax]].
+  * If you are interested in building Cassandra from source, please refer to 
[[HowToBuild|How to Build]] page.
+  
+ For more details about the different kinds of builds, please refer to the [[VersionsAndBuilds|Cassandra versions and builds]] page.
+  
- 
- === Betas and release candidates ===
- Betas are prototype releases considered ready for user testing, and release 
candidates have the potential to become the next stable release. These releases 
represent the state-of-the-art so are often the best place to start, and since 
APIs and on-disk storage formats can change between major versions this can 
also save you from an upgrade. The testing and feedback is also highly 
appreciated.
- 
- === Nightly builds ===
- Nightly builds represent the current state of development as of the time of 
the build. They contain all of the previous day's new features, fixes, and 
newly introduced bugs. The only guarantee they come with is that they 
successfully build and the unit tests pass. Nightly builds are a handy way of 
testing recent changes, or accessing the latest features and fixes not found in 
beta or release candidates, but there is some risk of them being buggy.
- 
- The most recent nightly build can be downloaded 
[[http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/|here]].
- 
- === Git ===
- Cassandra's git repository is where all active development takes place. 
Anyone interested in contributing to the project should use a checkout of 
trunk. If you do run from git, be sure to update frequently, and subscribe to 
the [[mailto:dev-subscr...@cassandra.apache.org?subject=subscribe|mailing 
list]] to stay abreast of the latest developments.
- 
- Instructions for checking out the source code can always be found on the 
[[http://cassandra.apache.org/download|website]].
- 
  <<Anchor(running_a_single_node)>>
+  
+ == Step 2: Edit configuration files ==
+  
+ ## Since there isn't currently an installation method per se, the easiest 
solution is to simply run Cassandra from an extracted archive or Git checkout 
(see: [[#picking_a_version|Picking a version]]). Also, unless you've downloaded 
a binary distribution, you'll need to compile the software by invoking `ant` 
from the top-level directory.
+  
+ === Step 2.1: Edit cassandra.yaml ===
+ The distribution's sample configuration `conf/cassandra.yaml` contains 
reasonable defaults for single node operation, but you will need to make sure 
that the paths exist for '''data_file_directories''', 
'''commitlog_directory''', and '''saved_caches_directory'''.
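A minimal Python sketch of that check: it creates whichever of the three directories are missing. The paths in `DEFAULT_DIRS` are an assumption (values commonly shipped in `conf/cassandra.yaml` around 1.0), so compare them with your own file. The `root` argument is a hypothetical helper parameter that lets you try the script in a scratch directory without root privileges.

```python
import os

# Assumed defaults; compare with the values in your own conf/cassandra.yaml.
DEFAULT_DIRS = {
    "data_file_directories": ["/var/lib/cassandra/data"],
    "commitlog_directory": ["/var/lib/cassandra/commitlog"],
    "saved_caches_directory": ["/var/lib/cassandra/saved_caches"],
}


def ensure_cassandra_dirs(dirs=DEFAULT_DIRS, root=""):
    """Create each configured directory that does not exist yet.

    `root` (hypothetical) re-roots the absolute paths under a scratch
    directory; leave it empty to create the real paths.
    Returns the list of directories that were created.
    """
    created = []
    for paths in dirs.values():
        for path in paths:
            target = os.path.join(root, path.lstrip("/")) if root else path
            if not os.path.isdir(target):
                os.makedirs(target)
                created.append(target)
    return created
```

Run `ensure_cassandra_dirs()` as the user Cassandra will run as, so that the new directories are writable by that user.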
  
- == Step 2: Running a single node ==
- Cassandra is meant to run on a cluster of nodes, but will run equally well on 
a single machine. This is a handy way of getting familiar with the software 
while avoiding the complexities of a larger system.
+ Verify that '''storage_port''' and '''rpc_port''' do not conflict with other services on your computer.
+ By default, Cassandra uses 7000 for storage_port and 9160 for rpc_port. The `storage_port` must be identical on all Cassandra nodes in a cluster. Cassandra client applications use `rpc_port` to connect to Cassandra. 
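You can check for conflicts by trying to bind each port yourself. A minimal Python sketch, assuming the default ports listed above plus the JMX port 7199 from `conf/cassandra-env.sh`. It only tests 127.0.0.1, so a service bound to a different interface may go unnoticed. `port_is_free` and `conflicting_ports` are hypothetical names.

```python
import socket

# Default ports: storage_port and rpc_port from cassandra.yaml,
# JMX_PORT from conf/cassandra-env.sh.
CASSANDRA_PORTS = {"storage_port": 7000, "rpc_port": 9160, "JMX_PORT": 7199}


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` can be bound on `host`, i.e. nothing else holds it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


def conflicting_ports(ports=CASSANDRA_PORTS, host="127.0.0.1"):
    """Return the names of the settings whose port is already in use."""
    return [name for name, port in ports.items() if not port_is_free(port, host)]
```

An empty list from `conflicting_ports()` means none of the defaults is taken on the loopback interface; otherwise pick other values in `cassandra.yaml` or `cassandra-env.sh`.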
+  
+ It is a good idea to change '''cluster_name''' to avoid unintended conflicts with existing clusters.
  
- Since there isn't currently an installation method per se, the easiest 
solution is to simply run Cassandra from an extracted archive<<FootNote(Users 
of Debian or Debian-based derivatives can install the latest stable release in 
package form, see DebianPackaging for details.)>><<FootNote(Users of RPM-based 
distributions can get packages from 
[[http://www.datastax.com/blog/announcing-rpms-cassandra|Datastax]])>> or Git 
checkout (see: [[#picking_a_version|Picking a version]]). Also, unless you've 
downloaded a binary distribution, you'll need to compile the software by 
invoking `ant` from the top-level directory.
+ '''initial_token''': you can leave it blank, but I recommend setting it to 0 if you are configuring your first node.
  
- The distribution's sample configuration `conf/cassandra.yaml` contains 
reasonable defaults for single node operation, but you will need to make sure 
that the paths exist for `data_file_directories`, `commitlog_directory`, and 
`saved_caches_directory`. Additionally, take a minute now to look over the 
logging configuration in `conf/log4j.properties` and make sure that directories 
exist for the configured log file(s) as well.
+ === Step 2.2: Edit log4j-server.properties ===
+ `conf/log4j-server.properties` contains the path of the log file. Edit the line if necessary, and make sure the directory exists.
+ {{{
+ # Edit the next line to point to your logs directory
+ log4j.appender.R.File=/var/log/cassandra/system.log
+ }}}
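A small Python sketch that reads the configured log path and creates its directory. It assumes the `log4j.appender.R.File` key shown above and a simplified `.properties` syntax (`key=value` lines, `#` or `!` comments); `log_file_path` and `ensure_log_dir` are hypothetical helpers.

```python
import os

# Property key taken from the snippet above; other appenders are ignored.
LOG_FILE_KEY = "log4j.appender.R.File"


def log_file_path(properties_text):
    """Return the value of log4j.appender.R.File, or None if it is not set.

    Simplified parser: real .properties files also allow ':' separators
    and line continuations, which are not handled here.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == LOG_FILE_KEY:
            return value.strip()
    return None


def ensure_log_dir(properties_path):
    """Create the directory that will hold the configured log file."""
    with open(properties_path) as f:
        path = log_file_path(f.read())
    if path and os.path.dirname(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```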
+  
+ === Step 2.3: Edit cassandra-env.sh ===
+ Cassandra exposes a JMX (Java Management Extensions) interface; its port, JMX_PORT, is defined in `conf/cassandra-env.sh`.
+ Edit the following line if necessary.
+ {{{
+ # Specifies the default port over which Cassandra will be available for
+ # JMX connections.
+ JMX_PORT="7199"
+ }}}
+  
+ By default, Cassandra sizes its heap based on the physical memory of your system.
+ For example, it allocates a 1GB heap on a 2GB system and a 2GB heap on an 8GB system.
+ If you want to set the Cassandra heap size yourself, remove the leading pound sign (#) from the following lines and specify the memory sizes.
+ {{{
+ #MAX_HEAP_SIZE="4G"
+ #HEAP_NEWSIZE="800M"
+ }}}
+ If you are not familiar with Java GC, 1/4 of MAX_HEAP_SIZE may be a good starting point for HEAP_NEWSIZE.
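The automatic sizing can be sketched in Python. This is an assumption modeled on the heap-sizing logic in the 1.0-era `conf/cassandra-env.sh`: half of RAM capped at 1GB, or a quarter of RAM capped at 8GB, whichever is larger, with the young generation set to 1/4 of the heap but capped at 100MB per CPU core. Check the script shipped with your version. `default_heap_sizes` is a hypothetical name.

```python
def default_heap_sizes(system_memory_mb, cpu_cores):
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB.

    Assumed rule: max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)) for the heap,
    and min(heap/4, 100MB * cores) for the young generation.
    """
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    # Young generation: 1/4 of the heap, capped at 100MB per CPU core.
    new_size = min(max_heap // 4, 100 * cpu_cores)
    return max_heap, new_size
```

With this rule, `default_heap_sizes(2048, 4)` returns `(1024, 256)` and `default_heap_sizes(8192, 4)` returns `(2048, 400)`, matching the 1GB and 2GB examples above.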
+  
+ Cassandra needs a heap of several GB for production use, but you can run it with a smaller footprint for a test drive. To cap the heap at 96MB, edit the lines as follows:
+ {{{ 
+ MAX_HEAP_SIZE="96M"
+ HEAP_NEWSIZE="24M"
+ }}}
+ If you run into OutOfMemory errors or heavy GC activity with this configuration, increase these values.
+ '''Don't start your production service with such a tiny heap configuration!'''
  
+  Note for Mac Users:
- Some people running OS X have trouble getting Java 6 to work. If you've kept 
up with Apple's updates, Java 6 should already be installed (it comes in Mac OS 
X 10.5 Update 1). Unfortunately, Apple does not default to using it. What you 
have to do is change your `JAVA_HOME` environment setting to 
`/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home` and add 
`/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin` to the 
beginning of your `PATH`.
+  Some people running OS X have trouble getting Java 6 to work. If you've kept up with Apple's updates, Java 6 should already be installed (it comes in Mac OS X 10.5 Update 1). Unfortunately, Apple does not default to using it. What you have to do is change your `JAVA_HOME` environment setting to `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home` and add `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin` to the beginning of your `PATH`.
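If you start Cassandra from a script, you can apply the same `JAVA_HOME` and `PATH` change to the child process only. A minimal Python sketch, assuming the Java 6 path quoted in the note; `java6_env` is a hypothetical helper.

```python
import os

# Path from the note above; it only exists on OS X installs that ship Java 6.
JAVA6_HOME = "/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"


def java6_env(base_env=None, java_home=JAVA6_HOME):
    """Return a copy of the environment with JAVA_HOME set and its bin/ first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    java_bin = os.path.join(java_home, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = java_bin + os.pathsep + old_path if old_path else java_bin
    return env
```

For example, `subprocess.call(["bin/cassandra", "-f"], env=java6_env())` would start Cassandra with that environment without changing your shell settings.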
  
- And now for the moment of truth, start up Cassandra by invoking 
`bin/cassandra -f` from the command line<<FootNote(To learn more about 
controlling the behavior of startup scripts, see RunningCassandra.)>>. The 
service should start in the foreground and log gratuitously to standard-out. 
Assuming you don't see messages with scary words like "error", or "fatal", or 
anything that looks like a Java stack trace, then chances are you've succeeded. 
To be certain though, take some time to try out the examples in CassandraCli 
before moving on (note: if you are using Cassandra 0.7.0, you'll need to load 
the demo Keyspaces first using JMX, see 
http://wiki.apache.org/cassandra/FAQ#no_keyspaces, or even better follow 
testing instructions on the README of the installation folder). Also, if you 
run into problems, Don't Panic, calmly proceed to [[#if_something_goes_wrong|If 
Something Goes Wrong]].
+ == Step 3: Start up Cassandra ==
+ And now for the moment of truth, start up Cassandra by invoking 
`bin/cassandra -f` from the command line<<FootNote(To learn more about 
controlling the behavior of startup scripts, see RunningCassandra.)>>. The 
service should start in the foreground and log gratuitously to standard-out. 
Assuming you don't see messages with scary words like "error", or "fatal", or 
anything that looks like a Java stack trace, then chances are you've succeeded. 
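Scanning the startup output by eye works, but if you redirect it to a file you can also flag suspicious lines with a few regular expressions. This is only a heuristic sketch of the check described above; `suspicious_lines` is a hypothetical name, and the patterns are assumptions about what Java stack-trace lines look like.

```python
import re

# The words "error"/"fatal", case-insensitive, as whole words.
SCARY = re.compile(r"\b(error|fatal)\b", re.IGNORECASE)
# Typical Java stack-trace lines: "\tat pkg.Class.method(File.java:42)" or "Caused by: ...".
STACK_FRAME = re.compile(r"^\s+at [\w$.]+\(.*\)$|^Caused by: ")


def suspicious_lines(lines):
    """Yield (line_number, line) for log lines that deserve a closer look."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if SCARY.search(text) or STACK_FRAME.search(text):
            yield number, text
```

For example, `list(suspicious_lines(open("startup.log")))` returns an empty list when nothing looks alarming.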
  
+ Press "Control-C" to stop Cassandra.
+ 
+ If you start Cassandra without the "-f" option, it runs in the background, so you need to kill the process to stop it.
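One way to make stopping a background instance easier is to record its PID. The sketch below assumes Cassandra was started as `bin/cassandra -p /path/to/cassandra.pid` (check the usage text of your `bin/cassandra` script for the `-p` option) and sends SIGTERM, the signal `kill` sends by default, to that process. `stop_from_pidfile` is a hypothetical helper.

```python
import os
import signal


def stop_from_pidfile(pidfile, sig=signal.SIGTERM):
    """Send `sig` to the PID recorded in `pidfile` and return that PID."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid
```

From a shell, `kill $(cat /path/to/cassandra.pid)` does the same thing.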
+ 
+ == Step 4: Using cassandra-cli ==
+ 
+ `bin/cassandra-cli` is an interactive command line interface for Cassandra. You can use it to define schemas and to store and fetch data.
+ Run the following command to connect to your Cassandra instance.
+ {{{
+ bin/cassandra-cli -h host -p rpc_port
+ 
+ example:
+ % bin/cassandra-cli -h 127.0.0.1 -p 9160
+ }}}
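Before starting the CLI, you can confirm that something is listening on the rpc_port. This Python sketch only opens a plain TCP connection; it does not speak Thrift, so it cannot tell Cassandra apart from another service on the same port. `rpc_port_open` is a hypothetical name.

```python
import socket


def rpc_port_open(host="127.0.0.1", port=9160, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `rpc_port_open()` returns False, check that Cassandra is running and that the host and port match `rpc_address` and `rpc_port` in `cassandra.yaml`.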
+ 
+ You should then see the following cassandra-cli prompt:
+ 
+ {{{
+ Connected to: "Test Cluster" on 127.0.0.1/9160
+ Welcome to Cassandra CLI version 1.0.7
+ 
+ Type 'help;' or '?' for help.
+ Type 'quit;' or 'exit;' to quit.
+ 
+ [default@unknown] 
+ }}}
+ 
+ You can access the online help with the 'help;' command. Every command in the CLI must end with a semicolon (;).
+ 
+ {{{
+ [default@unknown] help;
+ }}}
+ 
+ First, create a keyspace for your test.
+ 
+ {{{
+ [default@unknown] create keyspace DEMO;  
+ f53dff10-5bd8-11e1-0000-915a024292eb
+ Waiting for schema agreement...
+ ... schemas agree across the cluster
+ [default@unknown] 
+ }}}
+ 
+ Don't forget the semicolon (;) at the end of each command.
+ 
+ Second, switch to the DEMO keyspace with the 'use' command.
+ {{{
+ [default@unknown] use DEMO;
+ Authenticated to keyspace: DEMO
+ [default@DEMO]
+ }}}
+ 
+ Third, create a column family named ''Users'' for testing.
+ {{{
+ [default@DEMO] create column family Users;
+ 18a3e2d0-5bd9-11e1-0000-915a024292eb
+ Waiting for schema agreement...
+ ... schemas agree across the cluster
+ [default@DEMO]
+ }}}
+ 
+ Now you can store data into ''Users'' column family.
+ 
+ {{{
+ [default@DEMO] set Users[utf8('1234')][utf8('name')] = utf8('scott');
+ Value inserted.
+ Elapsed time: 10 msec(s).
+ [default@DEMO] set Users[utf8('1234')][utf8('password')] = utf8('tiger');
+ Value inserted.
+ Elapsed time: 10 msec(s).
+ [default@DEMO]
+ }}}
+ 
+ You have inserted a row into the Users column family. The row key is '1234', and we set two columns in the row, named 'name' and 'password'.
+ 'utf8()' tells the CLI to treat the data as a UTF-8 string. Refer to 'help set;' for more details.
+ Now let's try to fetch the data you inserted.
+ 
+ {{{
+ [default@DEMO] get Users[utf8('1234')];
+ => (column=6e616d65, value=73636f7474, timestamp=1330051295937000)
+ => (column=70617373776f7264, value=7469676572, timestamp=1330051308368000)
+ 
+ Returned 2 results.
+ Elapsed time: 9 msec(s).
+ [default@DEMO]
+ }}}
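The hex strings in the `get` output above are simply the stored bytes. Decoding them as UTF-8 in Python shows that they are the values you inserted. This is a small illustration; `decode_utf8_hex` is a hypothetical helper name.

```python
# Column names and values copied from the `get` output above.
raw_columns = [("6e616d65", "73636f7474"), ("70617373776f7264", "7469676572")]


def decode_utf8_hex(hex_string):
    """Turn the CLI's hex dump of a byte array back into a UTF-8 string."""
    return bytes.fromhex(hex_string).decode("utf-8")


decoded = {decode_utf8_hex(name): decode_utf8_hex(value) for name, value in raw_columns}
print(decoded)  # {'name': 'scott', 'password': 'tiger'}
```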
+ 
+ You may notice that the column names and values are not displayed as strings but as hex-encoded bytes.
+ Use the 'assume' command to tell cassandra-cli the data types of the key, column name, and value.
+ 
+ {{{
+ [default@DEMO] assume Users keys as utf8;
+ Assumption for column family 'Users' added successfully.
+ [default@DEMO] assume Users comparator as utf8;
+ Assumption for column family 'Users' added successfully.
+ [default@DEMO] assume Users validator as utf8;      
+ Assumption for column family 'Users' added successfully.
+ [default@DEMO] get Users['1234'];
+ => (column=name, value=scott, timestamp=1330051295937000)
+ => (column=password, value=tiger, timestamp=1330051308368000)
+ 
+ Returned 2 results.
+ Elapsed time: 9 msec(s).
+ [default@DEMO]
+ }}}
+ 
+ Note that we didn't use "utf8()" for the row key this time.
+ You can also define the data types as metadata of the column family. Check 'help update column family;' and 'help create column family;' for more details.
+ 
+ Take some time to try out the examples in CassandraCli before moving on.
+ Also, if you run into problems, Don't Panic, calmly proceed to [[#if_something_goes_wrong|If Something Goes Wrong]].
+  
- Users of recent Linux distributions and Mac OS X Snow Leopard should be able 
to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with 
root privileges. Snow Leopard ships with Java 1.6.0 and does not require 
changing the `JAVA_HOME` environment variable or adding any directory to your 
`PATH`. On Linux just make sure you have a working Java JDK package installed 
such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
+  Users of recent Linux distributions and Mac OS X Snow Leopard should be able 
to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with 
root privileges. Snow Leopard ships with Java 1.6.0 and does not require 
changing the `JAVA_HOME` environment variable or adding any directory to your 
`PATH`. On Linux just make sure you have a working Java JDK package installed 
such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
  
- == Step 3: Running a cluster ==
+ == Configuring Multinode Cluster ==
+ 
+ Now you have a single working Cassandra node, which is a Cassandra cluster with only one node. By adding more nodes, you can turn it into a multinode cluster.
+ 
- Setting up a Cassandra cluster is ''almost'' as simple as repeating 
[[#running_a_single_node|Step 2]] for each node in your cluster. There are a 
few minor exceptions though.
+ Setting up a Cassandra cluster is ''almost'' as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.
- 
+  
- Cassandra nodes exchange information about one another using a mechanism 
called Gossip, but to get the ball rolling a newly started node needs to know 
of at least one other, this is called a `Seed`. It's customary to pick a small 
number of relatively stable nodes to serve as your seeds, but there is no 
hard-and-fast rule here. Do make sure that each seed also knows of at least one 
other, remember, the goal is to avoid a chicken-and-egg scenario and provide an 
avenue for all nodes in the cluster to discover one another.
+ Cassandra nodes exchange information about one another using a mechanism 
called Gossip, but to get the ball rolling a newly started node needs to know 
of at least one other, this is called a '''Seed'''. It's customary to pick a 
small number of relatively stable nodes to serve as your seeds, but there is no 
hard-and-fast rule here. Do make sure that each seed also knows of at least one 
other, remember, the goal is to avoid a chicken-and-egg scenario and provide an 
avenue for all nodes in the cluster to discover one another.
- 
+  
- In addition to seeds, you'll also need to configure the IP interface to 
listen on for Gossip and Thrift, (`ListenAddress` and `ThriftAddress` 
respectively). Use a `ListenAddress` that will be reachable from the 
`ListenAddress` used on all other nodes, and a `ThriftAddress` that will be 
accessible to clients.
+ In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and Thrift ('''listen_address''' and '''rpc_address''' respectively). Use a `listen_address` that will be reachable from the `listen_address` used on all other nodes, and an `rpc_address` that will be accessible to clients.
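Putting the multinode settings together, the following Python sketch renders the per-node part of `cassandra.yaml` for a hypothetical three-node cluster. The addresses, the choice of the first two nodes as seeds, and using the same address for `listen_address` and `rpc_address` are all assumptions. The `seed_provider` block is assumed to match the layout of the 1.0-era sample configuration, so compare it with your own file; `node_config` is a hypothetical helper.

```python
# Hypothetical 3-node cluster; replace with your own addresses.
NODES = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
SEEDS = NODES[:2]  # a small number of relatively stable nodes


def node_config(index, nodes=NODES, seeds=SEEDS, cluster_name="Test Cluster"):
    """Render the per-node cassandra.yaml settings discussed in this section."""
    address = nodes[index]
    # Evenly spaced tokens, computed the same way as the token script on this page.
    token = 2**127 // len(nodes) * index
    return "\n".join([
        "cluster_name: '%s'" % cluster_name,
        "initial_token: %d" % token,
        "listen_address: %s" % address,
        "rpc_address: %s" % address,
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        '          - seeds: "%s"' % ",".join(seeds),
    ])
```

Every node gets the same `cluster_name` and seed list; only the token and addresses differ, which is what makes the seeds a shared entry point for Gossip.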
+  
+ One other thing you need to take care of in a multinode cluster is the '''Token'''. Each node in the cluster owns a part of the token range from 0 to 2^127-1.
+ If the Nth node in the cluster has token value T(N), the node owns the range from T(N-1)+1 to T(N); the first node's range wraps around from the last node's token. Cassandra decides which nodes store a given row by mapping the row key onto this token range (refer to RandomPartitioner, ByteOrderedPartitioner). 
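The ownership rule can be sketched as a lookup on a sorted list of node tokens. The `random_partitioner_token` helper assumes RandomPartitioner's scheme (the MD5 digest of the row key read as a signed 128-bit integer, then made non-negative); both function names are hypothetical.

```python
import bisect
import hashlib


def random_partitioner_token(key):
    """Token for a row key (bytes), assuming RandomPartitioner's MD5-based scheme."""
    digest = hashlib.md5(key).digest()
    # Signed big-endian 128-bit value, made non-negative: result is in [0, 2**127].
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(token, ring_tokens):
    """Return the node token that owns `token`.

    Node N owns (T(N-1), T(N)], so the owner is the first node token >= token;
    tokens beyond the largest node token wrap around to the smallest one.
    """
    ring = sorted(ring_tokens)
    i = bisect.bisect_left(ring, token)
    return ring[i % len(ring)]
```

For example, `owner(random_partitioner_token(b"1234"), tokens)` tells you which node's range the row key '1234' falls into.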
  
+ A token is assigned to a node with the '''initial_token''' parameter in cassandra.yaml. The parameter takes effect only at the node's first boot; after that, use the 'nodetool move' command to change the assigned token. You need to specify an appropriate initial_token for each node to balance the data load across the nodes. Here is a Python script to calculate balanced tokens:
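+ {{{
+ # Number of nodes in the cluster
+ num_node = 4
+ 
+ # Use integer division (//): under Python 3, "/" returns a float and the
+ # 39-digit tokens would lose precision. print(...) works in Python 2 and 3.
+ for n in range(num_node):
+     print(2**127 // num_node * n)
+ }}}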
+ 
- Once everything is configured and the nodes are running, use the 
`bin/nodetool` utility to verify a properly connected cluster. For example:
+ Once everything is configured and the nodes are running, use the `bin/nodetool ring` command to verify a properly connected cluster. For example:
- 
+  
  {{{
- eevans@achilles:~$ bin/nodetool -host 98.139.220.175 ring
+ eevans@achilles:~$ bin/nodetool -host 98.139.220.175 -p 7199 ring
  Address       Status     Load          Range                                  
    Ring
                                         169048975998562660269742699624378098572
  98.139.220.175  Up         0.02 GB     14183696824377310051808173385764689249 
    |<--|
@@ -67, +226 @@

  98.139.220.176  Up         0.13 GB     42530828068625072228863933889289238187 
    |-->|
  }}}
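To check the ring from a script, you can parse the `nodetool ring` output. The sketch below assumes the column layout shown above (address first, status second) and nodetool's `-h`/`-p` host and port options; `parse_ring` and `down_nodes` are hypothetical helper names, and the output format may differ between versions.

```python
import subprocess


def parse_ring(output):
    """Return {address: status} from `nodetool ring` output.

    Rows whose first field does not look like an IPv4 address
    (the header and the bare token line) are skipped.
    """
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].count(".") == 3:
            statuses[fields[0]] = fields[1]
    return statuses


def down_nodes(host="127.0.0.1", jmx_port=7199, nodetool="bin/nodetool"):
    """Run `nodetool ring` against host:jmx_port and list nodes that are not Up."""
    out = subprocess.check_output(
        [nodetool, "-h", host, "-p", str(jmx_port), "ring"], universal_newlines=True)
    return [addr for addr, status in parse_ring(out).items() if status != "Up"]
```

An empty list from `down_nodes()` means every node listed in the ring reports the status Up.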
  Advanced cluster management is described in [[Operations]].
- 
+  
  If you don't yet have access to hardware for a Cassandra cluster you can try 
it out on EC2 with CloudConfig.
  
+ For more details about configuring a multinode cluster, please refer to [[MultinodeCluster]].
+  
- == Step 4: Write your application ==
+ == Write your application ==
  The recommended way to communicate with Cassandra in your application is to use a [[http://wiki.apache.org/cassandra/ClientOptions|higher-level client]]. These provide programming language specific APIs for talking to Cassandra in a variety of languages. The details will vary depending on programming language and client, but in general using a higher-level client will mean that you have to write less code and get several features for free that you would otherwise have to write yourself.
- 
+  
  That said, it is useful to know that Cassandra uses [[http://thrift.apache.org/|Thrift]] for its external client-facing API. Cassandra's main API/RPC/Thrift port is 9160. Thrift supports a [[http://svn.apache.org/viewvc/thrift/trunk/lib/|wide variety of languages]] so you can code your application to use Thrift directly if you so choose (but again we recommend a [[http://wiki.apache.org/cassandra/ClientOptions|high-level client]] where available).
- 
+  
  Important note: If you intend to use thrift directly, you need to install a 
version of thrift that matches the revision that your version of Cassandra 
uses. InstallThrift
- 
+  
- Cassandra's main API/RPC/Thrift port is 9160. It is a common mistake for API 
clients to connect to the JMX port instead.
+ Cassandra's main API/RPC/Thrift port is 9160 by default, which is defined as 
rpc_port in cassandra.yaml. It is a common mistake for API clients to connect 
to the JMX port instead.
- 
+  
  Checking out a demo application like 
[[http://github.com/twissandra/twissandra|Twissandra]] (Python + Django) will 
also be useful.
- 
+  
  <<Anchor(if_something_goes_wrong)>>
- 
+  
  == If Something Goes Wrong ==
  If you followed the steps in this guide and failed to get up and running, 
we'd love to help. Here's what we need.
- 
+  
   1. If you are running anything other than a stable release, please upgrade 
first and see if you can still reproduce the problem.
   1. Make sure debug logging is enabled (hint: `conf/log4j.properties`) and 
save a copy of the output.
   1. Search the [[http://news.gmane.org/gmane.comp.db.cassandra.user|mailing 
list archive]] and see if anyone has reported a similar problem and what, if 
any resolution they received.
   1. Ditto for the [[https://issues.apache.org/jira/browse/CASSANDRA|bug 
tracking system]].
   1. See if you can put together a unit test, script, or application that 
reproduces the problem.
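As an illustration of the debug-logging hint, the following Python sketch rewrites the level on the `log4j.rootLogger` line. It assumes the line has the form `log4j.rootLogger=INFO,stdout,R` (level first, then appenders); `enable_debug` is a hypothetical helper, and editing the file by hand works just as well.

```python
import re

# Captures "log4j.rootLogger=", the level, and the rest (appender names).
ROOT_LOGGER = re.compile(r"^(log4j\.rootLogger\s*=\s*)(\w+)(.*)$", re.MULTILINE)


def enable_debug(properties_text):
    """Switch the level on the log4j.rootLogger line to DEBUG, keeping the appenders."""
    new_text, count = ROOT_LOGGER.subn(r"\1DEBUG\3", properties_text, count=1)
    if count == 0:
        raise ValueError("no log4j.rootLogger line found")
    return new_text
```

Remember to restart Cassandra after changing the file, and to switch the level back once you have captured the output.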
- 
+  
  Finally, post a message with all relevant details to the list 
([[mailto:user-subscr...@cassandra.apache.org|subscription required]]), or hop 
onto [[http://webchat.freenode.net/?channels=#cassandra|IRC]] (network 
irc.freenode.net, channel #cassandra) and let us know.
- 
+  
  <<BR>> <<BR>>
- 
+  
  ----
  '''Footnotes:'''
