Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "StorageConfiguration" page has been changed by JonHermes. http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=28&rev2=29 -------------------------------------------------- {{{ bin/schematool HOST PORT import }}} + + + = Config Overview = Not going to cover every value, just the interesting ones. When in doubt, check out the comments on the default cassandra.yaml as they're well documented there. == per-Cluster (Global) Settings == - === authenticator === + * authenticator Allows for pluggable authentication of users, which defines whether it is necessary to call the Thrift 'login' method, and which parameters are required to login. The default '!AllowAllAuthenticator' does not require users to call 'login': any user can perform any operation. The other built in option is '!SimpleAuthenticator', which requires users and passwords to be defined in property files, and for users to call login with a valid combo. Default is: 'org.apache.cassandra.auth.AllowAllAuthenticator', a no-op. - === auto_bootstrap === + * auto_bootstrap Set to 'true' to make new [non-seed] nodes automatically migrate the right data to themselves. (If no InitialToken is specified, they will pick one such that they will get half the range of the most-loaded node.) If a node starts up without bootstrapping, it will mark itself bootstrapped so that you can't subsequently accidently bootstrap a node with data on it. (You can reset this by wiping your data and commitlog directories.) Off by default so that new clusters don't bootstrap immediately. You should turn this on when you start adding new nodes to a cluster that already has data on it. - === cluster_name === + * cluster_name The name of this cluster. This is mainly used to prevent machines in one logical cluster from joining another. 
 * commitlog_directory and data_file_directories
/var/lib/cassandra/commitlog

 * concurrent_reads and concurrent_writes
8 - 32

 * disk_access_mode
auto, mmap, mmap_index_only, standard

 * dynamic_snitch and endpoint_snitch
false.
!EndPointSnitch: Setting this to the class that implements {{{IEndPointSnitch}}} determines whether two endpoints are in the same data center or on the same rack. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackInferringSnitch}}}. Note: this class works on hosts' IPs only. There is no configuration parameter to tell Cassandra that a node is in rack ''R'' and in datacenter ''D''. The current rules are based on these methods:
 * isInSameDataCenter: Look at the IP addresses of the two hosts and compare the 2nd octet. If they are the same, the hosts are in the same datacenter; otherwise they are in different datacenters.

 * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
60, 0.3, 64

 * partitioner
Any {{{IPartitioner}}} may be used, including your own as long as it is on the classpath. Out of the box, Cassandra provides {{{org.apache.cassandra.dht.RandomPartitioner}}}, {{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and {{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. (CollatingOPP collates according to EN,US rules, not naive byte ordering. Use this as an example if you need locale-aware collation.) Range queries require using an order-preserving partitioner.

Achtung!
Changing this parameter requires wiping your data directories, since the partitioner can modify the !sstable on-disk format.

If you are using an order-preserving partitioner and you know your key distribution, you can specify the token for this node to use. (Keys are sent to the node with the "closest" token, so distributing your tokens equally along the key distribution space will spread keys evenly across your cluster.) This setting is only checked the first time a node is started.

This can also be useful with {{{RandomPartitioner}}} to force equal spacing of tokens around the hash space, especially for clusters with a small number of nodes.

Cassandra uses an MD5 hash internally to place keys on the ring with {{{RandomPartitioner}}}. So it makes sense to divide the hash space equally among the available machines using {{{InitialToken}}} (i.e., if there are 10 machines, each will handle 1/10th of the maximum hash value) and expect that the machines will get a reasonably equal load.

With {{{OrderPreservingPartitioner}}} the keys themselves are used for placement on the ring. One of the potential drawbacks of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node.

 * seeds
Never use a node's own address as a seed if you are bootstrapping it by setting auto_bootstrap to true!

 * thrift_framed_transport_size_in_mb
15 by default. Setting this to 0 denotes using unframed transport.

== per-Keyspace Settings ==

 * replica_placement_strategy and replication_factor
Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works.
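For orientation, the cluster-wide settings discussed above might look like this in cassandra.yaml (a sketch only; the seed address is a placeholder, and initial_token is left blank so a bootstrapping node picks its own):

{{{
partitioner: org.apache.cassandra.dht.RandomPartitioner
initial_token:
seeds:
    - 127.0.0.1
concurrent_reads: 32
concurrent_writes: 32
disk_access_mode: auto
thrift_framed_transport_size_in_mb: 15
}}}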
Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one).

Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data.

== per-ColumnFamily Settings ==

 * comment and name
You can describe a ColumnFamily in plain text by setting this property.

 * compare_with
The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for slicing operations. The default is {{{BytesType}}}, which is a straightforward lexical comparison of the bytes in each column. Other options are {{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and {{{LongType}}}. You can also specify the fully-qualified class name of a class of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
 * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.

(To get the closest approximation to 0.3-style {{{supercolumns}}}, you would use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)

 * gc_grace_seconds
 * keys_cached and rows_cached
 * preload_row_cache
 * read_repair_chance
 * default_validation_class
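The per-Keyspace and per-ColumnFamily settings above combine in cassandra.yaml roughly as follows (a sketch; 'Keyspace1' and 'Standard1' are placeholder names, and the exact layout varies between 0.7 betas):

{{{
keyspaces:
    - name: Keyspace1
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 1
      column_families:
        - name: Standard1
          compare_with: BytesType
          comment: 'Example column family sorted by raw bytes'
          keys_cached: 200000
          read_repair_chance: 1.0
          gc_grace_seconds: 864000
}}}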
== per-Column Settings ==

 * validation_class
 * index_type
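The equal-spacing arithmetic described above for {{{RandomPartitioner}}} (each of N machines taking 1/Nth of the MD5 hash space) can be sketched in Python; this is an illustration of the calculation, not a Cassandra-provided tool:

```python
# Sketch: evenly spaced initial_token values for RandomPartitioner.
# RandomPartitioner tokens live in the MD5-derived space 0 .. 2**127,
# so node i of N gets token i * 2**127 / N.

def initial_tokens(node_count):
    """Return one equally spaced token per node around the hash ring."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]

# Example: tokens to assign for a 4-node cluster.
for node, token in enumerate(initial_tokens(4)):
    print("node %d: initial_token = %d" % (node, token))
```

Assigning each node its token from such a list before first startup spreads the key load roughly evenly, which matters most for small clusters.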