Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "LargeDataSetConsiderations_JP" page has been changed by MakiWatanabe.
The comment on this change is: Translation in progress....
http://wiki.apache.org/cassandra/LargeDataSetConsiderations_JP?action=diff&rev1=17&rev2=18

--------------------------------------------------

  ## page was copied from LargeDataSetConsiderations
- = Using Cassandra for large data sets (lots of data per node) =
+ = Considerations when handling large amounts of data with Cassandra =

- This page aims to give some advice on the issues one may need to consider when using Cassandra for large data sets (meaning hundreds of gigabytes or terabytes per node). The intent is not to make original claims, but to collect in one place some issues that are operationally relevant. Other parts of the wiki are highly recommended in order to fully understand the issues involved.
+ This page offers some advice on what to consider when handling large data sets with Cassandra.
+ The aim is to collect related topics in one place, not to describe original techniques.
+ To understand these topics in more depth, consulting the other wiki pages is strongly recommended.

+ This page is still a work in progress. If you find outdated information, please update it yourself or post to the cassandra-user mailing list.
- This is a work in progress. If you find information out of date (e.g., a JIRA ticket referenced has been resolved but this document has not been updated), please help by editing or e-mailing cassandra-user.

+ Not all of the topics covered here are specific to Cassandra. For example, in any storage system there is a trade-off between cache size relative to the active data set and IOPS, because IOPS is strongly affected by the cache miss ratio.
- Note that not all of these issues are specific to Cassandra (for example, any storage system is subject to the trade-offs of cache sizes relative to active set size, and IOPS will always be strongly correlated with the percentage of requests that penetrate caching layers).

+ Unless otherwise noted, this page assumes Cassandra 0.7 and above.
- Unless otherwise noted, the points refer to Cassandra 0.7 and above.

+  * Disk usage in Cassandra can change abruptly over time. If the amount of data you handle is large relative to the available disk capacity, in other words if the data size is not negligible compared to the disk capacity, you should consider the following:
-  * Disk space usage in Cassandra can vary fairly suddenly over time. If you have significant amounts of data such that available disk space is not significantly higher than usage, consider:
-   * Compaction of a column family can as much as double the disk space used by that column family (in the case of a major compaction and no deletions). If your data is predominantly made up of a single, or a select few, column families, then doubling the disk space for a CF may be a significant amount compared to your total disk usage.
-   * Repair operations can increase disk space demands (particularly in 0.6, less so in 0.7; TODO: provide actual maximum growth and what it depends on).
-  * As your data set becomes larger and larger (assuming significantly larger than memory), you become more and more dependent on caching to elide I/O operations. As you plan and test your capacity, keep in mind that:
-   * The Cassandra row cache is in the JVM heap and unaffected (remains warm) by compactions and repair operations. This is a plus, but the downside is that the row cache is not very memory efficient compared to the operating system page cache.
+   * Compaction of a column family can require additional disk space up to the size of that column family (when a major compaction runs and there are no deletions). If your data is stored in a single column family, or in a select few, using as much space as the CF itself for compaction may be significant relative to your total disk usage.
+
+   * Repair operations require a certain amount of disk space (particularly in 0.6, less so in 0.7; TODO: state the actual maximum and the parameters it depends on).
+
+  * As the amount of data grows, you depend more and more on caching to avoid disk I/O operations. When planning and testing capacity, keep the following in mind:
+   * The Cassandra row cache lives in the JVM heap and is unaffected by compaction and repair. This is an advantage, but the row cache is not as memory efficient as the OS page cache.
+   * For 0.6.8 and below, the key cache is affected by compaction because it is per-sstable, and compaction moves data to new sstables.
     * This was fixed/improved in [[https://issues.apache.org/jira/browse/CASSANDRA-1878|CASSANDRA-1878]], for 0.6.9 and 0.7.0.
    * The operating system's page cache is affected by compaction and repair operations. If you are relying on the page cache to keep the active set in memory, you may see significant degradation in performance as a result of compaction and repair operations.
     * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1470|CASSANDRA-1470]], [[https://issues.apache.org/jira/browse/CASSANDRA-1882|CASSANDRA-1882]].
-  * If you have column families with more than 143 million row keys in them, bloom filter false positive rates are likely to go up because of implementation concerns that limit the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. The negative effect of hitting this limit is that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable.
+  * If you have column families with more than 143 million row keys in them, bloom filter false positive rates are likely to go up because of implementation concerns that limit the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. The negative effect of hitting this limit is that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable.
    * This will likely be addressed in the future; see [[https://issues.apache.org/jira/browse/CASSANDRA-1608|CASSANDRA-1608]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]].
-  * Compaction is currently not concurrent, so only a single compaction runs at a time. This means that sstable counts may spike during larger compactions as several smaller sstables are written while a large compaction is happening. This can cause additional seeks on reads.
+  * Compaction is currently not concurrent, so only a single compaction runs at a time. This means that sstable counts may spike during larger compactions as several smaller sstables are written while a large compaction is happening. This can cause additional seeks on reads.
    * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1876|CASSANDRA-1876]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1881|CASSANDRA-1881]].
-  * Consider the choice of file system. Removal of large files is notoriously slow and seek-bound on e.g. ext2/ext3. Consider xfs or ext4. This affects the background unlinking (unlink()) of sstables that happens every now and then, and also affects start-up time: if there are sstables pending removal when a node is starting up, they are removed as part of the start-up process, so it may be detrimental if removing a terabyte of sstables takes an hour (these numbers are ballpark figures, not accurately measured, and depend on circumstances).
+  * Consider the choice of file system. Removal of large files is notoriously slow and seek-bound on e.g. ext2/ext3. Consider xfs or ext4. This affects the background unlinking (unlink()) of sstables that happens every now and then, and also affects start-up time: if there are sstables pending removal when a node is starting up, they are removed as part of the start-up process, so it may be detrimental if removing a terabyte of sstables takes an hour (these numbers are ballpark figures, not accurately measured, and depend on circumstances).
-  * Adding nodes is a slow process if each node is responsible for a large amount of data. Plan for this; do not try to throw additional hardware at a cluster at the last minute.
+  * Adding nodes is a slow process if each node is responsible for a large amount of data. Plan for this; do not try to throw additional hardware at a cluster at the last minute.
-  * Cassandra will read through sstable index files on start-up, doing what is known as "index sampling". This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
+  * Cassandra will read through sstable index files on start-up, doing what is known as "index sampling". This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
-  * A negative side-effect of a large row cache is start-up time. The periodic saving of the row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set, this is probably going to be seek-bound and the time it takes to warm up the row cache will be linear with respect to the row cache size (assuming sufficiently large amounts of data that the seek-bound I/O is not subject to optimization by disks).
+  * A negative side-effect of a large row cache is start-up time. The periodic saving of the row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set, this is probably going to be seek-bound and the time it takes to warm up the row cache will be linear with respect to the row cache size (assuming sufficiently large amounts of data that the seek-bound I/O is not subject to optimization by disks).
    * Potential future improvement: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]].
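
Three of the points in the page text above lend themselves to rough arithmetic: the doubling of disk usage during a major compaction, the rise in bloom filter false positives past roughly 143 million keys, and the linear row cache warm-up time. The following is a minimal back-of-the-envelope sketch, not Cassandra's actual implementation. The 2**31-bit filter cap and the hash count of 10 are assumptions picked for illustration (the cap only because 2**31 bits at 15 bits per key comes to about 143 million keys); the function names and the "one seek per cached row" model are likewise hypothetical.

```python
import math

# Assumption (not from the page): the filter is capped at 2**31 bits. This cap
# is chosen only because 2**31 / 15 bits per key is roughly 143 million keys.
ASSUMED_MAX_FILTER_BITS = 2**31
# Assumption: a hypothetical hash count, close to the textbook optimum for
# 15 bits per key (15 * ln 2 is about 10.4).
ASSUMED_HASH_COUNT = 10


def bloom_false_positive_rate(num_keys, num_bits=ASSUMED_MAX_FILTER_BITS,
                              num_hashes=ASSUMED_HASH_COUNT):
    """Textbook approximation p = (1 - e^(-k*n/m))^k for a bloom filter."""
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def peak_disk_bytes_during_major_compaction(cf_bytes, other_bytes=0):
    """Worst case from the text: with no deletions the compacted copy is as
    large as the original, and both exist on disk until the old files go."""
    return 2 * cf_bytes + other_bytes


def row_cache_warmup_seconds(cached_rows, seeks_per_second):
    """Seek-bound warm-up, assuming one disk seek per cached row."""
    return cached_rows / seeks_per_second


if __name__ == "__main__":
    # False positive rate once the filter can no longer grow with the key count.
    for keys in (143_000_000, 286_000_000, 572_000_000):
        print(keys, bloom_false_positive_rate(keys))
    # A 500 GB column family next to 100 GB of other data.
    print(peak_disk_bytes_during_major_compaction(500 * 10**9, 100 * 10**9))
    # Ten million cached rows on a disk doing about 100 seeks per second.
    print(row_cache_warmup_seconds(10_000_000, 100.0))
```

Under these assumptions, the false positive rate climbs steeply once the key count passes the point where the filter stops growing, which is the effect the bloom filter bullet describes. Real figures depend on the Cassandra version, the hardware, and the workload, so treat the output as an order-of-magnitude guide only.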