Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "LargeDataSetConsiderations_JP" page has been changed by MakiWatanabe. http://wiki.apache.org/cassandra/LargeDataSetConsiderations_JP?action=diff&rev1=20&rev2=21 -------------------------------------------------- * リペア操作にはある程度のディスク容量が必要です。(0.6では特に顕著です。0.7ではそれほどでもありません。TODO: 具体的な最大値、依存するパラメータを明示すること。) * データ量が多くなるにつれ、ディスクIO操作を避けるためにキャッシュへの依存が強まります。キャバシティに関するプランニングとテストの際には以下のことを考慮すべきです。 - * Cassandra の行キャッシュはJVMのヒープ上に存在し、compactionやrepairの影響を受けません。これは利点ですが、一方でメモリの有効利用という点では行キャッシュはOSのページキャッシュほど効率的でありません。 + * Cassandra の行キャッシュはJVMのヒープ上に存在し、compactionやrepairの影響を受けません。 + * For 0.6.8 and below, the key cache is affected by compaction because it is per-sstable, and compaction moves data to new sstables. + 0.6.8以前はキーキャッシュはcompactionによって影響を受けます。それらのバージョンではキーキャッシュはSSTABLE単位で管理されているため、compactionによってデータが新しいSSTABLEにコピーされると古いキャッシュが無効になります。 + * Was fixed/improved as of + この動作は0.6.9 及び0.7.0で改善されています。[[https://issues.apache.org/jira/browse/CASSANDRA-1878|CASSANDRA-1878]] - * For 0.6.8 and below, the key cache is affected by compaction because it is per-sstable, and compaction moves data to new sstables. - * Was fixed/improved as of [[https://issues.apache.org/jira/browse/CASSANDRA-1878|CASSANDRA-1878]], for 0.6.9 and 0.7.0. - * The operating system's page cache is affected by compaction and repair operations. If you are relying on the page cache to keep the active set in memory, you may see significant degradation on performance as a result of compaction and repair operations. + + * OSのページキャッシュはcompaction及びrepair操作の影響を受けます。アクティブなデータをメモリ上に保つ手段としてページキャッシュに依存している場合、compaction及びrepair操作に連動して顕著な性能劣化が起きるでしょう。 + + - * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1470|CASSANDRA-1470]], [[https://issues.apache.org/jira/browse/CASSANDRA-1882|CASSANDRA-1882]]. + * 将来的な改善方法については以下のリンクで議論されています:[[https://issues.apache.org/jira/browse/CASSANDRA-1470|CASSANDRA-1470]], [[https://issues.apache.org/jira/browse/CASSANDRA-1882|CASSANDRA-1882]] - * If you have column families with more than 143 million row keys in them, bloom filter false positive rates are likely to go up because of implementation concerns that limit the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. The negative effects of hitting this limit is that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable. + + * bloom filterの最大サイズの実装上の制限により、14300万以上の行キーを格納しているカラムファミリでは、bloom filterの偽陽性率が増加することが予想されます。bloom filterがどのように使用されているかについては[[ArchitectureInternals]]を参照してください。この制限に抵触した場合、行数の増加に従ってreadごとに追加のseekが発生するようになります。bloom filterの制限はsstable単位であるため、上記の影響は最後にcompactionが実行された時間に依存することに注意してください。major compactionの後ではカラムファミリの全データが単一のsstableに格納されるため、これはカラムファミリ単位の問題です。 - * This will likely be addressed in the future: See [[https://issues.apache.org/jira/browse/CASSANDRA-1608|CASSANDRA-1608]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]] + * この問題については以下のリンクで議論されています。[[https://issues.apache.org/jira/browse/CASSANDRA-1608|CASSANDRA-1608]], [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]] - * Compaction is currently not concurrent, so only a single compaction runs at a time. 
 * Compaction is currently not concurrent, so only a single compaction runs at a time. This means that sstable counts may spike during larger compactions, because several smaller sstables are written while a large compaction is in progress. This can cause additional seeks on reads.
  * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1876|CASSANDRA-1876]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1881|CASSANDRA-1881]].
 * Consider the choice of file system. Removal of large files is notoriously slow and seek-bound on e.g. ext2/ext3; consider xfs or ext4. This affects the background unlink():ing of sstables that happens every now and then, and it also affects start-up time (if there are sstables pending removal when a node starts up, they are removed as part of the start-up process; it may thus be detrimental if removing a terabyte of sstables takes an hour (numbers are ballpark, not accurately measured, and depend on circumstances)).
 * Adding nodes is a slow process if each node is responsible for a large amount of data. Plan for this; do not try to throw additional hardware at a cluster at the last minute.
 * Cassandra will read through sstable index files on start-up, doing what is known as "index sampling". This is used to keep a subset (currently, and by default, 1 out of 100) of the keys and their on-disk locations in the index in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer this sampling takes. Thus, for very large indexes (typically when you have a very large number of keys), the index sampling on start-up may be a significant issue. (A small sketch of the sampling idea follows this list.)
 * A negative side-effect of a large row cache is start-up time. The periodic saving of row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set this is probably going to be seek-bound, and the time it takes to warm up the row cache will be linear in the row cache size (assuming amounts of data large enough that the seek-bound I/O is not optimized away by the disks). (A back-of-the-envelope warm-up estimate follows this list.)
  * Potential future improvement: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]].
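
The per-sstable key cache behaviour described above (0.6.8 and below) can be illustrated with a toy model. This is not Cassandra's actual cache code; the names and numbers are made up, and only the idea that entries are keyed per sstable, and therefore orphaned by compaction, comes from the page.

{{{#!python
# Toy model only -- not Cassandra's real key cache -- showing why a cache
# keyed per sstable goes cold after compaction: its entries point at
# sstables that no longer exist.
key_cache = {
    ("sstable-1", "user:42"): 1024,   # (sstable, row key) -> cached index offset
    ("sstable-2", "user:99"): 4096,
}

def compact(old_sstables, new_sstable):
    # Compaction rewrites the rows into new_sstable; nothing has read from
    # it yet, so no cache entries exist for it and the old ones are dropped.
    for sstable, key in list(key_cache):
        if sstable in old_sstables:
            del key_cache[(sstable, key)]

compact({"sstable-1", "sstable-2"}, "sstable-3")
print(key_cache.get(("sstable-3", "user:42")))  # None: reads miss until the cache re-warms
}}}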
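For the 143-million-key bloom filter bullet, the standard false-positive formula shows how quickly the rate climbs once the filter can no longer grow with the key count. The 2^31-bit cap and the hash count used below are assumptions chosen to roughly match the 143 million figure, not values taken from the page.

{{{#!python
import math

def bloom_fp_rate(num_keys, num_bits, num_hashes):
    # Standard bloom filter false-positive probability:
    # p ~= (1 - e^(-k*n/m))^k for n keys, m bits, k hash functions.
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes

# Assumption: the filter is capped at 2**31 bits; at ~15 bits per key that
# cap is reached at roughly 2**31 / 15 ~= 143 million keys.
MAX_BITS = 2 ** 31
NUM_HASHES = 10  # near-optimal for ~15 bits per key

for keys in (100e6, 143e6, 300e6, 600e6):
    print("%.0fM keys -> false positive rate %.5f"
          % (keys / 1e6, bloom_fp_rate(keys, MAX_BITS, NUM_HASHES)))
}}}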
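The index sampling bullet can be sketched as follows. The code illustrates the idea (keep 1 key in 100 in memory, binary-search the sample, then scan a bounded stretch of the on-disk index); it is not the actual implementation, and the key/offset layout is invented for the example.

{{{#!python
import bisect

INDEX_INTERVAL = 100  # the 1-out-of-100 default mentioned above

def build_index_sample(index_entries):
    # index_entries: the full on-disk index as (row_key, offset) pairs in
    # sorted key order.  Only every INDEX_INTERVAL-th entry is kept in
    # memory, which is why start-up time grows with the index file size.
    return index_entries[::INDEX_INTERVAL]

def locate(sample, row_key):
    # Binary-search the in-memory sample; the returned offset bounds a scan
    # of at most INDEX_INTERVAL on-disk index entries for the exact key.
    keys = [k for k, _ in sample]
    i = max(bisect.bisect_right(keys, row_key) - 1, 0)
    return sample[i][1]

index = [("key%08d" % i, i * 64) for i in range(1000000)]
sample = build_index_sample(index)    # only ~10,000 entries stay in memory
print(locate(sample, "key00123456"))  # offset of the nearest sampled key at or before it
}}}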
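Finally, the point that row cache warm-up time is linear in the cache size can be made concrete with a back-of-the-envelope estimate. Only the linearity comes from the page; the assumption of one random seek per cached row at about 8 ms is illustrative.

{{{#!python
SEEK_SECONDS = 0.008  # assumed cost of one random read on a spinning disk

def warmup_hours(cached_rows, seek_seconds=SEEK_SECONDS):
    # One seek-bound pre-fetch per cached row => warm-up time grows
    # linearly with the configured row cache size.
    return cached_rows * seek_seconds / 3600.0

for rows in (1000000, 5000000, 20000000):
    print("%8d cached rows -> ~%.1f hours to pre-fetch" % (rows, warmup_hours(rows)))
}}}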