Controlling the MAX SIZE of sstables after compaction
Hi,

*Setup*
Cluster - *3 nodes*
API - *Hector*
CL - *QUORUM*
RF - *3*
Compaction Strategy - *Size Tiered Compaction*

*Use Case*
I have about *320 million rows* (~12 to 15 columns each) worth of data stored in Cassandra. In order to generate a report containing ALL of that data, I do the following:
1. Run compaction
2. Take a snapshot of the DB
3. Run sstable2json on all the *Data.db files
4. Read those JSON files and write them out to a CSV

*Problem*
The *sstable2json* step takes about 350-400 hours (~85% of the total time), which drags out the whole process. I am running sstable2json sequentially on all the *Data.db files, and since their sizes are very inconsistent (e.g. one file is 25 GB while another is 500 MB), running it concurrently doesn't help either. A rough sketch of the current pipeline is at the bottom of this mail.

*My Thought Process*
Is there a way to put a cap on the maximum size of the sstables generated after compaction, so that I end up with multiple sstables of uniform size? Then I could run the sstable2json utility on them concurrently.

*Questions*
1. Is there a way to configure the size of the sstables created after compaction?
2. Is there a better approach to generate the report?
3. What are the flaws in this approach?

Best,
Parth
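For reference, the pipeline described above looks roughly like this; keyspace/table names, the snapshot tag and the paths are placeholders (the exact data directory layout depends on the Cassandra version):

    # Placeholder keyspace/table names and paths; adjust for the real cluster.
    nodetool compact my_keyspace my_table               # 1. run compaction
    nodetool snapshot -t report_snap my_keyspace        # 2. take a snapshot of the DB

    # Approximate snapshot location; newer versions add a UUID suffix to the table dir.
    SNAP_DIR=/var/lib/cassandra/data/my_keyspace/my_table/snapshots/report_snap
    OUT_DIR=/tmp/report_json
    mkdir -p "$OUT_DIR"

    # 3. sstable2json on every Data.db file, currently one at a time.
    #    This is the step taking ~85% of the total time; with uniformly sized
    #    sstables it could be fanned out (e.g. via xargs -P) instead.
    for f in "$SNAP_DIR"/*-Data.db; do
        sstable2json "$f" > "$OUT_DIR/$(basename "$f").json"
    done

    # 4. Parse the JSON files and write the CSV (not shown).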
SSTables can't compact automatically
Hi everybody:

I have 18 nodes running Cassandra 2.1.2. Every node has 4 cores, 32 GB RAM and a 2 TB hard disk; the OS is CentOS release 6.2 (Final). I have followed the recommended settings to configure my system, such as disabling swap, unlimited memlock, and so on.

My heap size is:
MAX_HEAP_SIZE="8G"
MIN_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"

I use STCS with the other configuration left at the defaults, and the DataStax Java Driver 2.1.2, committing a BatchStatement of 100 keys at a time.

When I run my cluster and insert data from Kafka (1 keys/s), after 2 days every node stops compacting automatically and ends up with too many sstables. I tried a major compaction to compact the sstables, but it takes a very long time, and even the new sstables are not compacted automatically. Tracing the logs, I see the CMS GC running far too often, roughly once every 30 minutes.

Could someone help me solve this problem?

--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/
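For reference, this is roughly what I run on each node to look at the backlog; the keyspace/table names here are placeholders:

    # Placeholder keyspace/table names.
    nodetool compactionstats                # pending compaction tasks
    nodetool cfstats my_keyspace.my_table   # per-table "SSTable count"
    nodetool tpstats                        # pending/blocked CompactionExecutor tasks
    nodetool compact my_keyspace my_table   # the manual major compaction (very slow)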
Re: Which Topology fits best ?
NetworkTopologyStrategy gives you a better horizon and more flexibility as you scale out, at least once you've gone past small-cluster problems like wanting RF=3 in a 4-node, two-DC cluster. IMO I'd go with "DC1:1,DC2:1".

~mck
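As a sketch, that keyspace definition would look something like this; the keyspace name is a placeholder and the DC names have to match whatever your snitch reports:

    -- Placeholder keyspace name; 'DC1'/'DC2' must match the snitch's datacenter names.
    CREATE KEYSPACE my_keyspace
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 1,
        'DC2': 1
      };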
Re: Which Topology fits best ?
As far as I know they're effectively the same. NetworkTopologyStrategy is useful when you want to set a separate RF per DC, such as an analytics DC with a lower RF to save money (see the sketch at the bottom of this mail).

On Sun, Jan 25, 2015 at 8:01 AM, SEGALIS Morgan wrote:

> Hi everyone,
> I need one more time your precious advice.
>
> I would like to create a 2-node cluster, each node in a different
> datacenter but with the same provider; the ping between the two servers is
> fast (~0.5 ms) and the bandwidth is great (~1 GB/s).
>
> Is org.apache.cassandra.locator.SimpleStrategy with the replication factor
> set to 2 good practice?
>
> Or should I use org.apache.cassandra.locator.NetworkTopologyStrategy with
> DC1:1 and DC2:1
> (if this is the correct way to use NetworkTopologyStrategy, I'm not 100%
> sure)?
>
> Thank you for your time.
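As a rough sketch of that per-DC RF (the keyspace and DC names are made up): a normal replication factor in the primary DC and a single, cheaper replica in the analytics DC.

    -- Hypothetical keyspace and DC names: full RF in the primary DC,
    -- a single replica in the analytics DC.
    CREATE KEYSPACE my_keyspace
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'Analytics': 1
      };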
Which Topology fits best ?
Hi everyone,

I need one more time your precious advice.

I would like to create a 2-node cluster, with each node in a different datacenter but with the same provider; the ping between the two servers is fast (~0.5 ms) and the bandwidth is great (~1 GB/s).

Is org.apache.cassandra.locator.SimpleStrategy with the replication factor set to 2 good practice?

Or should I use org.apache.cassandra.locator.NetworkTopologyStrategy with DC1:1 and DC2:1 (if this is the correct way to use NetworkTopologyStrategy, I'm not 100% sure)?

Thank you for your time.