Re: uneven data movement in one of the disk in Cassandra
The version here really matters. If it's higher than 3.2, this is probably related to the following change, which places the sstables for a given range in the same directory to avoid data loss on a single drive failure:

https://issues.apache.org/jira/browse/CASSANDRA-6696

--
Jeff Jirsa

> On Mar 9, 2018, at 9:38 PM, Madhu B wrote:
>
> Yes, it will help. Thanks, James, for correcting me.
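A minimal sketch of the version check suggested above. It assumes the version string looks like the `ReleaseVersion: 3.11.1` line that `nodetool version` typically prints, and it treats 3.2 itself as included in the cutoff, which is an assumption on top of the "higher than 3.2" wording. The function names are hypothetical. Comparing versions as integer tuples avoids the string-comparison trap where "3.11" sorts before "3.2".

```python
import re


def parse_release_version(text):
    """Return (major, minor) from a string such as 'ReleaseVersion: 3.11.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def groups_ranges_per_directory(text, cutoff=(3, 2)):
    """True if the version is at or above the assumed cutoff (3.2)."""
    # Tuple comparison is numeric per component: (3, 11) >= (3, 2) is True.
    return parse_release_version(text) >= cutoff
```

If this returns True for the affected node, the imbalance may simply reflect how the token ranges are split across the data directories rather than a fault.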
Re: uneven data movement in one of the disk in Cassandra
Yes, it will help. Thanks, James, for correcting me.

> On Mar 9, 2018, at 9:52 PM, James Shaw wrote:
>
> Per my testing, repair does not help.
> Repair builds a Merkle tree to compare data; it only writes to a new file
> where there are differences, so you get a very small file at the end
> (which, of course, means most of the data is already in sync).
Re: uneven data movement in one of the disk in Cassandra
Per my testing, repair does not help.
Repair builds a Merkle tree to compare data; it only writes to a new file where there are differences, so you get a very small file at the end (which, of course, means most of the data is already in sync).

On Fri, Mar 9, 2018 at 10:31 PM, Madhu B wrote:

> Yasir,
> I think you need to run a full repair in off-peak hours.
>
> Thanks,
> Madhu
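To illustrate why a repair between replicas that are already mostly in sync writes so little, here is a toy sketch of a Merkle-style comparison. It is not Cassandra's implementation: the bucketing by a hash of the key, the number of ranges, and all function names are assumptions made for the example. Each replica hashes its rows per range, the range hashes are combined into a root, and only ranges whose hashes differ would need any data written.

```python
import hashlib


def leaf_hashes(rows, num_ranges=8):
    """Hash one replica's rows into num_ranges buckets (toy 'token ranges')."""
    hashers = [hashlib.sha256() for _ in range(num_ranges)]
    for key in sorted(rows):
        # Toy partitioner: a stable hash of the key picks the bucket.
        token = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
        hasher = hashers[token % num_ranges]
        hasher.update(key.encode() + b"\x00" + rows[key].encode() + b"\x00")
    return [h.digest() for h in hashers]


def merkle_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def ranges_out_of_sync(replica_a, replica_b, num_ranges=8):
    """Return the indexes of the ranges whose contents differ."""
    leaves_a = leaf_hashes(replica_a, num_ranges)
    leaves_b = leaf_hashes(replica_b, num_ranges)
    if merkle_root(leaves_a) == merkle_root(leaves_b):
        return []  # roots match: nothing differs, nothing to write
    # Simplification: compare the leaves directly instead of walking the tree.
    return [i for i, (a, b) in enumerate(zip(leaves_a, leaves_b)) if a != b]
```

With 1000 identical rows and a single differing value, only one range is reported, which matches the observation that repair leaves just a very small new file behind and does not rebalance existing data.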
Re: uneven data movement in one of the disk in Cassandra
We have a similar issue, and I am working to solve it this weekend.

In our case the cause is STCS: it makes one huge table's sstable file bigger and bigger after each compaction (this is the nature of STCS compaction, nothing wrong). Almost all of the data has a 30-day TTL, but the tombstones are not evicted because the largest file is waiting for 3 other files of similar size before it gets compacted. 99.99% of the largest file is tombstones.

Use this command:

    nodetool upgradesstables -a keyspace table

It rewrites all existing sstables and evicts tombstones.

In your case, first do a few checks:

1. Find which tables use the most space:

    cd /data/disk03/cassandra/data_prod/data
    du -ks * | sort -n

2. Check the snapshots for the bigger tables found above; old snapshots may be the cause.

3. cd to the table directory and run

    sstablemetadata sstablefile

   to see whether the table has a lot of droppable tombstones.

4. Look at all the sstable files to see what the next compaction will be:

    ls -lhS /data/disk*/cassandra/data_prod/data/"that keyspace"/"that_table"*/*Data.db

   From what I have seen, small compactions seem to go to a random disk, but large ones go to the disk that has more free space.

5. If the biggest file is too big, it will wait a long time for the next compaction. You may test the following (sorry, this is not my case, so I am not 100% sure):
   1) on a newer Cassandra 3.0, you may try nodetool compact -s (it will split the output)
   2) on an older Cassandra version, stop Cassandra and use sstablesplit

Hope it helps.

Thanks,
James

On Fri, Mar 9, 2018 at 7:14 AM, Kyrylo Lebediev wrote:

> Not sure where I heard this, but AFAIK data imbalance when multiple
> data_directories are in use is a known issue for older versions of
> Cassandra. This might be the root cause of your issue.
>
> Which version of C* are you using?
>
> Unfortunately, I don't remember in which version this imbalance issue was
> fixed.
>
> -- Kyrill
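Checks 1 and 4 above can be combined into one pass over all data directories. The following is a minimal sketch, assuming the usual `<data_dir>/<keyspace>/<table>/*Data.db` layout that the `ls` glob above implies; the function name is hypothetical. It only counts `Data.db` files directly inside each table directory, so anything under `snapshots/` is left out and has to be checked separately (check 2).

```python
import os
from collections import defaultdict


def table_sizes_by_disk(data_dirs):
    """Map 'keyspace/table' -> {data_dir: total bytes of its *Data.db files}."""
    sizes = defaultdict(lambda: defaultdict(int))
    for data_dir in data_dirs:
        if not os.path.isdir(data_dir):
            continue
        for keyspace in sorted(os.listdir(data_dir)):
            ks_path = os.path.join(data_dir, keyspace)
            if not os.path.isdir(ks_path):
                continue
            for table in sorted(os.listdir(ks_path)):
                table_path = os.path.join(ks_path, table)
                if not os.path.isdir(table_path):
                    continue
                for name in os.listdir(table_path):
                    path = os.path.join(table_path, name)
                    # Only live sstable data files, not snapshots or indexes.
                    if name.endswith("Data.db") and os.path.isfile(path):
                        sizes[f"{keyspace}/{table}"][data_dir] += os.path.getsize(path)
    return {table: dict(per_disk) for table, per_disk in sizes.items()}
```

Called with the five directories from `data_file_directories`, the result shows at a glance whether one table accounts for most of the extra space on disk03.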
Re: uneven data movement in one of the disk in Cassandra
Yasir,
I think you need to run a full repair in off-peak hours.

Thanks,
Madhu

> On Mar 9, 2018, at 7:20 AM, Kenneth Brotman wrote:
>
> Yasir,
>
> How many nodes are in the cluster?
> What is num_tokens set to in the cassandra.yaml file?
> Is it just this one node doing this?
> What replication factor do you use that affects the ranges on that disk?
>
> Kenneth Brotman
RE: uneven data movement in one of the disk in Cassandra
Yasir,

How many nodes are in the cluster?
What is num_tokens set to in the cassandra.yaml file?
Is it just this one node doing this?
What replication factor do you use that affects the ranges on that disk?

Kenneth Brotman

From: Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com]
Sent: Friday, March 09, 2018 4:14 AM
To: user@cassandra.apache.org
Subject: Re: uneven data movement in one of the disk in Cassandra

Not sure where I heard this, but AFAIK data imbalance when multiple data_directories are in use is a known issue for older versions of Cassandra. This might be the root cause of your issue.

Which version of C* are you using?

Unfortunately, I don't remember in which version this imbalance issue was fixed.

-- Kyrill
Re: uneven data movement in one of the disk in Cassandra
Not sure where I heard this, but AFAIK data imbalance when multiple data_directories are in use is a known issue for older versions of Cassandra. This might be the root cause of your issue.

Which version of C* are you using?

Unfortunately, I don't remember in which version this imbalance issue was fixed.

-- Kyrill

From: Yasir Saleem
Sent: Friday, March 9, 2018 1:34:08 PM
To: user@cassandra.apache.org
Subject: Re: uneven data movement in one of the disk in Cassandra

Hi Alex,

No active compaction right now.
Re: uneven data movement in one of the disk in Cassandra
Hi Alex,

No active compaction right now.

On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> To check for currently active compactions, please use this command:
>
>     nodetool compactionstats -H
>
> on the host which shows the problem.
>
> --
> Alex
Re: uneven data movement in one of the disk in Cassandra
On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem wrote:

> Thanks, Nicolas Guyomar
>
> I am new to Cassandra; here are the properties which I can see in the yaml file:
>
> # of compaction, including validation compaction.
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100

To check for currently active compactions, please use this command:

    nodetool compactionstats -H

on the host which shows the problem.

--
Alex
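If the check has to be repeated on several hosts, the command output can be read from a script. This is a rough sketch, assuming the output contains a line of the form `pending tasks: N`, as `nodetool compactionstats` usually prints; the function names are hypothetical, and the parser returns None when that line is absent instead of guessing.

```python
import re
import subprocess


def pending_compactions(output):
    """Return the number after 'pending tasks:' in the output, or None."""
    match = re.search(r"^pending tasks:\s*(\d+)", output, re.MULTILINE)
    return int(match.group(1)) if match else None


def run_compactionstats():
    """Run the command from the message above on the local host."""
    result = subprocess.run(
        ["nodetool", "compactionstats", "-H"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A result of 0 corresponds to the "no active compaction" answer given later in the thread.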
Re: uneven data movement in one of the disk in Cassandra
Thanks, Nicolas Guyomar

I am new to Cassandra; here are the properties which I can see in the yaml file:

# of compaction, including validation compaction.
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

On Fri, Mar 9, 2018 at 3:33 PM, Nicolas Guyomar wrote:

> Hi,
>
> This might be a running compaction; have you checked that?
Re: uneven data movement in one of the disk in Cassandra
Hi,

This might be a running compaction; have you checked that?

On 9 March 2018 at 11:29, Yasir Saleem wrote:

> Hi Team,
>
> We are facing an issue of uneven data movement across the Cassandra disks,
> specifically on disk03 in our case: all the other disks are at around 60%
> of space, but disk03 is at 87%. Here is the configuration in the yaml and
> the current disk space:
>
> data_file_directories:
>     - /data/disk01/cassandra/data_prod/data
>     - /data/disk02/cassandra/data_prod/data
>     - /data/disk03/cassandra/data_prod/data
>     - /data/disk04/cassandra/data_prod/data
>     - /data/disk05/cassandra/data_prod/data
>
> disk space:
>
> 734G  417G  280G  60%  /data/disk02
> 734G  342G  355G  50%  /data/disk05
> 734G  383G  314G  55%  /data/disk04
> *734G  599G   98G  87%  /data/disk03*
> 734G  499G  198G  60%  /data/disk01
>
> Please note that we have tried to delete data several times, but space is
> still continuously increasing on disk03. Please let me know if there is
> any workaround to resolve this issue.
>
> Regards,
>
> Yasir.
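A small sketch for spotting this kind of imbalance automatically. It assumes five-column lines in the shape quoted above (size, used, available, use%, mount point), takes the Use% column as reported, and flags any disk more than 15 points above the median; the 15-point margin and the function names are arbitrary choices for the example.

```python
from statistics import median

SAMPLE = """\
734G 417G 280G 60% /data/disk02
734G 342G 355G 50% /data/disk05
734G 383G 314G 55% /data/disk04
*734G 599G 98G 87% /data/disk03*
734G 499G 198G 60% /data/disk01
"""


def parse_usage(text):
    """Map mount point -> reported Use% for each five-column line."""
    usage = {}
    for line in text.splitlines():
        fields = line.replace("*", "").split()  # drop the emphasis markers
        if len(fields) != 5 or not fields[3].endswith("%"):
            continue
        usage[fields[4]] = int(fields[3].rstrip("%"))
    return usage


def outlier_disks(usage, margin=15):
    """Return mount points whose Use% exceeds the median by more than margin."""
    if not usage:
        return []
    mid = median(usage.values())
    return sorted(mount for mount, pct in usage.items() if pct - mid > margin)
```

On the sample above the median is 60%, so only /data/disk03 is reported.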