Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Jeff Jirsa
The version here really matters. If it’s higher than 3.2, it’s probably related 
to this issue which places sstables for a given range in the same directory to 
avoid data loss on single drive failure:

https://issues.apache.org/jira/browse/CASSANDRA-6696
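
For anyone checking which side of that version boundary a node is on, here is a minimal shell sketch. `version_at_least` is a hypothetical helper name, it relies on `sort -V` (GNU coreutils), and it treats 3.2 itself as included, which is an assumption about where the change landed.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2,
    # then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper: report whether a given Cassandra version is on
# the 3.2-or-later side of the boundary.
describe_version() {
    if version_at_least "$1" 3.2; then
        echo "$1: 3.2 or later"
    else
        echo "$1: older than 3.2"
    fi
}
```

On a live node, `nodetool version` prints a line such as `ReleaseVersion: 3.11.2`, so `describe_version "$(nodetool version | awk '{print $2}')"` could feed it.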



-- 
Jeff Jirsa


> On Mar 9, 2018, at 9:38 PM, Madhu B  wrote:
> 
> Yes it will helps,thanks James for correcting me
> 
>> On Mar 9, 2018, at 9:52 PM, James Shaw  wrote:
>> 
>> per my testing, repair not help.
>> repair build Merkle tree to compare data, it only write to a new file while 
>> have difference, very very small file at the end  (of course, means most 
>> data are synced)
>> 
>>> On Fri, Mar 9, 2018 at 10:31 PM, Madhu B  wrote:
>>> Yasir,
>>> I think you need to run full repair in off-peak hours
>>> 
>>> Thanks,
>>> Madhu
>>> 
>>> 
>>>> On Mar 9, 2018, at 7:20 AM, Kenneth Brotman  
>>>> wrote:
>>>> 
>>>> Yasir,
>>>> 
>>>>  
>>>> 
>>>> How many nodes are in the cluster? 
>>>> 
>>>> What is num_tokens set to in the Cassandra.yaml file? 
>>>> 
>>>> Is it just this one node doing this? 
>>>> 
>>>> What replication factor do you use that affects the ranges on that disk?
>>>> 
>>>>  
>>>> 
>>>> Kenneth Brotman
>>>> 
>>>>  
>>>> 
>>>> From: Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com] 
>>>> Sent: Friday, March 09, 2018 4:14 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: uneven data movement in one of the disk in Cassandra
>>>> 
>>>>  
>>>> 
>>>> Not sure where I heard this, but AFAIK data imbalance when multiple 
>>>> data_directories are in use is a known issue for older versions of 
>>>> Cassandra. This might be the root-cause of your issue.
>>>> 
>>>> Which version of C* are you using?
>>>> 
>>>> Unfortunately, don't remember in which version this imbalance issue was 
>>>> fixed.
>>>> 
>>>>  
>>>> 
>>>> -- Kyrill
>>>> 
>>>> From: Yasir Saleem 
>>>> Sent: Friday, March 9, 2018 1:34:08 PM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: uneven data movement in one of the disk in Cassandra
>>>> 
>>>>  
>>>> 
>>>> Hi Alex,
>>>> 
>>>>  
>>>> 
>>>> no active compaction, right now.
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin 
>>>>  wrote:
>>>> 
>>>> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem  
>>>> wrote:
>>>> 
>>>> Thanks, Nicolas Guyomar
>>>> 
>>>>  
>>>> 
>>>> I am new to cassandra, here is the properties which I can see in yaml file:
>>>> 
>>>>  
>>>> 
>>>> # of compaction, including validation compaction.
>>>> 
>>>> compaction_throughput_mb_per_sec: 16
>>>> 
>>>> compaction_large_partition_warning_threshold_mb: 100
>>>> 
>>>>  
>>>> 
>>>> To check currently active compaction please use this command:
>>>> 
>>>>  
>>>> 
>>>> nodetool compactionstats -H
>>>> 
>>>>  
>>>> 
>>>> on the host which shows the problem.
>>>> 
>>>>  
>>>> 
>>>> --
>>>> 
>>>> Alex
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>> 


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Madhu B
Yes, it will help. Thanks, James, for correcting me.

> On Mar 9, 2018, at 9:52 PM, James Shaw  wrote:
> 
> per my testing, repair not help.
> repair build Merkle tree to compare data, it only write to a new file while 
> have difference, very very small file at the end  (of course, means most data 
> are synced)
> 
>> On Fri, Mar 9, 2018 at 10:31 PM, Madhu B  wrote:
>> Yasir,
>> I think you need to run full repair in off-peak hours
>> 
>> Thanks,
>> Madhu
>> 
>> 
>>> On Mar 9, 2018, at 7:20 AM, Kenneth Brotman  
>>> wrote:
>>> 
>>> Yasir,
>>> 
>>>  
>>> 
>>> How many nodes are in the cluster? 
>>> 
>>> What is num_tokens set to in the Cassandra.yaml file? 
>>> 
>>> Is it just this one node doing this? 
>>> 
>>> What replication factor do you use that affects the ranges on that disk?
>>> 
>>>  
>>> 
>>> Kenneth Brotman
>>> 
>>>  
>>> 
>>> From: Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com] 
>>> Sent: Friday, March 09, 2018 4:14 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: uneven data movement in one of the disk in Cassandra
>>> 
>>>  
>>> 
>>> Not sure where I heard this, but AFAIK data imbalance when multiple 
>>> data_directories are in use is a known issue for older versions of 
>>> Cassandra. This might be the root-cause of your issue.
>>> 
>>> Which version of C* are you using?
>>> 
>>> Unfortunately, don't remember in which version this imbalance issue was 
>>> fixed.
>>> 
>>>  
>>> 
>>> -- Kyrill
>>> 
>>> From: Yasir Saleem 
>>> Sent: Friday, March 9, 2018 1:34:08 PM
>>> To: user@cassandra.apache.org
>>> Subject: Re: uneven data movement in one of the disk in Cassandra
>>> 
>>>  
>>> 
>>> Hi Alex,
>>> 
>>>  
>>> 
>>> no active compaction, right now.
>>> 
>>>  
>>> 
>>> 
>>> 
>>>  
>>> 
>>> On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin 
>>>  wrote:
>>> 
>>> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem  
>>> wrote:
>>> 
>>> Thanks, Nicolas Guyomar
>>> 
>>>  
>>> 
>>> I am new to cassandra, here is the properties which I can see in yaml file:
>>> 
>>>  
>>> 
>>> # of compaction, including validation compaction.
>>> 
>>> compaction_throughput_mb_per_sec: 16
>>> 
>>> compaction_large_partition_warning_threshold_mb: 100
>>> 
>>>  
>>> 
>>> To check currently active compaction please use this command:
>>> 
>>>  
>>> 
>>> nodetool compactionstats -H
>>> 
>>>  
>>> 
>>> on the host which shows the problem.
>>> 
>>>  
>>> 
>>> --
>>> 
>>> Alex
>>> 
>>>  
>>> 
>>>  
>>> 
> 


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread James Shaw
Per my testing, repair does not help.
Repair builds a Merkle tree to compare data; it only writes a new file where
there are differences, and that file ends up very small (which of course
means most of the data is already in sync).

On Fri, Mar 9, 2018 at 10:31 PM, Madhu B  wrote:

> Yasir,
> I think you need to run full repair in off-peak hours
>
> Thanks,
> Madhu
>
>
> On Mar 9, 2018, at 7:20 AM, Kenneth Brotman 
> wrote:
>
> Yasir,
>
>
>
> How many nodes are in the cluster?
>
> What is num_tokens set to in the Cassandra.yaml file?
>
> Is it just this one node doing this?
>
> What replication factor do you use that affects the ranges on that disk?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com
> ]
> *Sent:* Friday, March 09, 2018 4:14 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: uneven data movement in one of the disk in Cassandra
>
>
>
> Not sure where I heard this, but AFAIK data imbalance when multiple
> data_directories are in use is a known issue for older versions of
> Cassandra. This might be the root-cause of your issue.
>
> Which version of C* are you using?
>
> Unfortunately, don't remember in which version this imbalance issue was
> fixed.
>
>
>
> -- Kyrill
> --------------
>
> *From:* Yasir Saleem 
> *Sent:* Friday, March 9, 2018 1:34:08 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: uneven data movement in one of the disk in Cassandra
>
>
>
> Hi Alex,
>
>
>
> no active compaction, right now.
>
>
>
> 
>
>
>
> On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem 
> wrote:
>
> Thanks, Nicolas Guyomar
>
>
>
> I am new to cassandra, here is the properties which I can see in yaml
> file:
>
>
>
> # of compaction, including validation compaction.
>
> compaction_throughput_mb_per_sec: 16
>
> compaction_large_partition_warning_threshold_mb: 100
>
>
>
> To check currently active compaction please use this command:
>
>
>
> nodetool compactionstats -H
>
>
>
> on the host which shows the problem.
>
>
>
> --
>
> Alex
>
>
>
>
>
>


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread James Shaw
We have a similar issue, and I am working on solving it this weekend.
In our case, STCS makes one huge table's sstable file bigger and bigger
after each compaction (this is the nature of STCS, nothing wrong with it).
Almost all of the data has a 30-day TTL, but the tombstones are not evicted
because the largest file is waiting for 3 other similarly sized files before
it can be compacted. The largest file is 99.99% tombstones.

Use the command:  nodetool upgradesstables -a keyspace table
It rewrites all existing sstables and evicts tombstones.

In your case, first do a few checks:
1. cd /data/disk03/cassandra/data_prod/data
du -ks * | sort -n
to find which tables use the most space.

2. Check the snapshots for the bigger tables found above;
old snapshots may be the cause.

3. cd into the table directory and run
sstablemetadata sstablefile
to see whether the sstables hold a lot of droppable tombstones.

4.
ls -lhS /data/disk*/cassandra/data_prod/data/"that_keyspace"/"that_table"*/*Data.db
Look at all the sstable files; you will see what the next compaction will be.

From what I have observed, small compactions seem to land on disks at random,
but large ones go to the disks that have more free space.

5. If the biggest file is too big, it will wait a long time for the next
compaction. You may test the following (sorry, this is not my case, so I am
not 100% sure):
1) on a newer Cassandra (3.0), you may try nodetool compact -s (it will split)
2) on an older Cassandra version, stop Cassandra and use sstablesplit
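
The checks in steps 1 to 3 can be sketched as a few shell functions. This is only a rough sketch: the function names are made up for illustration, the layout `<data_dir>/<keyspace>/<table>/snapshots/<name>` is assumed, and the `Estimated droppable tombstones` label is what I would expect `sstablemetadata` to print, so verify it against your version.

```shell
# Hypothetical helper: print "<size_kb><TAB><table_dir>" for every table
# directory under the given data directories, smallest first.
table_sizes() {
    for data_dir in "$@"; do
        for table_dir in "$data_dir"/*/*/; do
            [ -d "$table_dir" ] || continue   # glob matched nothing
            size_kb=$(du -ks "$table_dir" | awk '{print $1}')
            printf '%s\t%s\n' "$size_kb" "$table_dir"
        done
    done | sort -n
}

# Hypothetical helper: the same listing, restricted to snapshot directories.
snapshot_sizes() {
    for data_dir in "$@"; do
        for snap_dir in "$data_dir"/*/*/snapshots/*/; do
            [ -d "$snap_dir" ] || continue
            size_kb=$(du -ks "$snap_dir" | awk '{print $1}')
            printf '%s\t%s\n' "$size_kb" "$snap_dir"
        done
    done | sort -n
}

# Hypothetical helper: read sstablemetadata output on stdin and print the
# droppable tombstone ratio (assumes the label shown in the pattern).
droppable_ratio() {
    awk -F': *' '/^Estimated droppable tombstones/ { print $2 }'
}
```

For example, `table_sizes /data/disk0*/cassandra/data_prod/data | tail` lists the largest tables across all disks, and `sstablemetadata sstablefile | droppable_ratio` prints the ratio for one file. Snapshots are hard links, so a snapshot only costs extra space once the live sstables it points to have been compacted away.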


Hope it helps

Thanks,

James


On Fri, Mar 9, 2018 at 7:14 AM, Kyrylo Lebediev 
wrote:

> Not sure where I heard this, but AFAIK data imbalance when multiple
> data_directories are in use is a known issue for older versions of
> Cassandra. This might be the root-cause of your issue.
>
> Which version of C* are you using?
>
> Unfortunately, don't remember in which version this imbalance issue was
> fixed.
>
>
> -- Kyrill
> --
> *From:* Yasir Saleem 
> *Sent:* Friday, March 9, 2018 1:34:08 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: uneven data movement in one of the disk in Cassandra
>
> Hi Alex,
>
> no active compaction, right now.
>
>
>
>
> On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem 
> wrote:
>
> Thanks, Nicolas Guyomar
>
> I am new to cassandra, here is the properties which I can see in yaml
> file:
>
> # of compaction, including validation compaction.
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>
>
> To check currently active compaction please use this command:
>
> nodetool compactionstats -H
>
> on the host which shows the problem.
>
> --
> Alex
>
>
>


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Madhu B
Yasir,
I think you need to run a full repair during off-peak hours.

Thanks,
Madhu


> On Mar 9, 2018, at 7:20 AM, Kenneth Brotman  
> wrote:
> 
> Yasir,
>  
> How many nodes are in the cluster? 
> What is num_tokens set to in the Cassandra.yaml file? 
> Is it just this one node doing this? 
> What replication factor do you use that affects the ranges on that disk?
>  
> Kenneth Brotman
>  
> From: Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com] 
> Sent: Friday, March 09, 2018 4:14 AM
> To: user@cassandra.apache.org
> Subject: Re: uneven data movement in one of the disk in Cassandra
>  
> Not sure where I heard this, but AFAIK data imbalance when multiple 
> data_directories are in use is a known issue for older versions of Cassandra. 
> This might be the root-cause of your issue.
> Which version of C* are you using?
> Unfortunately, don't remember in which version this imbalance issue was fixed.
>  
> -- Kyrill
> From: Yasir Saleem 
> Sent: Friday, March 9, 2018 1:34:08 PM
> To: user@cassandra.apache.org
> Subject: Re: uneven data movement in one of the disk in Cassandra
>  
> Hi Alex,
>  
> no active compaction, right now.
>  
> 
> 
>  
> On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin 
>  wrote:
> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem  
> wrote:
> Thanks, Nicolas Guyomar
>  
> I am new to cassandra, here is the properties which I can see in yaml file:
>  
> # of compaction, including validation compaction.
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>  
> To check currently active compaction please use this command:
>  
> nodetool compactionstats -H
>  
> on the host which shows the problem.
>  
> --
> Alex
>  
>  


RE: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Kenneth Brotman
Yasir,

 

How many nodes are in the cluster?  

What is num_tokens set to in the cassandra.yaml file?

Is it just this one node doing this?  

What replication factor do you use that affects the ranges on that disk?

 

Kenneth Brotman

 

From: Kyrylo Lebediev [mailto:kyrylo_lebed...@epam.com] 
Sent: Friday, March 09, 2018 4:14 AM
To: user@cassandra.apache.org
Subject: Re: uneven data movement in one of the disk in Cassandra

 

Not sure where I heard this, but AFAIK data imbalance when multiple
data_directories are in use is a known issue for older versions of
Cassandra. This might be the root-cause of your issue. 

Which version of C* are you using?

Unfortunately, don't remember in which version this imbalance issue was
fixed.

 

-- Kyrill


From: Yasir Saleem 
Sent: Friday, March 9, 2018 1:34:08 PM
To: user@cassandra.apache.org
Subject: Re: uneven data movement in one of the disk in Cassandra 

 

Hi Alex, 

 

no active compaction, right now.

 



 

On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin
 wrote:

On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem 
wrote:

Thanks, Nicolas Guyomar 

 

I am new to cassandra, here is the properties which I can see in yaml file: 

 

# of compaction, including validation compaction.

compaction_throughput_mb_per_sec: 16

compaction_large_partition_warning_threshold_mb: 100

 

To check currently active compaction please use this command:

 

nodetool compactionstats -H

 

on the host which shows the problem.

 

--

Alex

 

 



Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Kyrylo Lebediev
Not sure where I heard this, but AFAIK data imbalance when multiple
data_file_directories are in use is a known issue in older versions of
Cassandra. This might be the root cause of your issue.

Which version of C* are you using?

Unfortunately, I don't remember in which version this imbalance issue was fixed.


-- Kyrill


From: Yasir Saleem 
Sent: Friday, March 9, 2018 1:34:08 PM
To: user@cassandra.apache.org
Subject: Re: uneven data movement in one of the disk in Cassandra

Hi Alex,

no active compaction, right now.



On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin
<oleksandr.shul...@zalando.de> wrote:
On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem
<yasirsaleem9...@gmail.com> wrote:
Thanks, Nicolas Guyomar

I am new to cassandra, here is the properties which I can see in yaml file:

# of compaction, including validation compaction.
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

To check currently active compaction please use this command:

nodetool compactionstats -H

on the host which shows the problem.

--
Alex




Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Yasir Saleem
Hi Alex,

There is no active compaction right now.




On Fri, Mar 9, 2018 at 3:47 PM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem 
> wrote:
>
>> Thanks, Nicolas Guyomar
>>
>> I am new to cassandra, here is the properties which I can see in yaml
>> file:
>>
>> # of compaction, including validation compaction.
>> compaction_throughput_mb_per_sec: 16
>> compaction_large_partition_warning_threshold_mb: 100
>>
>
> To check currently active compaction please use this command:
>
> nodetool compactionstats -H
>
> on the host which shows the problem.
>
> --
> Alex
>
>


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Oleksandr Shulgin
On Fri, Mar 9, 2018 at 11:40 AM, Yasir Saleem 
wrote:

> Thanks, Nicolas Guyomar
>
> I am new to cassandra, here is the properties which I can see in yaml file:
>
> # of compaction, including validation compaction.
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>

To check currently active compaction please use this command:

nodetool compactionstats -H

on the host which shows the problem.

--
Alex


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Yasir Saleem
Thanks, Nicolas Guyomar

I am new to Cassandra; here are the properties I can see in the yaml file:

# of compaction, including validation compaction.
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100



On Fri, Mar 9, 2018 at 3:33 PM, Nicolas Guyomar 
wrote:

> Hi,
>
> This might be a compaction which is running, have you check that ?
>
> On 9 March 2018 at 11:29, Yasir Saleem  wrote:
>
>> Hi Team,
>>
>>   We are facing uneven data movement across the Cassandra disks,
>> specifically on disk03 in our case: the other disks are at around 60%
>> of their space, but disk03 is at 87%. Here is the configuration in yaml
>> and the current disk space:
>>
>> data_file_directories:
>> - /data/disk01/cassandra/data_prod/data
>> - /data/disk02/cassandra/data_prod/data
>> - /data/disk03/cassandra/data_prod/data
>> - /data/disk04/cassandra/data_prod/data
>> - /data/disk05/cassandra/data_prod/data
>>
>> disk space:
>>
>> 734G  417G  280G  60% /data/disk02
>> 734G  342G  355G  50% /data/disk05
>> 734G  383G  314G  55% /data/disk04
>> *734G  599G   98G  87% /data/disk03*
>> 734G  499G  198G  60% /data/disk01
>>
>> Please note that we have tried deleting data several times, but the
>> space used on disk03 keeps increasing. Please let me know if there is
>> any workaround to resolve this issue.
>>
>> Regards,
>>
>> Yasir.
>>
>>
>


Re: uneven data movement in one of the disk in Cassandra

2018-03-09 Thread Nicolas Guyomar
Hi,

This might be a running compaction. Have you checked that?

On 9 March 2018 at 11:29, Yasir Saleem  wrote:

> Hi Team,
>
>   We are facing uneven data movement across the Cassandra disks,
> specifically on disk03 in our case: the other disks are at around 60%
> of their space, but disk03 is at 87%. Here is the configuration in yaml
> and the current disk space:
>
> data_file_directories:
> - /data/disk01/cassandra/data_prod/data
> - /data/disk02/cassandra/data_prod/data
> - /data/disk03/cassandra/data_prod/data
> - /data/disk04/cassandra/data_prod/data
> - /data/disk05/cassandra/data_prod/data
>
> disk space:
>
> 734G  417G  280G  60% /data/disk02
> 734G  342G  355G  50% /data/disk05
> 734G  383G  314G  55% /data/disk04
> *734G  599G   98G  87% /data/disk03*
> 734G  499G  198G  60% /data/disk01
>
> Please note that we have tried deleting data several times, but the
> space used on disk03 keeps increasing. Please let me know if there is
> any workaround to resolve this issue.
>
> Regards,
>
> Yasir.
>
>
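
To put a number on the imbalance in the disk space listing above, a small sketch along these lines could flag any mount that sits well above the average. `flag_uneven_disks` is a hypothetical name, and the default threshold of 15 percentage points is an arbitrary assumption, not a Cassandra setting.

```shell
# Hypothetical helper: read df-style lines on stdin and print every mount
# whose Use% is more than THRESHOLD percentage points above the mean.
# $1: threshold in percentage points (defaults to 15, an arbitrary choice).
flag_uneven_disks() {
    awk -v threshold="${1:-15}" '
        NF < 2 { next }                      # skip blank or malformed lines
        $(NF-1) !~ /^[0-9]+%$/ { next }      # skip the df header line
        {
            pct = $(NF-1)
            sub(/%/, "", pct)                # "87%" becomes "87"
            n++
            mount[n] = $NF                   # last column is the mount point
            used[n] = pct + 0
            total += used[n]
        }
        END {
            if (n == 0) exit
            mean = total / n
            for (i = 1; i <= n; i++)
                if (used[i] - mean > threshold)
                    printf "%s %d%% (mean %.1f%%)\n", mount[i], used[i], mean
        }'
}
```

It can be fed from `df -P /data/disk0* | flag_uneven_disks`. With the five disks from this thread it reports only `/data/disk03`, at 87% against a mean of 62.4%.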