Re: Many pending compactions

2015-02-24 Thread Ja Sam
The repair results are as follows (we ran it on Friday): Cannot proceed on
repair because a neighbor (/192.168.61.201) is dead: session failed

But to be honest, the neighbor did not die. The repair seemed to trigger a
series of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session
d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]
failed with error org.apache.cassandra.exceptions.RepairException: [repair
#d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session
eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
(68056473384187696470568107782069813248,85070591730234615865843651857942052874]
failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session
c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
(42535295865117307932921825928971026442,68056473384187696470568107782069813248]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
(127605887595351923798765477786913079306,136112946768375392941136215564139626496]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session
c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
(136112946768375392941136215564139626496,0] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session
c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
(102084710076281535261119195933814292480,127605887595351923798765477786913079306]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished


We tried to run the repair one more time. After 24 hours we had some streaming
errors. Moreover, we had to stop it because we started to get write timeouts
on the client :(

We checked iostat while the write timeouts were happening. An example from one
node in DC_A is here:
The file also contains tpstats from all nodes. Nodes starting with "z" are
in DC_B; the rest are in DC_A.
Cassandra's data and commit log are on disk dm-XX.
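For reference, we collect the disk numbers with something along these lines
(the dm-XX device name is a placeholder for our actual volumes):

    iostat -x -d 5 | grep 'dm-'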

I also read
http://jonathanhui.com/cassandra-performance-tuning-and-monitoring and I am
thinking about:
1) memtable configuration - do you have any suggestions? (I sketched below
what I mean)
2) running INSERTs in batch statements - I am not sure if this reduces IO;
again, do you have experience with this?
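For 1), the knobs I mean are the memtable settings in the 2.1 cassandra.yaml;
the values below are only an illustration of what I would try, not tested
recommendations:

    memtable_allocation_type: offheap_objects
    memtable_heap_space_in_mb: 2048
    memtable_offheap_space_in_mb: 2048
    memtable_flush_writers: 2

For 2), as far as I understand a batch mainly saves client round-trips, and
only helps if the rows go to the same partition; the table and columns below
are made up just for the example:

    BEGIN UNLOGGED BATCH
      INSERT INTO audit.events (day, ts, payload) VALUES ('2015-02-24', now(), 'a');
      INSERT INTO audit.events (day, ts, payload) VALUES ('2015-02-24', now(), 'b');
    APPLY BATCH;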

Any tips will be helpful

Regards
Piotrek

On Thu, Feb 19, 2015 at 10:34 AM, Roland Etzenhammer <
r.etzenham...@t-online.de> wrote:

> Hi,
>
> 2.1.3 is now the official latest release - I checked this morning and got
> this good surprise. Now it's update time - thanks to all guys involved, if
> I meet anyone one beer from me :-)
>
> The changelist is rather long:
> https://git1-us-west.apache.org/repos/asf?p=cassandra.git;
> a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3
>
> Hopefully that will solve many of those oddities and not invent to much
> new ones :-)
>
> Cheers,
> Roland
>
>
>


Re: Many pending compactions

2015-02-19 Thread Roland Etzenhammer

Hi,

2.1.3 is now the official latest release - I checked this morning and
got this good surprise. Now it's update time - thanks to all the guys
involved; if I meet anyone, one beer from me :-)


The changelist is rather long:
https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3

Hopefully that will solve many of those oddities and not invent too many
new ones :-)


Cheers,
Roland




Re: Many pending compactions

2015-02-18 Thread Ja Sam
As Al Tobey suggested, I upgraded my 2.1.0 to a snapshot version of 2.1.3. I
have now installed exactly this build:
https://cassci.datastax.com/job/cassandra-2.1/912/
I see many compactions completing, but some of them are really slow.
Maybe I should send some stats from OpsCenter or the servers? But it is
difficult for me to choose what is important.
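If nothing else, I could attach the output of something like this (the
keyspace/table names are the ones from this thread):

    nodetool compactionstats
    nodetool cfstats prem_maelstrom_2.customer_events | grep -i 'sstable count'
    nodetool tpstats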

Regards



On Wed, Feb 18, 2015 at 6:11 PM, Jake Luciani  wrote:

> Ja, Please upgrade to official 2.1.3 we've fixed many things related to
> compaction.  Are you seeing the compactions % complete progress at all?
>
> On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar 
> wrote:
>
>> Try repair -pr on all nodes.
>>
>> If after that you still have issues, you can try to rebuild the SSTables
>> using nodetool upgradesstables or scrub.
>>
>> Regards,
>>
>> Roni Balthazar
>>
>> Em 18/02/2015, às 14:13, Ja Sam  escreveu:
>>
>> ad 3)  I did this already yesterday (setcompactionthrouput also). But
>> still SSTables are increasing.
>>
>> ad 1) What do you think I should use -pr or try to use incremental?
>>
>>
>>
>> On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar 
>> wrote:
>>
>>> You are right... Repair makes the data consistent between nodes.
>>>
>>> I understand that you have 2 issues going on.
>>>
>>> You need to run repair periodically without errors and need to decrease
>>> the numbers of compactions pending.
>>>
>>> So I suggest:
>>>
>>> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
>>> use incremental repairs. There were some bugs on 2.1.2.
>>> 2) Run cleanup on all nodes
>>> 3) Since you have too many cold SSTables, set cold_reads_to_omit to
>>> 0.0, and increase setcompactionthroughput for some time and see if the
>>> number of SSTables is going down.
>>>
>>> Let us know what errors are you getting when running repairs.
>>>
>>> Regards,
>>>
>>> Roni Balthazar
>>>
>>>
>>> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:
>>>
>>>> Can you explain me what is the correlation between growing SSTables and
>>>> repair?
>>>> I was sure, until your  mail, that repair is only to make data
>>>> consistent between nodes.
>>>>
>>>> Regards
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar <
>>>> ronibaltha...@gmail.com> wrote:
>>>>
>>>>> Which error are you getting when running repairs?
>>>>> You need to run repair on your nodes within gc_grace_seconds (eg:
>>>>> weekly). They have data that are not read frequently. You can run
>>>>> "repair -pr" on all nodes. Since you do not have deletes, you will not
>>>>> have trouble with that. If you have deletes, it's better to increase
>>>>> gc_grace_seconds before the repair.
>>>>>
>>>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>>> After repair, try to run a "nodetool cleanup".
>>>>>
>>>>> Check if the number of SSTables goes down after that... Pending
>>>>> compactions must decrease as well...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Roni Balthazar
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
>>>>> > 1) we tried to run repairs but they usually does not succeed. But we
>>>>> had
>>>>> > Leveled compaction before. Last week we ALTER tables to STCS,
>>>>> because guys
>>>>> > from DataStax suggest us that we should not use Leveled and alter
>>>>> tables in
>>>>> > STCS, because we don't have SSD. After this change we did not run any
>>>>> > repair. Anyway I don't think it will change anything in SSTable
>>>>> count - if I
>>>>> > am wrong please give me an information
>>>>> >
>>>>> > 2) I did this. My tables are 99% write only. It is audit system
>>>>> >
>>>>> > 3) Yes I am using default values
>>>>> >
>>>>> > 4) In both operations I am using LOCAL_QUORUM.
>>>>> >
>>>>> > I am almost sure that READ timeout happens because of too much
>>>>> SSTables

Re: Many pending compactions

2015-02-18 Thread Jake Luciani
Ja, please upgrade to the official 2.1.3; we've fixed many things related to
compaction. Are you seeing the compactions' % complete progress at all?

On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar 
wrote:

> Try repair -pr on all nodes.
>
> If after that you still have issues, you can try to rebuild the SSTables
> using nodetool upgradesstables or scrub.
>
> Regards,
>
> Roni Balthazar
>
> Em 18/02/2015, às 14:13, Ja Sam  escreveu:
>
> ad 3)  I did this already yesterday (setcompactionthrouput also). But
> still SSTables are increasing.
>
> ad 1) What do you think I should use -pr or try to use incremental?
>
>
>
> On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar 
> wrote:
>
>> You are right... Repair makes the data consistent between nodes.
>>
>> I understand that you have 2 issues going on.
>>
>> You need to run repair periodically without errors and need to decrease
>> the numbers of compactions pending.
>>
>> So I suggest:
>>
>> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
>> use incremental repairs. There were some bugs on 2.1.2.
>> 2) Run cleanup on all nodes
>> 3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
>> and increase setcompactionthroughput for some time and see if the number
>> of SSTables is going down.
>>
>> Let us know what errors are you getting when running repairs.
>>
>> Regards,
>>
>> Roni Balthazar
>>
>>
>> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:
>>
>>> Can you explain me what is the correlation between growing SSTables and
>>> repair?
>>> I was sure, until your  mail, that repair is only to make data
>>> consistent between nodes.
>>>
>>> Regards
>>>
>>>
>>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar >> > wrote:
>>>
>>>> Which error are you getting when running repairs?
>>>> You need to run repair on your nodes within gc_grace_seconds (eg:
>>>> weekly). They have data that are not read frequently. You can run
>>>> "repair -pr" on all nodes. Since you do not have deletes, you will not
>>>> have trouble with that. If you have deletes, it's better to increase
>>>> gc_grace_seconds before the repair.
>>>>
>>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>> After repair, try to run a "nodetool cleanup".
>>>>
>>>> Check if the number of SSTables goes down after that... Pending
>>>> compactions must decrease as well...
>>>>
>>>> Cheers,
>>>>
>>>> Roni Balthazar
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
>>>> > 1) we tried to run repairs but they usually does not succeed. But we
>>>> had
>>>> > Leveled compaction before. Last week we ALTER tables to STCS, because
>>>> guys
>>>> > from DataStax suggest us that we should not use Leveled and alter
>>>> tables in
>>>> > STCS, because we don't have SSD. After this change we did not run any
>>>> > repair. Anyway I don't think it will change anything in SSTable count
>>>> - if I
>>>> > am wrong please give me an information
>>>> >
>>>> > 2) I did this. My tables are 99% write only. It is audit system
>>>> >
>>>> > 3) Yes I am using default values
>>>> >
>>>> > 4) In both operations I am using LOCAL_QUORUM.
>>>> >
>>>> > I am almost sure that READ timeout happens because of too much
>>>> SSTables.
>>>> > Anyway firstly I would like to fix to many pending compactions. I
>>>> still
>>>> > don't know how to speed up them.
>>>> >
>>>> >
>>>> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar <
>>>> ronibaltha...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>>> >>
>>>> >>
>>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>> >>
>>>> >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>>>> >> that you do not read often.
>>>> >>
>>

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar
Try repair -pr on all nodes.

If after that you still have issues, you can try to rebuild the SSTables using 
nodetool upgradesstables or scrub.
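Roughly, with the keyspace from this thread (both upgradesstables and scrub
rewrite SSTables, so make sure there is enough free disk first):

    nodetool repair -pr
    nodetool upgradesstables prem_maelstrom_2
    nodetool scrub prem_maelstrom_2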

Regards,

Roni Balthazar

> Em 18/02/2015, às 14:13, Ja Sam  escreveu:
> 
> ad 3)  I did this already yesterday (setcompactionthrouput also). But still 
> SSTables are increasing.
> 
> ad 1) What do you think I should use -pr or try to use incremental?
> 
> 
> 
>> On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar  
>> wrote:
>> You are right... Repair makes the data consistent between nodes.
>> 
>> I understand that you have 2 issues going on.
>> 
>> You need to run repair periodically without errors and need to decrease the 
>> numbers of compactions pending.
>> 
>> So I suggest:
>> 
>> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use 
>> incremental repairs. There were some bugs on 2.1.2.
>> 2) Run cleanup on all nodes
>> 3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0, and 
>> increase setcompactionthroughput for some time and see if the number of 
>> SSTables is going down.
>> 
>> Let us know what errors are you getting when running repairs.
>> 
>> Regards,
>> 
>> Roni Balthazar
>> 
>> 
>>> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:
>>> Can you explain me what is the correlation between growing SSTables and 
>>> repair? 
>>> I was sure, until your  mail, that repair is only to make data consistent 
>>> between nodes.
>>> 
>>> Regards
>>> 
>>> 
>>> 
>>>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar  
>>>> wrote:
>>>> Which error are you getting when running repairs?
>>>> You need to run repair on your nodes within gc_grace_seconds (eg:
>>>> weekly). They have data that are not read frequently. You can run
>>>> "repair -pr" on all nodes. Since you do not have deletes, you will not
>>>> have trouble with that. If you have deletes, it's better to increase
>>>> gc_grace_seconds before the repair.
>>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>> After repair, try to run a "nodetool cleanup".
>>>> 
>>>> Check if the number of SSTables goes down after that... Pending
>>>> compactions must decrease as well...
>>>> 
>>>> Cheers,
>>>> 
>>>> Roni Balthazar
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
>>>> > 1) we tried to run repairs but they usually does not succeed. But we had
>>>> > Leveled compaction before. Last week we ALTER tables to STCS, because 
>>>> > guys
>>>> > from DataStax suggest us that we should not use Leveled and alter tables 
>>>> > in
>>>> > STCS, because we don't have SSD. After this change we did not run any
>>>> > repair. Anyway I don't think it will change anything in SSTable count - 
>>>> > if I
>>>> > am wrong please give me an information
>>>> >
>>>> > 2) I did this. My tables are 99% write only. It is audit system
>>>> >
>>>> > 3) Yes I am using default values
>>>> >
>>>> > 4) In both operations I am using LOCAL_QUORUM.
>>>> >
>>>> > I am almost sure that READ timeout happens because of too much SSTables.
>>>> > Anyway firstly I would like to fix to many pending compactions. I still
>>>> > don't know how to speed up them.
>>>> >
>>>> >
>>>> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar 
>>>> > wrote:
>>>> >>
>>>> >> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>>> >>
>>>> >> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>> >>
>>>> >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>>>> >> that you do not read often.
>>>> >>
>>>> >> Are you using default values for the properties
>>>> >> min_compaction_threshold(4) and max_compaction_threshold(32)?
>>>> >>
>>>> >> Which Consistency Level are you using for reading operations? Check if
>>>> >> you are not reading 

Re: Many pending compactions

2015-02-18 Thread Ja Sam
ad 3) I did this already yesterday (setcompactionthroughput too). But
SSTables are still increasing.

ad 1) What do you think - should I use -pr, or try incremental?
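(For context: on 2.1 my understanding is that incremental repair is invoked
with something like "nodetool repair -par -inc", since -inc requires parallel
repair there - please correct me if the flags are wrong.)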



On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar 
wrote:

> You are right... Repair makes the data consistent between nodes.
>
> I understand that you have 2 issues going on.
>
> You need to run repair periodically without errors and need to decrease
> the numbers of compactions pending.
>
> So I suggest:
>
> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
> use incremental repairs. There were some bugs on 2.1.2.
> 2) Run cleanup on all nodes
> 3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
> and increase setcompactionthroughput for some time and see if the number
> of SSTables is going down.
>
> Let us know what errors are you getting when running repairs.
>
> Regards,
>
> Roni Balthazar
>
>
> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:
>
>> Can you explain me what is the correlation between growing SSTables and
>> repair?
>> I was sure, until your  mail, that repair is only to make data consistent
>> between nodes.
>>
>> Regards
>>
>>
>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar 
>> wrote:
>>
>>> Which error are you getting when running repairs?
>>> You need to run repair on your nodes within gc_grace_seconds (eg:
>>> weekly). They have data that are not read frequently. You can run
>>> "repair -pr" on all nodes. Since you do not have deletes, you will not
>>> have trouble with that. If you have deletes, it's better to increase
>>> gc_grace_seconds before the repair.
>>>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>> After repair, try to run a "nodetool cleanup".
>>>
>>> Check if the number of SSTables goes down after that... Pending
>>> compactions must decrease as well...
>>>
>>> Cheers,
>>>
>>> Roni Balthazar
>>>
>>>
>>>
>>>
>>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
>>> > 1) we tried to run repairs but they usually does not succeed. But we
>>> had
>>> > Leveled compaction before. Last week we ALTER tables to STCS, because
>>> guys
>>> > from DataStax suggest us that we should not use Leveled and alter
>>> tables in
>>> > STCS, because we don't have SSD. After this change we did not run any
>>> > repair. Anyway I don't think it will change anything in SSTable count
>>> - if I
>>> > am wrong please give me an information
>>> >
>>> > 2) I did this. My tables are 99% write only. It is audit system
>>> >
>>> > 3) Yes I am using default values
>>> >
>>> > 4) In both operations I am using LOCAL_QUORUM.
>>> >
>>> > I am almost sure that READ timeout happens because of too much
>>> SSTables.
>>> > Anyway firstly I would like to fix to many pending compactions. I still
>>> > don't know how to speed up them.
>>> >
>>> >
>>> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar <
>>> ronibaltha...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>> >>
>>> >>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>> >>
>>> >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>>> >> that you do not read often.
>>> >>
>>> >> Are you using default values for the properties
>>> >> min_compaction_threshold(4) and max_compaction_threshold(32)?
>>> >>
>>> >> Which Consistency Level are you using for reading operations? Check if
>>> >> you are not reading from DC_B due to your Replication Factor and CL.
>>> >>
>>> >>
>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>> >>
>>> >>
>>> >> Cheers,
>>> >>
>>> >> Roni Balthazar
>>> >>
>>> >> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
>>> >> > I don't have problems with DC_B (replica) only in DC_A(my system
>>> write
>>> >> > only
>

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar
You are right... Repair makes the data consistent between nodes.

I understand that you have 2 issues going on.

You need to run repair periodically without errors, and you need to decrease
the number of pending compactions.

So I suggest (rough commands are sketched after the list):

1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
and increase setcompactionthroughput for some time and see if the number of
SSTables is going down.
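Roughly, using the keyspace/table names from this thread (cold_reads_to_omit
is an STCS compaction subproperty; 999 effectively unthrottles compaction):

    nodetool repair -pr prem_maelstrom_2
    nodetool cleanup prem_maelstrom_2
    cqlsh -e "ALTER TABLE prem_maelstrom_2.customer_events WITH compaction =
      {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': '0.0'};"
    nodetool setcompactionthroughput 999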

Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar


On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:

> Can you explain me what is the correlation between growing SSTables and
> repair?
> I was sure, until your  mail, that repair is only to make data consistent
> between nodes.
>
> Regards
>
>
> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar 
> wrote:
>
>> Which error are you getting when running repairs?
>> You need to run repair on your nodes within gc_grace_seconds (eg:
>> weekly). They have data that are not read frequently. You can run
>> "repair -pr" on all nodes. Since you do not have deletes, you will not
>> have trouble with that. If you have deletes, it's better to increase
>> gc_grace_seconds before the repair.
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>> After repair, try to run a "nodetool cleanup".
>>
>> Check if the number of SSTables goes down after that... Pending
>> compactions must decrease as well...
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>>
>>
>>
>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
>> > 1) we tried to run repairs but they usually does not succeed. But we had
>> > Leveled compaction before. Last week we ALTER tables to STCS, because
>> guys
>> > from DataStax suggest us that we should not use Leveled and alter
>> tables in
>> > STCS, because we don't have SSD. After this change we did not run any
>> > repair. Anyway I don't think it will change anything in SSTable count -
>> if I
>> > am wrong please give me an information
>> >
>> > 2) I did this. My tables are 99% write only. It is audit system
>> >
>> > 3) Yes I am using default values
>> >
>> > 4) In both operations I am using LOCAL_QUORUM.
>> >
>> > I am almost sure that READ timeout happens because of too much SSTables.
>> > Anyway firstly I would like to fix to many pending compactions. I still
>> > don't know how to speed up them.
>> >
>> >
>> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar <
>> ronibaltha...@gmail.com>
>> > wrote:
>> >>
>> >> Are you running repairs within gc_grace_seconds? (default is 10 days)
>> >>
>> >>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>> >>
>> >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>> >> that you do not read often.
>> >>
>> >> Are you using default values for the properties
>> >> min_compaction_threshold(4) and max_compaction_threshold(32)?
>> >>
>> >> Which Consistency Level are you using for reading operations? Check if
>> >> you are not reading from DC_B due to your Replication Factor and CL.
>> >>
>> >>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> Roni Balthazar
>> >>
>> >> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
>> >> > I don't have problems with DC_B (replica) only in DC_A(my system
>> write
>> >> > only
>> >> > to it) I have read timeouts.
>> >> >
>> >> > I checked in OpsCenter SSTable count  and I have:
>> >> > 1) in DC_A  same +-10% for last week, a small increase for last 24h
>> (it
>> >> > is
>> >> > more than 15000-2 SSTables depends on node)
>> >> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
>> >> > prognostics.
>> >> > Now I have less then 1000 SSTables
>> >> >
>> >> > What did you measure during system optimizations? Or do you have an
>> idea
>> >> > what more should I check?
>> >>

Re: Many pending compactions

2015-02-18 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Cassandra 2.1 comes with incremental repair, and I haven't read the details 
myself: 
http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html

However, AFAIK a full repair will rebuild all SSTables; that's why you should
have more than 50% of disk space available on each node. Of course, it will
also make sure data is replicated to the right nodes in the process.

[]s
From: user@cassandra.apache.org 
Subject: Re: Many pending compactions

Can you explain me what is the correlation between growing SSTables and repair? 
I was sure, until your  mail, that repair is only to make data consistent 
between nodes.

Regards


On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar  wrote:

Which error are you getting when running repairs?
You need to run repair on your nodes within gc_grace_seconds (eg:
weekly). They have data that are not read frequently. You can run
"repair -pr" on all nodes. Since you do not have deletes, you will not
have trouble with that. If you have deletes, it's better to increase
gc_grace_seconds before the repair.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a "nodetool cleanup".

Check if the number of SSTables goes down after that... Pending
compactions must decrease as well...

Cheers,

Roni Balthazar


On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
> 1) we tried to run repairs but they usually does not succeed. But we had
> Leveled compaction before. Last week we ALTER tables to STCS, because guys
> from DataStax suggest us that we should not use Leveled and alter tables in
> STCS, because we don't have SSD. After this change we did not run any
> repair. Anyway I don't think it will change anything in SSTable count - if I
> am wrong please give me an information
>
> 2) I did this. My tables are 99% write only. It is audit system
>
> 3) Yes I am using default values
>
> 4) In both operations I am using LOCAL_QUORUM.
>
> I am almost sure that READ timeout happens because of too much SSTables.
> Anyway firstly I would like to fix to many pending compactions. I still
> don't know how to speed up them.
>
>
> On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar 
> wrote:
>>
>> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>
>> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>> that you do not read often.
>>
>> Are you using default values for the properties
>> min_compaction_threshold(4) and max_compaction_threshold(32)?
>>
>> Which Consistency Level are you using for reading operations? Check if
>> you are not reading from DC_B due to your Replication Factor and CL.
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
>> > I don't have problems with DC_B (replica) only in DC_A(my system write
>> > only
>> > to it) I have read timeouts.
>> >
>> > I checked in OpsCenter SSTable count  and I have:
>> > 1) in DC_A  same +-10% for last week, a small increase for last 24h (it
>> > is
>> > more than 15000-2 SSTables depends on node)
>> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
>> > prognostics.
>> > Now I have less then 1000 SSTables
>> >
>> > What did you measure during system optimizations? Or do you have an idea
>> > what more should I check?
>> > 1) I look at CPU Idle (one node is 50% idle, rest 70% idle)
>> > 2) Disk queue -> mostly is it near zero: avg 0.09. Sometimes there are
>> > spikes
>> > 3) system RAM usage is almost full
>> > 4) In Total Bytes Compacted most most lines are below 3MB/s. For total
>> > DC_A
>> > it is less than 10MB/s, in DC_B it looks much better (avg is like
>> > 17MB/s)
>> >
>> > something else?
>> >
>> >
>> >
>> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
>> > 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> You can check if the number of SSTables is decreasing. Look for the
>> >> "SSTable count" information of your tables using "nodetool cfstats".
>> >> The compaction history can be viewed using "nodetool
>> >> compactionhistory".
>> >>
>> >> About the timeouts, check this out:
>> >>

Re: Many pending compactions

2015-02-18 Thread Ja Sam
Can you explain to me what the correlation is between growing SSTables and
repair?
Until your mail, I was sure that repair only makes data consistent
between nodes.

Regards


On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar 
wrote:

> Which error are you getting when running repairs?
> You need to run repair on your nodes within gc_grace_seconds (eg:
> weekly). They have data that are not read frequently. You can run
> "repair -pr" on all nodes. Since you do not have deletes, you will not
> have trouble with that. If you have deletes, it's better to increase
> gc_grace_seconds before the repair.
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
> After repair, try to run a "nodetool cleanup".
>
> Check if the number of SSTables goes down after that... Pending
> compactions must decrease as well...
>
> Cheers,
>
> Roni Balthazar
>
>
>
>
> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
> > 1) we tried to run repairs but they usually does not succeed. But we had
> > Leveled compaction before. Last week we ALTER tables to STCS, because
> guys
> > from DataStax suggest us that we should not use Leveled and alter tables
> in
> > STCS, because we don't have SSD. After this change we did not run any
> > repair. Anyway I don't think it will change anything in SSTable count -
> if I
> > am wrong please give me an information
> >
> > 2) I did this. My tables are 99% write only. It is audit system
> >
> > 3) Yes I am using default values
> >
> > 4) In both operations I am using LOCAL_QUORUM.
> >
> > I am almost sure that READ timeout happens because of too much SSTables.
> > Anyway firstly I would like to fix to many pending compactions. I still
> > don't know how to speed up them.
> >
> >
> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar  >
> > wrote:
> >>
> >> Are you running repairs within gc_grace_seconds? (default is 10 days)
> >>
> >>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
> >>
> >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
> >> that you do not read often.
> >>
> >> Are you using default values for the properties
> >> min_compaction_threshold(4) and max_compaction_threshold(32)?
> >>
> >> Which Consistency Level are you using for reading operations? Check if
> >> you are not reading from DC_B due to your Replication Factor and CL.
> >>
> >>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
> >>
> >>
> >> Cheers,
> >>
> >> Roni Balthazar
> >>
> >> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
> >> > I don't have problems with DC_B (replica) only in DC_A(my system write
> >> > only
> >> > to it) I have read timeouts.
> >> >
> >> > I checked in OpsCenter SSTable count  and I have:
> >> > 1) in DC_A  same +-10% for last week, a small increase for last 24h
> (it
> >> > is
> >> > more than 15000-2 SSTables depends on node)
> >> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
> >> > prognostics.
> >> > Now I have less then 1000 SSTables
> >> >
> >> > What did you measure during system optimizations? Or do you have an
> idea
> >> > what more should I check?
> >> > 1) I look at CPU Idle (one node is 50% idle, rest 70% idle)
> >> > 2) Disk queue -> mostly is it near zero: avg 0.09. Sometimes there are
> >> > spikes
> >> > 3) system RAM usage is almost full
> >> > 4) In Total Bytes Compacted most most lines are below 3MB/s. For total
> >> > DC_A
> >> > it is less than 10MB/s, in DC_B it looks much better (avg is like
> >> > 17MB/s)
> >> >
> >> > something else?
> >> >
> >> >
> >> >
> >> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
> >> > 
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> You can check if the number of SSTables is decreasing. Look for the
> >> >> "SSTable count" information of your tables using "nodetool cfstats".
> >> >> The compaction history can be viewed using "nodetool
> >> >> compactionhistory".
> >> >>
> >> >> About t

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar
Which error are you getting when running repairs?
You need to run repair on your nodes within gc_grace_seconds (e.g. weekly).
They have data that is not read frequently. You can run "repair -pr" on all
nodes. Since you do not have deletes, you will not have trouble with that. If
you have deletes, it's better to increase gc_grace_seconds before the repair.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a "nodetool cleanup".
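gc_grace_seconds is set per table and shows up in the table definition, so
something like this should work (table name taken from earlier in the thread):

    cqlsh -e "DESCRIBE TABLE prem_maelstrom_2.customer_events;" | grep gc_grace_seconds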

Check if the number of SSTables goes down after that... Pending
compactions must decrease as well...

Cheers,

Roni Balthazar




On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
> 1) we tried to run repairs but they usually does not succeed. But we had
> Leveled compaction before. Last week we ALTER tables to STCS, because guys
> from DataStax suggest us that we should not use Leveled and alter tables in
> STCS, because we don't have SSD. After this change we did not run any
> repair. Anyway I don't think it will change anything in SSTable count - if I
> am wrong please give me an information
>
> 2) I did this. My tables are 99% write only. It is audit system
>
> 3) Yes I am using default values
>
> 4) In both operations I am using LOCAL_QUORUM.
>
> I am almost sure that READ timeout happens because of too much SSTables.
> Anyway firstly I would like to fix to many pending compactions. I still
> don't know how to speed up them.
>
>
> On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar 
> wrote:
>>
>> Are you running repairs within gc_grace_seconds? (default is 10 days)
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>
>> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
>> that you do not read often.
>>
>> Are you using default values for the properties
>> min_compaction_threshold(4) and max_compaction_threshold(32)?
>>
>> Which Consistency Level are you using for reading operations? Check if
>> you are not reading from DC_B due to your Replication Factor and CL.
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
>> > I don't have problems with DC_B (replica) only in DC_A(my system write
>> > only
>> > to it) I have read timeouts.
>> >
>> > I checked in OpsCenter SSTable count  and I have:
>> > 1) in DC_A  same +-10% for last week, a small increase for last 24h (it
>> > is
>> > more than 15000-2 SSTables depends on node)
>> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
>> > prognostics.
>> > Now I have less then 1000 SSTables
>> >
>> > What did you measure during system optimizations? Or do you have an idea
>> > what more should I check?
>> > 1) I look at CPU Idle (one node is 50% idle, rest 70% idle)
>> > 2) Disk queue -> mostly is it near zero: avg 0.09. Sometimes there are
>> > spikes
>> > 3) system RAM usage is almost full
>> > 4) In Total Bytes Compacted most most lines are below 3MB/s. For total
>> > DC_A
>> > it is less than 10MB/s, in DC_B it looks much better (avg is like
>> > 17MB/s)
>> >
>> > something else?
>> >
>> >
>> >
>> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
>> > 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> You can check if the number of SSTables is decreasing. Look for the
>> >> "SSTable count" information of your tables using "nodetool cfstats".
>> >> The compaction history can be viewed using "nodetool
>> >> compactionhistory".
>> >>
>> >> About the timeouts, check this out:
>> >>
>> >> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
>> >> Also try to run "nodetool tpstats" to see the threads statistics. It
>> >> can lead you to know if you are having performance problems. If you
>> >> are having too many pending tasks or dropped messages, maybe will you
>> >> need to tune your system (eg: driver's timeout, concurrent reads and
>> >> so on)
>> >>
>> >> Regards,
>> >>
>> >> Roni Balthazar
>> >>
>> >> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam  wrote:
>> >> > Hi,
>> >> > Thanks for your "tip"

Re: Many pending compactions

2015-02-18 Thread Ja Sam
1) We tried to run repairs but they usually do not succeed. We had
Leveled compaction before. Last week we ALTERed the tables to STCS, because
the guys from DataStax suggested that we should not use Leveled and should
switch the tables to STCS, since we don't have SSDs. After this change we did
not run any repair. Anyway, I don't think it will change anything in the
SSTable count - if I am wrong, please let me know.

2) I did this. My tables are 99% write-only. It is an audit system.

3) Yes, I am using default values.

4) In both operations I am using LOCAL_QUORUM.

I am almost sure that the READ timeouts happen because of too many SSTables.
Anyway, first I would like to fix the many pending compactions. I still
don't know how to speed them up.


On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar 
wrote:

> Are you running repairs within gc_grace_seconds? (default is 10 days)
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>
> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
> that you do not read often.
>
> Are you using default values for the properties
> min_compaction_threshold(4) and max_compaction_threshold(32)?
>
> Which Consistency Level are you using for reading operations? Check if
> you are not reading from DC_B due to your Replication Factor and CL.
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
>
> Cheers,
>
> Roni Balthazar
>
> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
> > I don't have problems with DC_B (replica) only in DC_A(my system write
> only
> > to it) I have read timeouts.
> >
> > I checked in OpsCenter SSTable count  and I have:
> > 1) in DC_A  same +-10% for last week, a small increase for last 24h (it
> is
> > more than 15000-2 SSTables depends on node)
> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
> prognostics.
> > Now I have less then 1000 SSTables
> >
> > What did you measure during system optimizations? Or do you have an idea
> > what more should I check?
> > 1) I look at CPU Idle (one node is 50% idle, rest 70% idle)
> > 2) Disk queue -> mostly is it near zero: avg 0.09. Sometimes there are
> > spikes
> > 3) system RAM usage is almost full
> > 4) In Total Bytes Compacted most most lines are below 3MB/s. For total
> DC_A
> > it is less than 10MB/s, in DC_B it looks much better (avg is like 17MB/s)
> >
> > something else?
> >
> >
> >
> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar  >
> > wrote:
> >>
> >> Hi,
> >>
> >> You can check if the number of SSTables is decreasing. Look for the
> >> "SSTable count" information of your tables using "nodetool cfstats".
> >> The compaction history can be viewed using "nodetool
> >> compactionhistory".
> >>
> >> About the timeouts, check this out:
> >>
> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
> >> Also try to run "nodetool tpstats" to see the threads statistics. It
> >> can lead you to know if you are having performance problems. If you
> >> are having too many pending tasks or dropped messages, maybe will you
> >> need to tune your system (eg: driver's timeout, concurrent reads and
> >> so on)
> >>
> >> Regards,
> >>
> >> Roni Balthazar
> >>
> >> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam  wrote:
> >> > Hi,
> >> > Thanks for your "tip" it looks that something changed - I still don't
> >> > know
> >> > if it is ok.
> >> >
> >> > My nodes started to do more compaction, but it looks that some
> >> > compactions
> >> > are really slow.
> >> > In IO we have idle, CPU is quite ok (30%-40%). We set
> compactionthrouput
> >> > to
> >> > 999, but I do not see difference.
> >> >
> >> > Can we check something more? Or do you have any method to monitor
> >> > progress
> >> > with small files?
> >> >
> >> > Regards
> >> >
> >> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar
> >> > 
> >> > wrote:
> >> >>
> >> >> HI,
> >> >>
> >> >> Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
> >> >> the solution...
> >> >> The number of SSTables decreased from many thousands to a number
> below
> >> >> a hundred and the SSTable

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar
Are you running repairs within gc_grace_seconds? (default is 10 days)
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
that you do not read often.

Are you using default values for the properties
min_compaction_threshold(4) and max_compaction_threshold(32)?
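They can be checked and changed per table; note that in CQL the compaction
subproperties are called min_threshold/max_threshold (table name taken from
earlier in the thread):

    cqlsh -e "ALTER TABLE prem_maelstrom_2.customer_events WITH compaction =
      {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '4',
       'max_threshold': '32'};"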

Which Consistency Level are you using for reading operations? Check if
you are not reading from DC_B due to your Replication Factor and CL.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html


Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam  wrote:
> I don't have problems with DC_B (replica) only in DC_A(my system write only
> to it) I have read timeouts.
>
> I checked in OpsCenter SSTable count  and I have:
> 1) in DC_A  same +-10% for last week, a small increase for last 24h (it is
> more than 15000-2 SSTables depends on node)
> 2) in DC_B last 24h shows up to 50% decrease, which give nice prognostics.
> Now I have less then 1000 SSTables
>
> What did you measure during system optimizations? Or do you have an idea
> what more should I check?
> 1) I look at CPU Idle (one node is 50% idle, rest 70% idle)
> 2) Disk queue -> mostly is it near zero: avg 0.09. Sometimes there are
> spikes
> 3) system RAM usage is almost full
> 4) In Total Bytes Compacted most most lines are below 3MB/s. For total DC_A
> it is less than 10MB/s, in DC_B it looks much better (avg is like 17MB/s)
>
> something else?
>
>
>
> On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar 
> wrote:
>>
>> Hi,
>>
>> You can check if the number of SSTables is decreasing. Look for the
>> "SSTable count" information of your tables using "nodetool cfstats".
>> The compaction history can be viewed using "nodetool
>> compactionhistory".
>>
>> About the timeouts, check this out:
>> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
>> Also try to run "nodetool tpstats" to see the threads statistics. It
>> can lead you to know if you are having performance problems. If you
>> are having too many pending tasks or dropped messages, maybe will you
>> need to tune your system (eg: driver's timeout, concurrent reads and
>> so on)
>>
>> Regards,
>>
>> Roni Balthazar
>>
>> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam  wrote:
>> > Hi,
>> > Thanks for your "tip" it looks that something changed - I still don't
>> > know
>> > if it is ok.
>> >
>> > My nodes started to do more compaction, but it looks that some
>> > compactions
>> > are really slow.
>> > In IO we have idle, CPU is quite ok (30%-40%). We set compactionthrouput
>> > to
>> > 999, but I do not see difference.
>> >
>> > Can we check something more? Or do you have any method to monitor
>> > progress
>> > with small files?
>> >
>> > Regards
>> >
>> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar
>> > 
>> > wrote:
>> >>
>> >> HI,
>> >>
>> >> Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
>> >> the solution...
>> >> The number of SSTables decreased from many thousands to a number below
>> >> a hundred and the SSTables are now much bigger with several gigabytes
>> >> (most of them).
>> >>
>> >> Cheers,
>> >>
>> >> Roni Balthazar
>> >>
>> >>
>> >>
>> >> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam  wrote:
>> >> > After some diagnostic ( we didn't set yet cold_reads_to_omit ).
>> >> > Compaction
>> >> > are running but VERY slow with "idle" IO.
>> >> >
>> >> > We had a lot of "Data files" in Cassandra. In DC_A it is about
>> >> > ~12
>> >> > (only
>> >> > xxx-Data.db) in DC_B has only ~4000.
>> >> >
>> >> > I don't know if this change anything but:
>> >> > 1) in DC_A avg size of Data.db file is ~13 mb. I have few a really
>> >> > big
>> >> > ones,
>> >> > but most is really small (almost 1 files are less then 100mb).
>> >> > 2) in DC_B avg size of Data.db is much bigger ~260mb.
>> >> >
>> >> > Do you think that above flag will help us?
>> >> >
>> >> >
>> >> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam  wrote:
>> >> >>
>> >> >> I set setcompactionthroughput 999 permanently and it doesn't change
>> >> >> anything. IO is still same. CPU is idle.
>> >> >>
>> >> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
>> >> >> 
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> You can run "nodetool compactionstats" to view statistics on
>> >> >>> compactions.
>> >> >>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
>> >> >>> SSTables when you use Size-Tiered compaction.
>> >> >>> You can also create a cron job to increase the value of
>> >> >>> setcompactionthroughput during the night or when your IO is not
>> >> >>> busy.
>> >> >>>
>> >> >>> From http://wiki.apache.org/cassandra/NodeTool:
>> >> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>> >> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>> >> >>>
>> >> >>> Cheers,
>> >> >>>
>> >> >>> Roni Balthazar
>> >> >>>
>> >> >

Re: Many pending compactions

2015-02-18 Thread Ja Sam
I don't have problems with DC_B (the replica); only in DC_A (my system writes
only to it) do I have read timeouts.

I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over
the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice
prognosis. Now I have fewer than 1000 SSTables

What did you measure during system optimizations? Or do you have an idea
what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue -> mostly it is near zero (avg 0.09); sometimes there are
spikes
3) system RAM usage is almost full
4) In Total Bytes Compacted most lines are below 3 MB/s. For DC_A in total
it is less than 10 MB/s; in DC_B it looks much better (avg is like 17 MB/s)

something else?



On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar 
wrote:

> Hi,
>
> You can check if the number of SSTables is decreasing. Look for the
> "SSTable count" information of your tables using "nodetool cfstats".
> The compaction history can be viewed using "nodetool
> compactionhistory".
>
> About the timeouts, check this out:
> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
> Also try to run "nodetool tpstats" to see the threads statistics. It
> can lead you to know if you are having performance problems. If you
> are having too many pending tasks or dropped messages, maybe will you
> need to tune your system (eg: driver's timeout, concurrent reads and
> so on)
>
> Regards,
>
> Roni Balthazar
>
> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam  wrote:
> > Hi,
> > Thanks for your "tip" it looks that something changed - I still don't
> know
> > if it is ok.
> >
> > My nodes started to do more compaction, but it looks that some
> compactions
> > are really slow.
> > In IO we have idle, CPU is quite ok (30%-40%). We set compactionthrouput
> to
> > 999, but I do not see difference.
> >
> > Can we check something more? Or do you have any method to monitor
> progress
> > with small files?
> >
> > Regards
> >
> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar  >
> > wrote:
> >>
> >> HI,
> >>
> >> Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
> >> the solution...
> >> The number of SSTables decreased from many thousands to a number below
> >> a hundred and the SSTables are now much bigger with several gigabytes
> >> (most of them).
> >>
> >> Cheers,
> >>
> >> Roni Balthazar
> >>
> >>
> >>
> >> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam  wrote:
> >> > After some diagnostic ( we didn't set yet cold_reads_to_omit ).
> >> > Compaction
> >> > are running but VERY slow with "idle" IO.
> >> >
> >> > We had a lot of "Data files" in Cassandra. In DC_A it is about ~12
> >> > (only
> >> > xxx-Data.db) in DC_B has only ~4000.
> >> >
> >> > I don't know if this change anything but:
> >> > 1) in DC_A avg size of Data.db file is ~13 mb. I have few a really big
> >> > ones,
> >> > but most is really small (almost 1 files are less then 100mb).
> >> > 2) in DC_B avg size of Data.db is much bigger ~260mb.
> >> >
> >> > Do you think that above flag will help us?
> >> >
> >> >
> >> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam  wrote:
> >> >>
> >> >> I set setcompactionthroughput 999 permanently and it doesn't change
> >> >> anything. IO is still same. CPU is idle.
> >> >>
> >> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> You can run "nodetool compactionstats" to view statistics on
> >> >>> compactions.
> >> >>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
> >> >>> SSTables when you use Size-Tiered compaction.
> >> >>> You can also create a cron job to increase the value of
> >> >>> setcompactionthroughput during the night or when your IO is not
> busy.
> >> >>>
> >> >>> From http://wiki.apache.org/cassandra/NodeTool:
> >> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
> >> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
> >> >>>
> >> >>> Cheers,
> >> >>>
> >> >>> Roni Balthazar
> >> >>>
> >> >>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam 
> wrote:
> >> >>> > One think I do not understand. In my case compaction is running
> >> >>> > permanently.
> >> >>> > Is there a way to check which compaction is pending? The only
> >> >>> > information is
> >> >>> > about total count.
> >> >>> >
> >> >>> >
> >> >>> > On Monday, February 16, 2015, Ja Sam  wrote:
> >> >>> >>
> >> >>> >> Of couse I made a mistake. I am using 2.1.2. Anyway night build
> is
> >> >>> >> available from
> >> >>> >> http://cassci.datastax.com/job/cassandra-2.1/
> >> >>> >>
> >> >>> >> I read about cold_reads_to_omit It looks promising. Should I set
> >> >>> >> also
> >> >>> >> compaction throughput?
> >> >>> >>
> >> >>> >> p.s. I am really sad that I didn't read this before:
> >> >>> >>
> >> >>> >>
> >> >>> >>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar
Hi,

You can check if the number of SSTables is decreasing. Look for the
"SSTable count" information of your tables using "nodetool cfstats".
The compaction history can be viewed using "nodetool
compactionhistory".

About the timeouts, check this out:
http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run "nodetool tpstats" to see the thread pool statistics. It
can help you figure out whether you are having performance problems. If you
have too many pending tasks or dropped messages, you may need to tune your
system (e.g. the driver's timeout, concurrent reads and
so on)
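As a quick sketch of how to watch those:

    nodetool compactionhistory | tail -20
    nodetool tpstats | egrep -i 'pending|dropped'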

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam  wrote:
> Hi,
> Thanks for your "tip" it looks that something changed - I still don't know
> if it is ok.
>
> My nodes started to do more compaction, but it looks that some compactions
> are really slow.
> In IO we have idle, CPU is quite ok (30%-40%). We set compactionthrouput to
> 999, but I do not see difference.
>
> Can we check something more? Or do you have any method to monitor progress
> with small files?
>
> Regards
>
> On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar 
> wrote:
>>
>> HI,
>>
>> Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
>> the solution...
>> The number of SSTables decreased from many thousands to a number below
>> a hundred and the SSTables are now much bigger with several gigabytes
>> (most of them).
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>>
>>
>> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam  wrote:
>> > After some diagnostic ( we didn't set yet cold_reads_to_omit ).
>> > Compaction
>> > are running but VERY slow with "idle" IO.
>> >
>> > We had a lot of "Data files" in Cassandra. In DC_A it is about ~12
>> > (only
>> > xxx-Data.db) in DC_B has only ~4000.
>> >
>> > I don't know if this change anything but:
>> > 1) in DC_A avg size of Data.db file is ~13 mb. I have few a really big
>> > ones,
>> > but most is really small (almost 1 files are less then 100mb).
>> > 2) in DC_B avg size of Data.db is much bigger ~260mb.
>> >
>> > Do you think that above flag will help us?
>> >
>> >
>> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam  wrote:
>> >>
>> >> I set setcompactionthroughput 999 permanently and it doesn't change
>> >> anything. IO is still same. CPU is idle.
>> >>
>> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
>> >> 
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> You can run "nodetool compactionstats" to view statistics on
>> >>> compactions.
>> >>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
>> >>> SSTables when you use Size-Tiered compaction.
>> >>> You can also create a cron job to increase the value of
>> >>> setcompactionthroughput during the night or when your IO is not busy.
>> >>>
>> >>> From http://wiki.apache.org/cassandra/NodeTool:
>> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Roni Balthazar
>> >>>
>> >>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam  wrote:
>> >>> > One think I do not understand. In my case compaction is running
>> >>> > permanently.
>> >>> > Is there a way to check which compaction is pending? The only
>> >>> > information is
>> >>> > about total count.
>> >>> >
>> >>> >
>> >>> > On Monday, February 16, 2015, Ja Sam  wrote:
>> >>> >>
>> >>> >> Of couse I made a mistake. I am using 2.1.2. Anyway night build is
>> >>> >> available from
>> >>> >> http://cassci.datastax.com/job/cassandra-2.1/
>> >>> >>
>> >>> >> I read about cold_reads_to_omit It looks promising. Should I set
>> >>> >> also
>> >>> >> compaction throughput?
>> >>> >>
>> >>> >> p.s. I am really sad that I didn't read this before:
>> >>> >>
>> >>> >>
>> >>> >> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Monday, February 16, 2015, Carlos Rolo  wrote:
>> >>> >>>
>> >>> >>> Hi 100% in agreement with Roland,
>> >>> >>>
>> >>> >>> 2.1.x series is a pain! I would never recommend the current 2.1.x
>> >>> >>> series
>> >>> >>> for production.
>> >>> >>>
>> >>> >>> Clocks is a pain, and check your connectivity! Also check tpstats
>> >>> >>> to
>> >>> >>> see
>> >>> >>> if your threadpools are being overrun.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Carlos Juzarte Rolo
>> >>> >>> Cassandra Consultant
>> >>> >>>
>> >>> >>> Pythian - Love your data
>> >>> >>>
>> >>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>> >>> >>> linkedin.com/in/carlosjuzarterolo
>> >>> >>> Tel: 1649
>> >>> >>> www.pythian.com
>> >>> >>>
>> >>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>> >>> >>>  wrote:
>> >>> 
>> >>>  Hi,
>> >>> 
>> >>>  1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested
>> >>>  by
>> >>>  Al
>> >>>  Tobey from DataStax)
>> >>>  7) minimal reads (usually none, sometimes few)
>> >>> 
>> >>>  those two points keep me 

Re: Many pending compactions

2015-02-18 Thread Ja Sam
Hi,
Thanks for your "tip" - it looks like something changed, though I still
don't know if it is OK.

My nodes started to do more compactions, but it looks like some compactions
are really slow.
IO is idle and CPU is quite OK (30%-40%). We set the compaction throughput
to 999, but I do not see a difference.

Can we check anything else? Or do you have any method to monitor the
progress with the small files?
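
The only crude method we have now is counting files by hand and re-running
the count over time, e.g. (the data path below is just an example, ours
differs):

# count SSTable data files; re-run periodically to see if the number drops
find /var/lib/cassandra/data -name '*-Data.db' | wc -l
# and watch the currently running compactions
nodetool compactionstats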

Regards


Re: Many pending compactions

2015-02-17 Thread Roni Balthazar
Hi,

Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was
the solution.
The number of SSTables decreased from many thousands to below a
hundred, and most of the SSTables are now much bigger, at several
gigabytes each.
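
For reference, we changed it per table with something like the following
(the keyspace and table names here are just placeholders):

cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                             'cold_reads_to_omit': 0.0};"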

Cheers,

Roni Balthazar





Re: Many pending compactions

2015-02-17 Thread Ja Sam
After some diagnostics (we haven't set cold_reads_to_omit yet): compactions
are running, but VERY slowly, with "idle" IO.

We have a lot of "Data files" in Cassandra. In DC_A it is about ~12
(counting only xxx-Data.db); DC_B has only ~4000.

I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big
ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.

Do you think the above flag will help us?
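
(For the record, we got those numbers with a quick one-liner; the path
matches our layout, adjust as needed, and -printf assumes GNU find:)

# count Data.db files and compute their average size in MB
find /var/lib/cassandra/data -name '*-Data.db' -printf '%s\n' \
  | awk '{ s += $1; n++ } END { if (n) printf "%d files, avg %.1f MB\n", n, s/n/1048576 }'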




Re: Many pending compactions

2015-02-17 Thread Ja Sam
I set setcompactionthroughput to 999 permanently and it doesn't change
anything. IO is still the same; CPU is idle.
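
For completeness, this is exactly what I ran on each node (and, if I read
the nodetool help right, getcompactionthroughput should confirm the value):

nodetool setcompactionthroughput 999   # effectively unthrottled
nodetool getcompactionthroughput       # verify the running setting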



Re: Many pending compactions

2015-02-16 Thread Roni Balthazar
Hi,

You can run "nodetool compactionstats" to view statistics on compactions.
Setting cold_reads_to_omit to 0.0 can help reduce the number of
SSTables when you use Size-Tiered compaction.
You can also create a cron job to increase the compaction throughput
during the night or whenever your IO is not busy.

From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
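
Note that with the "root" user field these lines are in system crontab
format, so they belong in /etc/crontab or a file under /etc/cron.d/, for
example (a sketch):

# install the schedule as a cron.d snippet
cat > /etc/cron.d/cassandra-compaction <<'EOF'
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
EOF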

Cheers,

Roni Balthazar



Re: Many pending compactions

2015-02-16 Thread Ja Sam
One thing I do not understand: in my case compaction is running
permanently. Is there a way to check which compactions are pending? The only
information available is the total count.
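
All I have found so far is the total plus whatever is currently active or
already finished:

nodetool compactionstats    # "pending tasks: N" plus the running compactions
nodetool compactionhistory  # completed compactions per table (2.1)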



Many pending compactions

2015-02-16 Thread Ja Sam
Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is
available from
http://cassci.datastax.com/job/cassandra-2.1/
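
I double-checked the version each node actually runs with:

nodetool version   # prints the ReleaseVersion of the running node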

I read about cold_reads_to_omit. It looks promising. Should I also set the
compaction throughput?

p.s. I am really sad that I didn't read this before:
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/





Re: Many pending compactions

2015-02-16 Thread Carlos Rolo
Hi, 100% in agreement with Roland.

The 2.1.x series is a pain! I would never recommend the current 2.1.x series
for production.

Clocks are a pain, and check your connectivity! Also check tpstats to see if
your threadpools are being overrun.
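
Something like this on every node; non-zero "Blocked" or steadily growing
"Pending" columns mean a pool is overrun:

nodetool tpstats   # per-threadpool Active / Pending / Completed / Blocked counts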

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com



Re: Many pending compactions

2015-02-16 Thread Roland Etzenhammer

Hi,

1) Currently Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al 
Tobey from DataStax)

7) minimal reads (usually none, sometimes few)

those two points keep me repeating an answer I got. First, where did you 
get 2.1.3 from? Maybe I missed it; I will have a look. But if it is 
2.1.2, which is the latest released version, that version has many bugs - 
most of them I got bitten by while testing 2.1.2. I ran into many problems 
with compactions not being triggered on column families not being read, 
and with compactions and repairs not being completed. See


https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html

Apart from that, how are the two datacenters connected? Maybe there 
is a bottleneck.


Also, do you have ntp up and running on all nodes to keep all clocks in 
tight sync?
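
A quick check on each node (assuming classic ntpd; ntpstat may need to be
installed separately):

ntpq -p    # lists peers; a leading '*' marks the current sync source
ntpstat    # one-line sync status, exit code 0 when synchronised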


Note: I'm no expert (yet) - just sharing my 2 cents.

Cheers,
Roland


Many pending compactions

2015-02-16 Thread Ja Sam
*Environment*
1) Currently Cassandra 2.1.3; it was upgraded from 2.1.0 (suggested by Al
Tobey from DataStax)
2) not using vnodes
3)Two data centres: 5 nodes in one DC (DC_A), 4 nodes in second DC (DC_B)
4) each node is set up on a physical box with two 16-Core HT Xeon
processors (E5-2660), 64GB RAM and 10x2TB 7.2K SAS disks (one for
commitlog, nine for Cassandra data file directories), 1Gbps network. No
RAID, only JBOD.
5) 3500 writes per second; I write only to DC_A with LOCAL_QUORUM, with
RF=5 in the local DC_A, on our largest CFs.
6) acceptable write times (usually a few ms unless we encounter some
problem within the cluster)
7) minimal reads (usually none, sometimes few)
8) iostat looks OK (see the sketch after this list) ->
http://serverfault.com/questions/666136/interpreting-disk-stats-using-sar
9) We use SizeTiered compaction. We converted to it from LevelTiered.
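
The iostat sketch mentioned in 8) is simply:

iostat -x -d 5 3   # extended per-device stats, three 5-second samples;
                   # watch %util and await on the data disks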


*Problems*
Currently we see two main problems:
1) In DC_A we have a really large number of pending compactions (400-700
depending on the node). In DC_B everything is fine (10 is the short-term
maximum; usually it is less than 3). The pending compaction count does not
go down over the long term.
2) In DC_A reads usually end with a timeout exception. In DC_B reads are
fast and work without problems.

*The question*
Is there a way to diagnose what is wrong with my servers? I understand
that DC_A is doing much more work than DC_B, but we tested a much bigger
load on a test machine for a few days and everything was fine.
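
So far the first-pass checks we run on every node are (a sketch; the CF
name comes from our schema):

nodetool status      # up/down state and load per node
nodetool netstats    # streaming progress and pending messages
nodetool cfstats prem_maelstrom_2.customer_events   # per-CF SSTable count and latencies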