Re: Overload because of hint pressure + MVs

2020-02-10 Thread Erick Ramirez
>
> Currently phi_convict_threshold is not set, so it defaults to 8.
> Can this also cause hint buildup even when we can see that all nodes are
> UP?


You can bump it up to 12 to reduce the sensitivity, but it's more likely that GC
pauses are causing it. Phi convict is the side effect, not the cause.

> Just to add, we are using a 24GB heap size.


Are you using CMS? If using G1, I'd recommend bumping the heap up to 31GB if the
servers have 40+ GB of RAM. Cheers!
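
On 3.11 that change goes in conf/jvm.options; a rough sketch (the exact flag
layout varies by packaging, so double-check against your own file):

    # conf/jvm.options
    # keep the heap just under 32GB so compressed oops stay enabled
    -Xms31G
    -Xmx31G
    # comment out the CMS flags and enable G1 instead
    -XX:+UseG1GC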


Re: What is "will be anticompacted on range" ?

2020-02-10 Thread Alexander Dejanovski
Hi,

Full repair triggers anticompaction as well.
Only subrange repair doesn't trigger anticompaction, and in 4.0, AFAIK,
full repairs won't involve anticompaction anymore.
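
For reference, a subrange repair is just a token-bounded run, roughly like the
following (the tokens and keyspace name here are placeholders):

    nodetool repair -full -st <start_token> -et <end_token> my_keyspace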

Cheers,

On Mon, Feb 10, 2020 at 7:17 PM Krish Donald  wrote:

> Thanks Jeff. But we are running repair using the command below; how do we
> know if incremental repair is enabled?
>
> repair -full -pr
>
> Thanks
> KD
>
> On Mon, Feb 10, 2020 at 10:09 AM Jeff Jirsa  wrote:
>
>> Incremental repair is splitting the data it repaired from the data it
>> didn't repair so it can mark the repaired data with a repairedAt timestamp
>> annotation on the data file / sstable.
>>
>>
>> On Mon, Feb 10, 2020 at 9:39 AM Krish Donald 
>> wrote:
>>
>>> Hi,
>>>
>>> I noticed a few messages in system.log like the below:
>>> INFO  [CompactionExecutor:21] 2020-02-08 17:56:16,998
>>> CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71]
>>> SSTable BigTableReader(path='xyz/mc-79976-big-Data.db')
>>> ((-8828745000913291684,8954981413747359495]) will be anticompacted on range
>>> (1298637302462891853,1299655718091763872]
>>>
>>> And compactionstats was showing the following:
>>> id                                    compaction type              keyspace  table    completed     total         unit   progress
>>> 82ee9720-3c86-11ea-adda-b11edeb80235  Anticompaction after repair  customer  profile  182882813624  196589990177  bytes  93.03%
>>>
>>> We are on 3.11.
>>>
>>> What is the meaning of this compaction type "Anticompaction after repair"?
>>> Haven't noticed this in the 2.x versions.
>>>
>>> Thanks
>>> KD
>>>
>>>


Re: What is "will be anticompacted on range" ?

2020-02-10 Thread Krish Donald
Thanks Jeff. But we are running repair using the command below; how do we know
if incremental repair is enabled?

repair -full -pr

Thanks
KD

On Mon, Feb 10, 2020 at 10:09 AM Jeff Jirsa  wrote:

> Incremental repair is splitting the data it repaired from the data it
> didn't repair so it can mark the repaired data with a repairedAt timestamp
> annotation on the data file / sstable.
>
>
> On Mon, Feb 10, 2020 at 9:39 AM Krish Donald  wrote:
>
>> Hi,
>>
>> I noticed a few messages in system.log like the below:
>> INFO  [CompactionExecutor:21] 2020-02-08 17:56:16,998
>> CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71]
>> SSTable BigTableReader(path='xyz/mc-79976-big-Data.db')
>> ((-8828745000913291684,8954981413747359495]) will be anticompacted on range
>> (1298637302462891853,1299655718091763872]
>>
>> And compactionstats was showing the following:
>> id                                    compaction type              keyspace  table    completed     total         unit   progress
>> 82ee9720-3c86-11ea-adda-b11edeb80235  Anticompaction after repair  customer  profile  182882813624  196589990177  bytes  93.03%
>>
>> We are on 3.11.
>>
>> What is the meaning of this compaction type "Anticompaction after repair"?
>> Haven't noticed this in the 2.x versions.
>>
>> Thanks
>> KD
>>
>>


Re: What is "will be anticompacted on range" ?

2020-02-10 Thread Jeff Jirsa
Incremental repair is splitting the data it repaired from the data it didn't
repair so it can mark the repaired data with a repairedAt timestamp
annotation on the data file / sstable.
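
You can see that marker with sstablemetadata if you're curious, e.g. (the path
here is a placeholder):

    sstablemetadata /path/to/mc-79976-big-Data.db | grep -i 'repaired at'
    # "Repaired at: 0" means the sstable has never been marked repaired;
    # a non-zero value is the timestamp set when it was anticompacted/repaired.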


On Mon, Feb 10, 2020 at 9:39 AM Krish Donald  wrote:

> Hi,
>
> I noticed a few messages in system.log like the below:
> INFO  [CompactionExecutor:21] 2020-02-08 17:56:16,998
> CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71]
> SSTable BigTableReader(path='xyz/mc-79976-big-Data.db')
> ((-8828745000913291684,8954981413747359495]) will be anticompacted on range
> (1298637302462891853,1299655718091763872]
>
> And compactionstats was showing the following:
> id                                    compaction type              keyspace  table    completed     total         unit   progress
> 82ee9720-3c86-11ea-adda-b11edeb80235  Anticompaction after repair  customer  profile  182882813624  196589990177  bytes  93.03%
>
> We are on 3.11.
>
> What is the meaning of this compaction type "Anticompaction after repair"?
> Haven't noticed this in the 2.x versions.
>
> Thanks
> KD
>
>


What is "will be anticompacted on range" ?

2020-02-10 Thread Krish Donald
Hi,

I noticed a few messages in system.log like the below:
INFO  [CompactionExecutor:21] 2020-02-08 17:56:16,998
CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71]
SSTable BigTableReader(path='xyz/mc-79976-big-Data.db')
((-8828745000913291684,8954981413747359495]) will be anticompacted on range
(1298637302462891853,1299655718091763872]

And compactionstats was showing the following:
id                                    compaction type              keyspace  table    completed     total         unit   progress
82ee9720-3c86-11ea-adda-b11edeb80235  Anticompaction after repair  customer  profile  182882813624  196589990177  bytes  93.03%

We are on 3.11.

What is the meaning of this compaction type "Anticompaction after repair"?
Haven't noticed this in the 2.x versions.

Thanks
KD


Re: Overload because of hint pressure + MVs

2020-02-10 Thread Surbhi Gupta
Just to add, we are using a 24GB heap size.

On Mon, 10 Feb 2020 at 09:08, Surbhi Gupta  wrote:

> Hi Jon,
>
> We are on a multi-datacenter (on-prem) setup.
> We also noticed a lot of messages like the below:
>
> DEBUG [GossipStage:1] 2020-02-10 09:38:52,953 FailureDetector.java:457 -
> Ignoring interval time of 3258125997 for /10.x.x.x
>
> DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
> Ignoring interval time of 2045630029 for /10.y.y.y
>
> DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
> Ignoring interval time of 2045416737 for /10.z.z.z
>
>
>
> Currently phi_convict_threshold is not set, so it defaults to 8.
> Can this also cause hint buildup even when we can see that all nodes are
> UP?
> The recommended value of phi_convict_threshold is 12 in an AWS
> multi-datacenter environment.
>
> Thanks
> Surbhi
>
> On Sun, 9 Feb 2020 at 21:42, Surbhi Gupta 
> wrote:
>
>> Thanks a lot Jon..
>> Will try the recommendations and let you know the results
>>
>> On Fri, Feb 7, 2020 at 10:52 AM Jon Haddad  wrote:
>>
>>> There are a few things you can do here that might help.
>>>
>>> First off, if you're using the default heap settings, that's a serious
>>> problem.  If you've got the headroom, my recommendation is to use a 16GB
>>> heap with 12 GB new gen and pin your memtable heap space to 2GB.  Set your
>>> max tenuring threshold to 6 and your survivor ratio to 6.  You don't need a
>>> lot of old gen space with Cassandra; almost everything that will show up
>>> there is memtable related, and we allocate a *lot* whenever we read data
>>> off disk.
>>>
>>> Most folks use the default disk read ahead setting of 128KB.  You can
>>> check this setting using blockdev --report, under the RA column.  You'll
>>> see 256 there, that's in 512 byte sectors.  MVs rely on a read before a
>>> write, so for every read off disk you do, you'll pull additional 128KB into
>>> your page cache.  This is usually a waste and puts WAY too much pressure on
>>> your disk.  On SSD, I always change this to 4KB.
>>>
>>> Next, be sure you're setting your compression rate accordingly.  I wrote
>>> a long post on the topic here:
>>> https://thelastpickle.com/blog/2018/08/08/compression_performance.html.
>>> Our default compression is very unfriendly for read heavy workloads if
>>> you're reading small rows.  If your records are small, 4KB compression
>>> chunk length is your friend.
>>>
>>> I have some slides showing pretty good performance improvements from the
>>> above 2 changes.  Specifically, I went from 16K reads a second at 180ms p99
>>> latency up to 63K reads / second at 21ms p99.  Disk usage dropped by a
>>> factor of 10.  Throw in those JVM changes I recommended and things should
>>> improve even further.
>>>
>>> Generally speaking, I recommend avoiding MVs, as they can be a giant
>>> mine if you aren't careful.  They're not doing any magic behind the scenes
>>> that makes scaling easier, and in a lot of cases they're a hindrance.  You
>>> still need to understand the underlying data and how it's laid out to use
>>> them properly, which is 99% of the work.
>>>
>>> Jon
>>>
>>> On Fri, Feb 7, 2020 at 10:32 AM Michael Shuler 
>>> wrote:
>>>
 That JIRA still says Open, so no, it has not been fixed (unless there's
 a fixed duplicate in JIRA somewhere).

 For clarification, you could update that ticket with a comment including
 your environmental details, usage of MV, etc. I'll bump the priority up
 and include some possible branchX fixvers.

 Michael

 On 2/7/20 10:53 AM, Surbhi Gupta wrote:
 > Hi,
 >
 > We are getting hit by the below bug.
 > Other than lowering hinted_handoff_throttle_in_kb to 100, is there any other
 > workaround?
 >
 > https://issues.apache.org/jira/browse/CASSANDRA-13810
 >
 > Any idea if it got fixed in a later version?
 > We are on open source Cassandra 3.11.1.
 >
 > Thanks
 > Surbhi
 >
 >





Re: Overload because of hint pressure + MVs

2020-02-10 Thread Surbhi Gupta
Hi Jon,

We are on a multi-datacenter (on-prem) setup.
We also noticed a lot of messages like the below:

DEBUG [GossipStage:1] 2020-02-10 09:38:52,953 FailureDetector.java:457 -
Ignoring interval time of 3258125997 for /10.x.x.x

DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
Ignoring interval time of 2045630029 for /10.y.y.y

DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
Ignoring interval time of 2045416737 for /10.z.z.z



Currently phi_convict_threshold is not set, so it defaults to 8.
Can this also cause hint buildup even when we can see that all nodes are
UP?
The recommended value of phi_convict_threshold is 12 in an AWS multi-datacenter
environment.
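
If we do raise it, I assume it's just a one-line change in cassandra.yaml on
each node, applied with a rolling restart (a sketch):

    # cassandra.yaml
    phi_convict_threshold: 12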

Thanks
Surbhi

On Sun, 9 Feb 2020 at 21:42, Surbhi Gupta  wrote:

> Thanks a lot Jon..
> Will try the recommendations and let you know the results
>
> On Fri, Feb 7, 2020 at 10:52 AM Jon Haddad  wrote:
>
>> There are a few things you can do here that might help.
>>
>> First off, if you're using the default heap settings, that's a serious
>> problem.  If you've got the headroom, my recommendation is to use a 16GB
>> heap with 12 GB new gen and pin your memtable heap space to 2GB.  Set your
>> max tenuring threshold to 6 and your survivor ratio to 6.  You don't need a
>> lot of old gen space with Cassandra; almost everything that will show up
>> there is memtable related, and we allocate a *lot* whenever we read data
>> off disk.
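>>
>> In jvm.options / cassandra.yaml terms that's roughly the following (a sketch
>> assuming CMS; double-check the flag names against your own conf files):
>>
>>     # jvm.options
>>     -Xms16G
>>     -Xmx16G
>>     -Xmn12G
>>     -XX:MaxTenuringThreshold=6
>>     -XX:SurvivorRatio=6
>>
>>     # cassandra.yaml
>>     memtable_heap_space_in_mb: 2048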
>>
>> Most folks use the default disk read ahead setting of 128KB.  You can
>> check this setting using blockdev --report, under the RA column.  You'll
>> see 256 there, that's in 512 byte sectors.  MVs rely on a read before a
>> write, so for every read off disk you do, you'll pull additional 128KB into
>> your page cache.  This is usually a waste and puts WAY too much pressure on
>> your disk.  On SSD, I always change this to 4KB.
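>>
>> For example (the device name is a placeholder, and blockdev settings don't
>> survive a reboot, so persist them in udev or an init script as well):
>>
>>     # RA column is in 512-byte sectors, so 256 = 128KB
>>     blockdev --report
>>     # set readahead to 4KB (8 x 512-byte sectors) on the data disk
>>     blockdev --setra 8 /dev/nvme0n1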
>>
>> Next, be sure you're setting your compression rate accordingly.  I wrote
>> a long post on the topic here:
>> https://thelastpickle.com/blog/2018/08/08/compression_performance.html.
>> Our default compression is very unfriendly for read heavy workloads if
>> you're reading small rows.  If your records are small, 4KB compression
>> chunk length is your friend.
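>>
>> Something like this, where the keyspace/table names are placeholders
>> (existing sstables keep the old chunk size until they're rewritten, e.g. via
>> nodetool upgradesstables -a):
>>
>>     ALTER TABLE my_ks.my_table
>>       WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};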
>>
>> I have some slides showing pretty good performance improvements from the
>> above 2 changes.  Specifically, I went from 16K reads a second at 180ms p99
>> latency up to 63K reads / second at 21ms p99.  Disk usage dropped by a
>> factor of 10.  Throw in those JVM changes I recommended and things should
>> improve even further.
>>
>> Generally speaking, I recommend avoiding MVs, as they can be a giant mine
>> if you aren't careful.  They're not doing any magic behind the scenes that
>> makes scaling easier, and in a lot of cases they're a hindrance.  You
>> still need to understand the underlying data and how it's laid out to use
>> them properly, which is 99% of the work.
>>
>> Jon
>>
>> On Fri, Feb 7, 2020 at 10:32 AM Michael Shuler 
>> wrote:
>>
>>> That JIRA still says Open, so no, it has not been fixed (unless there's
>>> a fixed duplicate in JIRA somewhere).
>>>
>>> For clarification, you could update that ticket with a comment including
>>> your environmental details, usage of MV, etc. I'll bump the priority up
>>> and include some possible branchX fixvers.
>>>
>>> Michael
>>>
>>> On 2/7/20 10:53 AM, Surbhi Gupta wrote:
>>> > Hi,
>>> >
>>> > We are getting hit by the below bug.
>>> > Other than lowering hinted_handoff_throttle_in_kb to 100, is there any other
>>> > workaround?
>>> >
>>> > https://issues.apache.org/jira/browse/CASSANDRA-13810
>>> >
>>> > Any idea if it got fixed in a later version?
>>> > We are on open source Cassandra 3.11.1.
>>> >
>>> > Thanks
>>> > Surbhi
>>> >
>>> >
>>>
>>>
>>>