Re: Overload because of hint pressure + MVs
> Currently the value of phi_convict_threshold is not set, which makes it 8 (the default).
> Can this also cause hints buildup even when we can see that all nodes are UP?

You can bump it up to 12 to reduce the sensitivity, but it's likely GC pauses causing it. Phi convict is the side effect, not the cause.

> Just to add, we are using a 24GB heap size.

Are you using CMS? If using G1, I'd recommend bumping it up to 31GB if the servers have 40+ GB of RAM.

Cheers!
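For illustration, a rough sketch of where those two knobs live; the /etc/cassandra paths assume a package-style install, and the values are just the ones discussed above, not universal recommendations:

# cassandra.yaml: raise the failure-detector threshold (defaults to 8 when the line is absent)
#   phi_convict_threshold: 12
grep -n 'phi_convict_threshold' /etc/cassandra/cassandra.yaml

# Confirm which collector and heap size the node is actually running with
ps -ef | grep CassandraDaemon | grep -oE '(-Xmx[^ ]+|-XX:\+Use[A-Za-z]+GC)'

# jvm.options, if on G1 and the host has 40+ GB of RAM (per the advice above):
#   -Xms31G
#   -Xmx31G
# Both changes need a rolling restart to take effect.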
Re: What is "will be anticompacted on range" ?
Hi,

Full repair triggers anticompaction as well. Only subrange repair doesn't trigger anticompaction, and in 4.0, AFAIK, full repairs won't involve anticompaction anymore.

Cheers,

On Mon, Feb 10, 2020 at 19:17, Krish Donald wrote:
> Thanks Jeff, but we are running repair using the command below; how do we know if incremental repair is enabled?
>
> repair -full -pr
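To make the subrange point concrete, a hedged example; the keyspace, table, and token values are placeholders, and in practice a tool such as Reaper computes the subranges for you:

# Repairing only an explicit token range skips anticompaction on 3.11
nodetool repair -full -st -9223372036854775808 -et -4611686018427387904 my_keyspace my_table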
Re: What is "will be anticompacted on range" ?
Thanks Jeff. But we are running repair using the command below; how do we know if incremental repair is enabled?

repair -full -pr

Thanks
KD

On Mon, Feb 10, 2020 at 10:09 AM Jeff Jirsa wrote:
> Incremental repair is splitting the data it repaired from the data it didn't repair so it can mark the repaired data with a repairedAt timestamp annotation on the data file / sstable.
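For what it's worth, a quick sketch of how those flags map to repair types on 3.x (the keyspace name is a placeholder):

# Since 2.2/3.x, nodetool repair defaults to incremental; -full forces a full repair.
nodetool repair -full -pr my_keyspace   # full repair, limited to this node's primary ranges
nodetool repair my_keyspace             # incremental repair (the 3.x default)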
Re: What is "will be anticompacted on range" ?
Incremental repair is splitting the data it repaired from the data it didn't repair so it can mark the repaired data with a repairedAt timestamp annotation on the data file / sstable.

On Mon, Feb 10, 2020 at 9:39 AM Krish Donald wrote:
> I noticed a few messages in system.log like the one below:
>
> INFO [CompactionExecutor:21] 2020-02-08 17:56:16,998 CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71] SSTable BigTableReader(path='xyz/mc-79976-big-Data.db') ((-8828745000913291684,8954981413747359495]) will be anticompacted on range (1298637302462891853,1299655718091763872]
>
> What is the meaning of this compaction type "Anticompaction after repair"? Haven't noticed this in the 2.x versions.
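As a hedged way to see that marker on disk (the SSTable path below is an example only):

# sstablemetadata prints the repairedAt annotation Jeff describes.
# "Repaired at: 0" means the SSTable is still in the unrepaired set;
# a non-zero timestamp means it has been marked repaired.
sstablemetadata /var/lib/cassandra/data/customer/profile-*/mc-79976-big-Data.db | grep -i 'repaired at'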
What is "will be anticompacted on range" ?
Hi,

I noticed a few messages in system.log like the one below:

INFO [CompactionExecutor:21] 2020-02-08 17:56:16,998 CompactionManager.java:677 - [repair #fb044b01-4ab5-11ea-a736-a367dba4ed71] SSTable BigTableReader(path='xyz/mc-79976-big-Data.db') ((-8828745000913291684,8954981413747359495]) will be anticompacted on range (1298637302462891853,1299655718091763872]

And compactionstats was showing the below:

id                                    compaction type               keyspace   table     completed      total          unit    progress
82ee9720-3c86-11ea-adda-b11edeb80235  Anticompaction after repair   customer   profile   182882813624   196589990177   bytes   93.03%

We are on 3.11.

What is the meaning of this compaction type "Anticompaction after repair"? Haven't noticed this in the 2.x versions.

Thanks
KD
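For monitoring a long-running anticompaction like the one above, something along these lines works (the 30-second interval is arbitrary):

# -H prints sizes in human-readable units on 3.11's nodetool
watch -n 30 nodetool compactionstats -H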
Re: Overload because of hint pressure + MVs
Just to add, we are using a 24GB heap size.

On Mon, 10 Feb 2020 at 09:08, Surbhi Gupta wrote:
> Hi Jon,
>
> Currently the value of phi_convict_threshold is not set, which makes it 8 (the default).
> Can this also cause hints buildup even when we can see that all nodes are UP?
Re: Overload because of hint pressure + MVs
Hi Jon,

We are on a multi-datacenter (on-prem) setup.
We also noticed too many messages like the ones below:

DEBUG [GossipStage:1] 2020-02-10 09:38:52,953 FailureDetector.java:457 - Ignoring interval time of 3258125997 for /10.x.x.x
DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 - Ignoring interval time of 2045630029 for /10.y.y.y
DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 - Ignoring interval time of 2045416737 for /10.z.z.z

Currently the value of phi_convict_threshold is not set, which makes it 8 (the default).
Can this also cause hints buildup even when we can see that all nodes are UP?
The recommended value of phi_convict_threshold is 12 in an AWS multi-datacenter environment.

Thanks
Surbhi

On Sun, 9 Feb 2020 at 21:42, Surbhi Gupta wrote:
> Thanks a lot Jon. Will try the recommendations and let you know the results.

On Fri, Feb 7, 2020 at 10:52 AM Jon Haddad wrote:
> There are a few things you can do here that might help.
>
> First off, if you're using the default heap settings, that's a serious problem. If you've got the headroom, my recommendation is to use a 16GB heap with 12GB new gen and pin your memtable heap space to 2GB. Set your max tenuring threshold to 6 and your survivor ratio to 6. You don't need a lot of old gen space with Cassandra; almost everything that will show up there is memtable related, and we allocate a *lot* whenever we read data off disk.
>
> Most folks use the default disk read ahead setting of 128KB. You can check this setting using blockdev --report, under the RA column. You'll see 256 there; that's in 512-byte sectors. MVs rely on a read before a write, so for every read off disk you do, you'll pull an additional 128KB into your page cache. This is usually a waste and puts WAY too much pressure on your disk. On SSD, I always change this to 4KB.
>
> Next, be sure you're setting your compression rate accordingly. I wrote a long post on the topic here: https://thelastpickle.com/blog/2018/08/08/compression_performance.html. Our default compression is very unfriendly for read-heavy workloads if you're reading small rows. If your records are small, a 4KB compression chunk length is your friend.
>
> I have some slides showing pretty good performance improvements from the above 2 changes. Specifically, I went from 16K reads a second at 180ms p99 latency up to 63K reads/second at 21ms p99. Disk usage dropped by a factor of 10. Throw in those JVM changes I recommended and things should improve even further.
>
> Generally speaking, I recommend avoiding MVs, as they can be a giant mine if you aren't careful. They're not doing any magic behind the scenes that makes scaling easier, and in a lot of cases they're a hindrance. You still need to understand the underlying data and how it's laid out to use them properly, which is 99% of the work.
>
> Jon

On Fri, Feb 7, 2020 at 10:32 AM Michael Shuler wrote:
> That JIRA still says Open, so no, it has not been fixed (unless there's a fixed duplicate in JIRA somewhere).
>
> For clarification, you could update that ticket with a comment including your environmental details, usage of MV, etc. I'll bump the priority up and include some possible branchX fixvers.
>
> Michael

On 2/7/20 10:53 AM, Surbhi Gupta wrote:
> Hi,
>
> We are getting hit by the bug below.
> Other than lowering hinted_handoff_throttle_in_kb to 100, is there any other workaround?
>
> https://issues.apache.org/jira/browse/CASSANDRA-13810
>
> Any idea if it got fixed in a later version?
> We are on open source Cassandra 3.11.1.
>
> Thanks
> Surbhi
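A rough, untested sketch of Jon's read-ahead, compression, and JVM suggestions, plus the hinted-handoff throttle workaround from the original mail; the device name /dev/sda is an assumption, the customer.profile table and every value come from this thread, so validate all of it against your own hardware rather than treating it as a drop-in config:

# Disk read ahead (blockdev counts 512-byte sectors, so 8 sectors = 4KB)
sudo blockdev --report /dev/sda      # RA column; 256 sectors = the 128KB default
sudo blockdev --setra 8 /dev/sda     # not persistent across reboots; add a udev rule or init script

# Compression chunk length: 4KB for small, read-heavy rows
cqlsh -e "ALTER TABLE customer.profile WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"
nodetool upgradesstables -a customer profile   # rewrite existing SSTables so they pick up the new chunk size

# jvm.options lines matching the heap advice above (rolling restart required):
#   -Xms16G
#   -Xmx16G
#   -Xmn12G
#   -XX:MaxTenuringThreshold=6
#   -XX:SurvivorRatio=6

# cassandra.yaml:
#   memtable_heap_space_in_mb: 2048        # "pin your memtable heap space to 2GB"
#   hinted_handoff_throttle_in_kb: 100     # the CASSANDRA-13810 workaround mentioned earlier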