[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-07-11 Thread Gil Ganz (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gil Ganz updated CASSANDRA-19477:
-
Attachment: flamegraph_20240711.html

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2, 5.0, 5.1-alpha1
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> flamegraph_20240711.html, image-2024-03-24-17-57-32-560.png, 
> image-2024-03-24-18-08-36-918.png, image-2024-03-24-18-16-50-370.png, 
> image-2024-03-24-18-17-48-334.png, image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-26 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-19477:
---
Fix Version/s: 5.0

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2, 5.0, 5.1-alpha1
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
  Fix Version/s: 4.1.5
 5.0-beta2
 5.1-alpha1
 (was: 5.x)
 (was: 4.1.x)
 (was: 5.0-rc)
  Since Version: 4.1-alpha1
Source Control Link: 
https://github.com/apache/cassandra/commit/38408938ccfe5b8c051e25c645bdcd71b45fa66e
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2, 5.1-alpha1
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-19477:
--
Status: Ready to Commit  (was: Review In Progress)

+1, good stuff

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-19477:
--
Status: Patch Available  (was: Needs Committer)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-19477:
--
Reviewers: Aleksey Yeschenko, Aleksey Yeschenko  (was: Aleksey Yeschenko)
   Aleksey Yeschenko, Aleksey Yeschenko  (was: Aleksey Yeschenko)
   Status: Review In Progress  (was: Patch Available)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
Status: Needs Committer  (was: Patch Available)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: image-2024-03-24-18-20-07-734.png

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png, 
> image-2024-03-24-18-20-07-734.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: image-2024-03-24-18-16-50-370.png

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: image-2024-03-24-18-17-48-334.png

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png, 
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: image-2024-03-24-18-08-36-918.png

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: image-2024-03-24-17-57-32-560.png

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html, 
> image-2024-03-24-17-57-32-560.png
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: flame-cassandra0-patched-2024-03-25_00-40-47.html

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19477:
---
Attachment: flame-cassandra0-release-2024-03-25_00-16-44.html

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, 
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-21 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
Fix Version/s: (was: 4.0.x)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flamegraph.cpu.html
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-20 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
Test and Documentation Plan: ci
 Status: Patch Available  (was: In Progress)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0-rc, 5.x
>
> Attachments: flamegraph.cpu.html
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-20 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
Fix Version/s: 4.0.x
   4.1.x
   5.0-rc

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0-rc, 5.x
>
> Attachments: flamegraph.cpu.html
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-20 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-19477:
--
Reviewers: Aleksey Yeschenko

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
> Attachments: flamegraph.cpu.html
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-19 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19477:
--
Summary: Do not go to disk to get HintsStore.getTotalFileSize  (was: 
Significant CPU overhead in HintsStore.getTotalFileSize)

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Hints
>Reporter: Jon Haddad
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
> Attachments: flamegraph.cpu.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed 
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what 
> I'm seeing from profiling:
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>  
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
> At a bare minimum here we should create this string up front with the host 
> and version and eliminate 2 of the 3 substitutions, but I think it's probably 
> faster to use a StringBuilder and avoid the underlying regular expression 
> altogether.
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It 
> looks like this is called once for each hint file on disk for each host we're 
> hinting to.  In the case of an overloaded cluster, this is significant.  It 
> would be better if we were to track the file size in memory for each hint 
> file and reference that rather than go to the filesystem.
> These fairly small changes should make Cassandra more reliable when under 
> load spikes.
> CPU Flame graph attached.
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org