[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gil Ganz updated CASSANDRA-19477:
    Attachment: flamegraph_20240711.html

> Do not go to disk to get HintsStore.getTotalFileSize
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.1.5, 5.0-beta2, 5.0, 5.1-alpha1
>
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> flamegraph_20240711.html, image-2024-03-24-17-57-32-560.png,
> image-2024-03-24-18-08-36-918.png, image-2024-03-24-18-16-50-370.png,
> image-2024-03-24-18-17-48-334.png, image-2024-03-24-18-20-07-734.png
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed
> significant CPU time (25%) spent in HintsStore.getTotalFileSize. Here's what
> I'm seeing from profiling:
>
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
>
> At a bare minimum here we should create this string up front with the host
> and version and eliminate 2 of the 3 substitutions, but I think it's probably
> faster to use a StringBuilder and avoid the underlying regular expression
> altogether.
>
> 12% of the time is spent in org.apache.cassandra.io.util.File.length. It
> looks like this is called once for each hint file on disk for each host we're
> hinting to. In the case of an overloaded cluster, this is significant. It
> would be better if we were to track the file size in memory for each hint
> file and reference that rather than go to the filesystem.
>
> These fairly small changes should make Cassandra more reliable when under
> load spikes.
>
> CPU Flame graph attached.
>
> I only tested this in 4.1 but it looks like this is present up to trunk.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
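
The two suggestions in the description above (build the file name once without String.format, and keep hint file sizes in memory instead of calling File.length) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch that was committed for this ticket: SketchDescriptor and SketchHintsSizes are hypothetical stand-ins, not Cassandra's HintsDescriptor or HintsStore, and the point at which sizes get recorded (for example when a hints file is closed or found at startup) is assumed.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a hints descriptor; not Cassandra's actual class. */
final class SketchDescriptor {
    final UUID hostId;
    final long timestamp;
    final int version;
    private final String fileName; // built once, reused on every call

    SketchDescriptor(UUID hostId, long timestamp, int version) {
        this.hostId = hostId;
        this.timestamp = timestamp;
        this.version = version;
        // Appending the parts yields the same text as
        // String.format("%s-%s-%s.hints", hostId, timestamp, version)
        // without parsing a format string on each call.
        this.fileName = new StringBuilder(64)
            .append(hostId).append('-')
            .append(timestamp).append('-')
            .append(version).append(".hints")
            .toString();
    }

    String fileName() {
        return fileName;
    }
}

/** Hypothetical in-memory size tracker for the hint files of one target host. */
final class SketchHintsSizes {
    private final Map<String, Long> sizeByFile = new ConcurrentHashMap<>();
    private final AtomicLong total = new AtomicLong();

    /** Record (or update) a file's size, adjusting the running total by the delta. */
    void put(SketchDescriptor descriptor, long sizeBytes) {
        Long previous = sizeByFile.put(descriptor.fileName(), sizeBytes);
        total.addAndGet(sizeBytes - (previous == null ? 0L : previous));
    }

    /** Forget a file, for example after it has been dispatched and deleted. */
    void remove(SketchDescriptor descriptor) {
        Long previous = sizeByFile.remove(descriptor.fileName());
        if (previous != null) {
            total.addAndGet(-previous);
        }
    }

    /** Answered from memory; no filesystem call is made. */
    long getTotalFileSize() {
        return total.get();
    }
}
```

With this shape, reading the total is a single atomic load regardless of how many hint files exist. The trade-off is that every code path that creates, grows, or deletes a hints file has to keep the tracker in sync.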
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Michael Semb Wever updated CASSANDRA-19477:
    Fix Version/s: 5.0

>             Fix For: 4.1.5, 5.0-beta2, 5.0, 5.1-alpha1
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 40m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Stefan Miklosovic updated CASSANDRA-19477:
    Fix Version/s: 4.1.5
                   5.0-beta2
                   5.1-alpha1
                       (was: 5.x)
                       (was: 4.1.x)
                       (was: 5.0-rc)
    Since Version: 4.1-alpha1
    Source Control Link: https://github.com/apache/cassandra/commit/38408938ccfe5b8c051e25c645bdcd71b45fa66e
    Resolution: Fixed
    Status: Resolved  (was: Ready to Commit)

>             Fix For: 4.1.5, 5.0-beta2, 5.1-alpha1
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Aleksey Yeschenko updated CASSANDRA-19477:
    Status: Ready to Commit  (was: Review In Progress)

+1, good stuff

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Aleksey Yeschenko updated CASSANDRA-19477:
    Status: Patch Available  (was: Needs Committer)

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Aleksey Yeschenko updated CASSANDRA-19477:
    Reviewers: Aleksey Yeschenko, Aleksey Yeschenko  (was: Aleksey Yeschenko)
               Aleksey Yeschenko, Aleksey Yeschenko  (was: Aleksey Yeschenko)
    Status: Review In Progress  (was: Patch Available)

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Stefan Miklosovic updated CASSANDRA-19477:
    Status: Needs Committer  (was: Patch Available)

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: image-2024-03-24-18-20-07-734.png

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: image-2024-03-24-18-16-50-370.png

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: image-2024-03-24-18-17-48-334.png

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: image-2024-03-24-18-08-36-918.png

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: image-2024-03-24-17-57-32-560.png

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
Jon Haddad updated CASSANDRA-19477:
    Attachment: flame-cassandra0-patched-2024-03-25_00-40-47.html

>             Fix For: 4.1.x, 5.0-rc, 5.x
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html
>          Time Spent: 4h 10m
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Haddad updated CASSANDRA-19477: --- Attachment: flame-cassandra0-release-2024-03-25_00-16-44.html > Do not go to disk to get HintsStore.getTotalFileSize > > > Key: CASSANDRA-19477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19477 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Hints >Reporter: Jon Haddad >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.1.x, 5.0-rc, 5.x > > Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html, > flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html > > Time Spent: 4h 10m > Remaining Estimate: 0h > > When testing a cluster with more requests than it could handle, I noticed > significant CPU time (25%) spent in HintsStore.getTotalFileSize. Here's what > I'm seeing from profiling: > 10% of CPU time spent in HintsDescriptor.fileName which only does this: > > {noformat} > return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat} > At a bare minimum here we should create this string up front with the host > and version and eliminate 2 of the 3 substitutions, but I think it's probably > faster to use a StringBuilder and avoid the underlying regular expression > altogether. > 12% of the time is spent in org.apache.cassandra.io.util.File.length. It > looks like this is called once for each hint file on disk for each host we're > hinting to. In the case of an overloaded cluster, this is significant. It > would be better if we were to track the file size in memory for each hint > file and reference that rather than go to the filesystem. > These fairly small changes should make Cassandra more reliable when under > load spikes. > CPU Flame graph attached. > I only tested this in 4.1 but it looks like this is present up to trunk. 
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-19477:
------------------------------------------
    Fix Version/s:     (was: 4.0.x)

> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.1.x, 5.0-rc, 5.x
>
>         Attachments: flamegraph.cpu.html
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-19477:
------------------------------------------
    Test and Documentation Plan: ci
                         Status: Patch Available  (was: In Progress)

> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0-rc, 5.x
>
>         Attachments: flamegraph.cpu.html
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-19477:
------------------------------------------
    Fix Version/s: 4.0.x
                   4.1.x
                   5.0-rc

> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0-rc, 5.x
>
>         Attachments: flamegraph.cpu.html
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-19477:
------------------------------------------
    Reviewers: Aleksey Yeschenko

> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: flamegraph.cpu.html
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
[jira] [Updated] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-19477:
------------------------------------------
    Summary: Do not go to disk to get HintsStore.getTotalFileSize  (was: Significant CPU overhead in HintsStore.getTotalFileSize)

> Do not go to disk to get HintsStore.getTotalFileSize
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: flamegraph.cpu.html
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
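The renamed summary reflects the second proposal in the issue description: track each hint file's size in memory instead of calling File.length for every file. A minimal sketch of that idea follows, assuming a hypothetical tracker class keyed by file name; none of these names come from Cassandra, and the actual patch may be structured quite differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory size tracker; not Cassandra's HintsStore.
public class HintFileSizeTrackerSketch {
    // Last known size per hint file, keyed by file name (an assumption).
    private final Map<String, Long> sizeByFile = new ConcurrentHashMap<>();
    // Running total, so reading it never touches the filesystem.
    private final AtomicLong total = new AtomicLong();

    // Called by the writer whenever a file's size is known to have changed.
    public void recordSize(String fileName, long newSize) {
        Long previous = sizeByFile.put(fileName, newSize);
        // Apply only the delta; each update's delta is applied exactly once.
        total.addAndGet(newSize - (previous == null ? 0L : previous));
    }

    // Called when a hint file is deleted after dispatch or expiry.
    public void remove(String fileName) {
        Long previous = sizeByFile.remove(fileName);
        if (previous != null)
            total.addAndGet(-previous);
    }

    // Constant-time read that replaces a per-file File.length call.
    public long totalFileSize() {
        return total.get();
    }
}
```

The map update and the counter update are two separate steps, so a concurrent reader may briefly see a total that lags the map. For a size check used to enforce a hint-storage limit that is likely acceptable, but it is a trade-off of this sketch rather than a property of the real fix.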