[jira] [Comment Edited] (CASSANDRA-6596) Split out outgoing stream throughput within a DC and inter-DC
[ https://issues.apache.org/jira/browse/CASSANDRA-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902842#comment-13902842 ]

Vijay edited comment on CASSANDRA-6596 at 2/16/14 9:43 PM:
-----------------------------------------------------------

Thanks Benedict, fixed!

was (Author: vijay2...@yahoo.com):
Hi Benedict, not sure where I missed it; the change was to add a multiplier while initializing the throughput:
{code}
double currentThroughput = ((double) DatabaseDescriptor.getStreamThroughputOutboundMegabitsPerSec()) * 1024 * 1024 * 8;
{code}

Split out outgoing stream throughput within a DC and inter-DC
-------------------------------------------------------------

                 Key: CASSANDRA-6596
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6596
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jeremy Hanna
            Assignee: Vijay
            Priority: Minor
             Fix For: 2.1
         Attachments: 0001-CASSANDRA-6596.patch

Currently the outgoing stream throughput setting doesn't differentiate between when it goes to another node in the same DC and when it goes to another DC across a potentially bandwidth-limited link. It would be nice to have that split out so that it could be tuned for each type of link.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Issue Comment Deleted] (CASSANDRA-6596) Split out outgoing stream throughput within a DC and inter-DC
[ https://issues.apache.org/jira/browse/CASSANDRA-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-6596:
-----------------------------
    Comment: was deleted

(was: Thanks Benedict, fixed!)
[jira] [Commented] (CASSANDRA-6596) Split out outgoing stream throughput within a DC and inter-DC
[ https://issues.apache.org/jira/browse/CASSANDRA-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902849#comment-13902849 ]

Vijay commented on CASSANDRA-6596:
----------------------------------

Thanks Benedict! Fixed it. Race in the comments...
[jira] [Commented] (CASSANDRA-6712) Equals without hashcode in SpeculativeRetry
[ https://issues.apache.org/jira/browse/CASSANDRA-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902139#comment-13902139 ]

Vijay commented on CASSANDRA-6712:
----------------------------------

+1

Equals without hashcode in SpeculativeRetry
-------------------------------------------

                 Key: CASSANDRA-6712
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6712
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Trivial
             Fix For: 2.0.6
         Attachments: 6712.txt

This could cause problems if we were to start using supposed-to-be-equal SR objects in a HashMap.
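The contract at stake can be illustrated with a toy value class (hypothetical names; not Cassandra's actual SpeculativeRetry code): equal objects must produce equal hash codes, or HashMap lookups with an equal-but-distinct key silently miss.

```java
public class EqualsHashCodeDemo {
    // Broken: equals() overridden without hashCode(), like the bug described above.
    public static final class BrokenRetry {
        final String value;
        public BrokenRetry(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenRetry && ((BrokenRetry) o).value.equals(value);
        }
        // hashCode() inherited from Object: equal instances hash to different buckets.
    }

    // Fixed: hashCode() derived from the same field equals() compares.
    public static final class FixedRetry {
        final String value;
        public FixedRetry(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedRetry && ((FixedRetry) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    public static void main(String[] args) {
        java.util.Map<FixedRetry, String> fixed = new java.util.HashMap<>();
        fixed.put(new FixedRetry("99percentile"), "configured");
        // Lookup with an equal-but-distinct key works once hashCode() agrees with equals():
        System.out.println(fixed.get(new FixedRetry("99percentile"))); // prints "configured"

        java.util.Map<BrokenRetry, String> broken = new java.util.HashMap<>();
        broken.put(new BrokenRetry("99percentile"), "configured");
        // With the broken class the same lookup typically returns null.
        System.out.println(broken.get(new BrokenRetry("99percentile")));
    }
}
```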
[jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895835#comment-13895835 ]

Vijay commented on CASSANDRA-6590:
----------------------------------

Hi Brandon, I was not able to reproduce the above issue... (below is the log after the network partition)
{code}
 INFO [GossipTasks:1] 2014-02-09 05:29:10,259 Gossiper.java (line 862) InetAddress /17.198.227.155 is now DOWN
 INFO [HANDSHAKE-/17.198.227.155] 2014-02-09 05:29:18,023 OutboundTcpConnection.java (line 386) Handshaking version with /17.198.227.155
 INFO [RequestResponseStage:33] 2014-02-09 05:29:18,038 Gossiper.java (line 848) InetAddress /17.198.227.155 is now UP
{code}
{quote}
I think we'll need a separate yaml option
{quote}
Done.
{quote}
I'm not sure why the block in handleMajorStateChange moved
{quote}
Since the message was wrong (UP doesn't happen until the echo completes); anyway, I reverted that.

Rebased @ https://github.com/Vijay2win/cassandra/tree/6590-v4

Gossip does not heal after a temporary partition at startup
-----------------------------------------------------------

                 Key: CASSANDRA-6590
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6590
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Brandon Williams
            Assignee: Vijay
             Fix For: 2.0.6
         Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 6590_disable_echo.txt

See CASSANDRA-6571 for background. If a node is partitioned on startup when the echo command is sent, but then the partition heals, the halves of the partition will never mark each other up despite being able to communicate. This stems from CASSANDRA-3533.
[jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889826#comment-13889826 ]

Vijay commented on CASSANDRA-6590:
----------------------------------

Sorry, I was sending a different message during the startup; fixed and pushed to https://github.com/Vijay2win/cassandra/tree/6590-v3. Thanks!
[jira] [Updated] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-6590:
-----------------------------
    Attachment: 0001-logging-for-6590.patch

Hi Brandon, looks like realMarkAlive is called multiple times, hence the issue; I removed the localState.markDead() and it works fine for now (attached patch). Let me know...
[jira] [Updated] (CASSANDRA-6596) Split out outgoing stream throughput within a DC and inter-DC
[ https://issues.apache.org/jira/browse/CASSANDRA-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-6596:
-----------------------------
    Attachment: 0001-CASSANDRA-6596.patch

The attached patch introduces inter_dc_stream_throughput_outbound_megabits_per_sec, which is a subset of stream_throughput_outbound_megabits_per_sec. Currently the node throttles all the traffic it streams (this doesn't change after this patch); in addition, the patch adds a separate throttle across the DCs.

One more thing: there might be a bug (in trunk) where the throttle is applied on bytes instead of bits... Since it's not related to this ticket, I have not changed it.
{code}
int toTransfer = (int) Math.min(transferBuffer.length, length - bytesTransferred);
int minReadable = (int) Math.min(transferBuffer.length, reader.length() - reader.getFilePointer());
reader.readFully(transferBuffer, 0, minReadable);
if (validator != null)
    validator.validate(transferBuffer, 0, minReadable);
limiter.acquire(toTransfer);
{code}
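For reference, the unit conversion at stake in the suspected bits-vs-bytes bug can be sketched as plain arithmetic (helper name hypothetical; assumes the limiter's permits are bytes per second):

```java
public class ThroughputUnits {
    // Convert a configured megabits/sec value into the bytes/sec a byte-based
    // rate limiter expects: Mb -> bits (x 1024 x 1024), then bits -> bytes (/ 8).
    public static long megabitsPerSecToBytesPerSec(long megabits) {
        return megabits * 1024L * 1024L / 8L;
    }

    public static void main(String[] args) {
        // e.g. 200 Mb/s is 25 MiB/s:
        System.out.println(megabitsPerSecToBytesPerSec(200)); // prints 26214400
    }
}
```

If the limiter instead works in bits per second, the division by 8 is dropped; mixing the two interpretations over- or under-throttles by a factor of eight, which is the discrepancy the comment points at.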
[jira] [Updated] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-6590:
-----------------------------
    Attachment: 0001-CASSANDRA-6590.patch

Nit: do_firewall_check is true by default in the yaml but false in config.

The attached patch is on top of the original patch by Brandon; it sets the hibernate state (dead state) as step 1 in joinTokenRing, which is later changed to normal at the end of the method. The main fix (IMHO) is in OTCP, where we time out so we can reconnect when the socket hangs and makes the connection unusable during a temporary network partition.

Please note: this patch renames the streaming_socket_timeout_in_ms configuration to socket_timeout_in_ms and reuses it.
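The timeout-and-reconnect idea can be sketched with a plain socket read deadline (all names hypothetical; not the actual OutboundTcpConnection code): bound each blocking call so a wedged connection is detected and rebuilt instead of hanging forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Returns true when the peer sent a byte within timeoutMs; false when the
    // read timed out, signalling the caller to close the socket and reconnect.
    public static boolean readWithin(Socket s, int timeoutMs) {
        try {
            s.setSoTimeout(timeoutMs); // bound every blocking read on this socket
            return s.getInputStream().read() != -1;
        } catch (SocketTimeoutException e) {
            return false; // peer silent: treat the connection as dead
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo fixture: a loopback peer that accepts the connection but never writes,
    // mimicking a partitioned node that is still "connected" at the TCP level.
    public static Socket connectToSilentPeer() {
        try {
            ServerSocket server = new ServerSocket(0); // not closed: demo only
            Socket client = new Socket();
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), 1000);
            server.accept();
            return client;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The read gives up after 200 ms instead of blocking indefinitely.
        System.out.println(readWithin(connectToSilentPeer(), 200)); // prints false
    }
}
```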
[jira] [Commented] (CASSANDRA-6571) Quickly restarted nodes can list others as down indefinitely
[ https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868848#comment-13868848 ]

Vijay commented on CASSANDRA-6571:
----------------------------------

We had this discussion in IRC; we need to test this before... To clarify, (1) is the same as in the description:
{quote}
I tried to fix this by defaulting isAlive=false in the constructor of EndpointState.
{quote}
(2) We need to recover the receiving node from the hung state (while writing to the socket) by restarting the connections...

Quickly restarted nodes can list others as down indefinitely
------------------------------------------------------------

                 Key: CASSANDRA-6571
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6571
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Richard Low
            Assignee: sankalp kohli
              Labels: gossip
             Fix For: 2.0.5
         Attachments: 6571.txt

In a healthy cluster, if a node is restarted quickly, it may list other nodes as down when it comes back up and never list them as up. I reproduced it on a small cluster running in Docker containers.

1. Have a healthy 5 node cluster:
{quote}
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
UN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
UN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
UN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
{quote}
2. Kill a node and restart it quickly:
bq. kill -9 pid
bq. start-cassandra
3. Wait for the node to come back and more often than not, it lists one or more other nodes as down indefinitely:
{quote}
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
DN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
DN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
DN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
{quote}
From trace logging, here's what I think is going on:
1. The nodes are all happy gossiping.
2. Restart node X. When it comes back up it starts gossiping with the other nodes.
3. Before node X marks node Y as alive, X sends an echo message (introduced in CASSANDRA-3533).
4. The echo message is received by Y. To reply, Y attempts to reuse a connection to X. The connection is dead, but the message is attempted anyway and fails.
5. X never receives the echo back, so Y isn't marked as alive.
6. X gossips to Y again, but because the endpoint isAlive() returns true, it never calls markAlive() to properly set Y as alive.

I tried to fix this by defaulting isAlive=false in the constructor of EndpointState. This made it less likely to mark a node as down but it still happens. The workaround is to leave a node down for a while so the connections die on the remaining nodes.
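The wedge in steps 5-6 can be modeled as a toy state machine (all names hypothetical; not the real Gossiper code): because the mark-alive path only fires on a dead-to-alive transition, a stale isAlive=true left over from before the restart blocks healing forever.

```java
public class GossipWedgeDemo {
    // Minimal stand-in for the per-endpoint state node X keeps about node Y.
    public static final class EndpointState {
        public boolean isAlive;   // internal liveness flag
        public boolean markedUp;  // what nodetool status would report as UN
        public EndpointState(boolean isAlive) { this.isAlive = isAlive; }
    }

    // Mirrors step 6: markAlive() only runs on a dead -> alive edge.
    public static void onGossipReceived(EndpointState peer) {
        if (!peer.isAlive) {
            peer.isAlive = true;
            peer.markedUp = true; // realMarkAlive: notify subscribers, mark UP
        }
    }

    public static void main(String[] args) {
        // Stale state from before the restart: isAlive survived as true,
        // so gossip never re-marks the peer UP -> "DN indefinitely".
        EndpointState stale = new EndpointState(true);
        onGossipReceived(stale);
        System.out.println(stale.markedUp); // prints false

        // With isAlive defaulting to false (the attempted fix), gossip heals it.
        EndpointState fresh = new EndpointState(false);
        onGossipReceived(fresh);
        System.out.println(fresh.markedUp); // prints true
    }
}
```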
[jira] [Commented] (CASSANDRA-6571) Quickly restarted nodes can list others as down indefinitely
[ https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868685#comment-13868685 ]

Vijay commented on CASSANDRA-6571:
----------------------------------

Not sure if this will fix it, because the remote machine has not responded back (echo message response).

1) I think we need to always mark the nodes as dead and mark them up only after we receive the echo response.
2) I think we need to check or reset the socket on the receiving side; maybe we need to markDead (or retry the message after x seconds?).

Maybe this issue shows up because we removed the hibernate during restarts? (we are not restarting the states)
==
[~brandon.williams] I think the hang is on the echo response (socket.write())
[jira] [Comment Edited] (CASSANDRA-6571) Quickly restarted nodes can list others as down indefinitely
[ https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868685#comment-13868685 ]

Vijay edited comment on CASSANDRA-6571 at 1/11/14 6:46 AM:
-----------------------------------------------------------

Not sure if this will fix it, because the remote machine has not responded back (echo message response).

1) I think we need to always mark the nodes as dead and mark them up only after we receive the echo response.
2) I think we need to check or reset the socket on the receiving side; maybe we need to markDead (or retry the message after x seconds?).

Maybe this issue shows up because we removed the hibernate during restarts? (we are not resetting the states)
==
[~brandon.williams] I think the hang is on the echo response (socket.write())

was (Author: vijay2...@yahoo.com):
Not sure if this will fix it, because the remote machine has not responded back (echo message response).

1) I think we need to always mark the nodes as dead and mark them up only after we receive the echo response.
2) I think we need to check or reset the socket on the receiving side; maybe we need to markDead (or retry the message after x seconds?).

Maybe this issue shows up because we removed the hibernate during restarts? (we are not restarting the states)
==
[~brandon.williams] I think the hang is on the echo response (socket.write())
[jira] [Updated] (CASSANDRA-4914) Aggregate functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-4914:
-----------------------------
    Assignee: (was: Vijay)

Aggregate functions in CQL
--------------------------

                 Key: CASSANDRA-4914
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Vijay
             Fix For: 2.1

The requirement is to do aggregation of data in Cassandra (a wide row of column values of int, double, float, etc.), with some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for the columns within a row).

Example:
{code}
SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;

 empid | deptid | first_name | last_name | salary
-------+--------+------------+-----------+--------
   130 |      3 | joe        | doe       |   10.1
   130 |      2 | joe        | doe       |    100
   130 |      1 | joe        | doe       |  1e+03

SELECT sum(salary), empid FROM emp WHERE empID IN (130);

 sum(salary) | empid
-------------+-------
      1110.1 |   130
{code}
[jira] [Commented] (CASSANDRA-6544) Reduce GC activity during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864746#comment-13864746 ]

Vijay commented on CASSANDRA-6544:
----------------------------------

Hi Jonathan, yep; in addition, if we can create an off-heap slab allocator (and reuse the slabs), it will help in reducing memory fragmentation. Hope that makes sense.

Reduce GC activity during compaction
------------------------------------

                 Key: CASSANDRA-6544
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6544
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Vijay
            Assignee: Vijay
             Fix For: 2.1

We are noticing an increase in P99 while compactions are running at full stream. Most of it is because of the increased GC activity (followed by full GC). The obvious workaround is to throttle the compactions, but with SSDs we can get more disk bandwidth for reads and compactions. It would be nice to move the compaction object allocations off heap. The first thing to do might be to create an off-heap slab allocator sized to the compaction in-memory size and recycle it. We might also want to make it configurable so folks can disable it when they don't have off-heap memory to reserve.
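A minimal sketch of what such a recycled off-heap slab could look like (an illustration under assumptions, not the eventual patch): carve regions out of one direct ByteBuffer with a bump pointer, and reset it between compactions so neither the GC nor the native allocator churns.

```java
import java.nio.ByteBuffer;

public class OffHeapSlab {
    private final ByteBuffer slab;

    public OffHeapSlab(int capacityBytes) {
        // One big off-heap region, allocated once and reused across compactions.
        slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Bump-pointer allocation: carve the next `size` bytes out of the slab.
    // Returns null when the slab is exhausted (caller falls back or flushes).
    public ByteBuffer allocate(int size) {
        if (slab.remaining() < size)
            return null;
        int oldLimit = slab.limit();
        slab.limit(slab.position() + size);
        ByteBuffer region = slab.slice(); // independent view over just those bytes
        slab.position(slab.limit());
        slab.limit(oldLimit);
        return region;
    }

    // Reuse the slab for the next compaction instead of re-allocating:
    // avoids both GC pressure and native-memory fragmentation.
    public void recycle() {
        slab.clear();
    }

    public int remaining() {
        return slab.remaining();
    }
}
```

A real allocator would also need thread-safety and a fallback for allocations larger than the slab; this sketch only shows the recycle-the-slab idea from the comment.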
[jira] [Comment Edited] (CASSANDRA-6544) Reduce GC activity during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864746#comment-13864746 ]

Vijay edited comment on CASSANDRA-6544 at 1/7/14 10:05 PM:
-----------------------------------------------------------

Hi Jonathan, yep; in addition, if we can create an off-heap slab allocator (and reuse the slabs), it will help in reducing memory fragmentation. Hope that makes sense.

was (Author: vijay2...@yahoo.com):
Hi Jonathan, Yep in addition... if we can create a a offheap slab allocator (and reuse the slabs) it will help in reducing the memory fragmentation. Hope that make sure.
[jira] [Commented] (CASSANDRA-6544) Reduce GC activity during compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865011#comment-13865011 ]

Vijay commented on CASSANDRA-6544:
----------------------------------

Sure, working on it. Thanks!
[jira] [Created] (CASSANDRA-6544) Reduce GC activity during compaction
Vijay created CASSANDRA-6544:
--------------------------------

             Summary: Reduce GC activity during compaction
                 Key: CASSANDRA-6544
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6544
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Vijay
            Assignee: Vijay
             Fix For: 2.1
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838796#comment-13838796 ]

Vijay commented on CASSANDRA-5549:
----------------------------------

Well, it is not exactly a constant overhead; you might want to look at o.a.c.u.ObjectSizes (CASSANDRA-4860)...

Remove Table.switchLock
-----------------------

                 Key: CASSANDRA-5549
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jonathan Ellis
            Assignee: Benedict
              Labels: performance
             Fix For: 2.1
         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png

As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches).
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838627#comment-13838627 ]

Vijay commented on CASSANDRA-5549:
----------------------------------

{quote}
Without switch lock, we won't have anything preventing writes coming through when we're over-burdened with memory use by memtables.
{quote}
I must be missing something: how does switching the RW lock to a kind of CAS operation change these semantics? Are we talking about additional requirements/enhancements to this ticket?
{quote}
When we flush a memtable we release permits equal to the estimated size of each RM
{quote}
IMHO, that might not be good enough, since Java's memory overhead is not considered. And calculating the object size is not cheap either.
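The permit scheme under discussion could be sketched with a plain java.util.concurrent.Semaphore (a hypothetical illustration of the proposal, not the eventual implementation; the per-mutation estimates would come from something like ObjectSizes):

```java
import java.util.concurrent.Semaphore;

// Sketch of the discussed scheme: writers acquire permits sized to each
// mutation's estimated bytes; flushing a memtable releases them back, so
// memory pressure throttles writes without a read-write lock.
public class MemtableMemoryThrottle {
    private final Semaphore memoryPermits;

    public MemtableMemoryThrottle(int budgetBytes) {
        memoryPermits = new Semaphore(budgetBytes);
    }

    // Called on the write path: blocks when memtables are over budget,
    // which is the back-pressure switchLock would otherwise have provided.
    public void beforeWrite(int estimatedBytes) {
        memoryPermits.acquireUninterruptibly(estimatedBytes);
    }

    // Called after a memtable flush frees its estimated memory.
    public void afterFlush(int flushedBytes) {
        memoryPermits.release(flushedBytes);
    }

    public int availableBytes() {
        return memoryPermits.availablePermits();
    }
}
```

The objection in the comment maps directly onto this sketch: if estimatedBytes undercounts Java object overhead, the semaphore's budget is fiction, and computing an accurate estimate per mutation is itself costly.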
[jira] [Commented] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818534#comment-13818534 ]

Vijay commented on CASSANDRA-5911:
----------------------------------

Pushed the changes to https://github.com/Vijay2win/cassandra/commits/5911-v3

Looks like we only flush/switch when auto snapshot is enabled (when truncate is called)... Fixed. Added a test case that truncate forces a commit log switch. Force flush and other commands are still a best-effort switch.

Commit logs are not removed after nodetool flush or nodetool drain
------------------------------------------------------------------

                 Key: CASSANDRA-5911
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: J.B. Langston
            Assignee: Vijay
            Priority: Minor
             Fix For: 2.0.3
         Attachments: 0001-5911-v2.patch, 0001-5911-v3.patch, 0001-CASSANDRA-5911.patch, 6528_140171_knwmuqxe9bjv5re_system.log

Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min).
Reproduction follows: {code} jblangston:bin jblangston$ ./cassandra /dev/null jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 /dev/null jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool flush jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool drain jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ pkill java jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ ./cassandra -f | grep Replaying INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log INFO 10:03:43,907 Replaying 
/opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log INFO 10:03:43,912 Replaying
[jira] [Commented] (CASSANDRA-6206) Thrift socket listen backlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818558#comment-13818558 ] Vijay commented on CASSANDRA-6206: -- Committed to trunk, but only partially. Talked to [~xedin]; once we have the setting in HSHA we can resolve this ticket. Thrift socket listen backlog Key: CASSANDRA-6206 URL: https://issues.apache.org/jira/browse/CASSANDRA-6206 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian Linux, Java 7 Reporter: Nenad Merdanovic Fix For: 2.0.3 Attachments: cassandra-v2.patch, cassandra.patch Although Thrift is a deprecated method of accessing Cassandra, the default backlog on that socket is far too low. It shouldn't be a problem to implement, and I am including a POC patch (sorry, really low on time with limited Java knowledge, so just to give an idea). This is an old report which was never addressed, and the bug remains to this day, except in my case I have a much larger-scale application with 3rd-party software which I cannot modify to add connection pooling: https://issues.apache.org/jira/browse/CASSANDRA-1663 There is also a pending change in Thrift itself which Cassandra should be able to use for the parts using TServerSocket (SSL): https://issues.apache.org/jira/browse/THRIFT-1868
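For reference, the backlog the patch wants to raise is the second argument passed at server-socket bind time in the JDK. A minimal sketch follows; the backlog value is illustrative (not Cassandra's actual configuration), and the kernel may clamp the hint (e.g. to net.core.somaxconn on Linux):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BacklogDemo {
    public static void main(String[] args) throws IOException {
        int backlog = 1024;   // accept-queue hint; the JDK default is only 50
        ServerSocket server = new ServerSocket();
        // bind(SocketAddress, int) passes the backlog hint to listen(2);
        // port 0 picks an ephemeral port for this demo
        server.bind(new InetSocketAddress(0), backlog);
        System.out.println(server.isBound());
        server.close();
    }
}
```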
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818685#comment-13818685 ] Vijay commented on CASSANDRA-3578: -- https://github.com/Vijay2win/cassandra/commits/3578-v2 addresses most of the concerns here. The only thing discussed here that has not been addressed yet is the aggressive allocation and deallocation of commit logs, but I'm not sure it's needed yet... Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.)
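The CAS-based space reservation the ticket describes (in the spirit of SlabAllocator.Region.allocate) can be sketched as below. This is an illustrative toy, not Cassandra's actual CommitLogSegment code; the class and method names are invented:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Each writer reserves its slice of the segment with a CAS loop, so many
// threads can then copy bytes and compute CRCs into disjoint ranges in parallel.
class Segment {
    static final int CAPACITY = 1024;
    final AtomicInteger nextOffset = new AtomicInteger(0);

    /** Reserve size bytes; returns the start offset, or -1 if the segment is full. */
    int allocate(int size) {
        while (true) {
            int cur = nextOffset.get();
            if (cur + size > CAPACITY)
                return -1;                        // caller must switch segments
            if (nextOffset.compareAndSet(cur, cur + size))
                return cur;                       // we own [cur, cur + size)
        }
    }
}

public class CasDemo {
    public static void main(String[] args) {
        Segment s = new Segment();
        System.out.println(s.allocate(100));      // first reservation starts at 0
        System.out.println(s.allocate(100));      // next starts where the first ended
    }
}
```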
[jira] [Comment Edited] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816281#comment-13816281 ] Vijay edited comment on CASSANDRA-3578 at 11/7/13 7:04 PM: --- {quote} You only call force when you think there is something dirty, not when the buffer does, {quote} Ah, that might be an oversight; we can call buffer.force all the time and let the OS decide if it has to sync the filesystem. If we do that, then we just need to stop during recovery when we see corrupted columns (which occur because the OS or the force didn't complete the fsync). {quote} How ugly would it get to either wait for previous (in CL order) mutations before syncing {quote} We can do that with another counter which holds the bytes written by all the threads, comparing it with the allocated. We don't need a lock in that case. was (Author: vijay2...@yahoo.com): {quote} You only call force when you think there is something dirty, not when the buffer does, {quote} Ahaa that might be an over sight, we can call buffer.force all the time and let the OS decide if it has to sync the filesystem. If we do that then we just need to stop when we have a corrupted columns (which are because the os or the force didnt complete the fsync completely). {quote} How ugly would it get to either wait for previous (in CL order) mutations before syncing {quote} We can do that with another counter which holds the bytes written by all the threads and comparing it with the allocated. We dont need lock in that case. 
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816281#comment-13816281 ] Vijay commented on CASSANDRA-3578: -- {quote} You only call force when you think there is something dirty, not when the buffer does, {quote} Ah, that might be an oversight; we can call buffer.force all the time and let the OS decide if it has to sync the filesystem. If we do that, then we just need to stop when we see corrupted columns (which occur because the OS or the force didn't complete the fsync). {quote} How ugly would it get to either wait for previous (in CL order) mutations before syncing {quote} We can do that with another counter which holds the bytes written by all the threads, comparing it with the allocated. We don't need a lock in that case.
[jira] [Comment Edited] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816281#comment-13816281 ] Vijay edited comment on CASSANDRA-3578 at 11/7/13 7:05 PM: --- {quote} You only call force when you think there is something dirty, not when the buffer does, {quote} Ah, that might be an oversight; we can call buffer.force all the time and let the OS decide if it has to sync the filesystem. If we do that, then we just need to stop during recovery/replay when we see corrupted columns (which occur because the OS or the force didn't complete the fsync). {quote} How ugly would it get to either wait for previous (in CL order) mutations before syncing {quote} We can do that with another counter which holds the bytes written by all the threads, comparing it with the allocated. We don't need a lock in that case. was (Author: vijay2...@yahoo.com): {quote} You only call force when you think there is something dirty, not when the buffer does, {quote} Ahaa that might be an over sight, we can call buffer.force all the time and let the OS decide if it has to sync the filesystem. If we do that then we just need to stop during the recovery when we have a corrupted columns (which are because the os or the force didnt complete the fsync completely). {quote} How ugly would it get to either wait for previous (in CL order) mutations before syncing {quote} We can do that with another counter which holds the bytes written by all the threads and comparing it with the allocated. We dont need lock in that case. 
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816344#comment-13816344 ] Vijay commented on CASSANDRA-3578: -- {quote} we could have A allocate, B begin sync, C allocate, C write, B see counters equal {quote} I am talking about counting all the bytes allocated and written within a segment, which is (A + B + C) != (A + B) (meaning C or someone else is still writing). {quote} we didn't know there were unfinished writes behind us {quote} That's fine, we will skip those; that's what the current implementation does too. If you are writing in sequence and the server stops, the commits which were in the queue are not written; we are just moving that queue into the buffer. Practically this is less of a concern because they are only a few nanoseconds out of sync. Anyway, I should stop selling :)
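The "another counter" idea above can be sketched with two atomic counters: sync is safe only when every byte that has been allocated has also been written. Everything below is an illustrative sketch under that reading of the comment, not code from the patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate: writers reserve space (allocated) and later report
// completion (written); the syncer compares the two totals.
class SyncGate {
    final AtomicLong allocated = new AtomicLong();
    final AtomicLong written = new AtomicLong();

    void allocate(long bytes)    { allocated.addAndGet(bytes); }
    void markWritten(long bytes) { written.addAndGet(bytes); }

    /** True when no writer still holds reserved-but-unwritten space. */
    boolean safeToSync() { return written.get() == allocated.get(); }
}

public class GateDemo {
    public static void main(String[] args) {
        SyncGate g = new SyncGate();
        g.allocate(64);                      // A reserves
        g.allocate(32);                      // C reserves
        g.markWritten(64);                   // A finishes
        System.out.println(g.safeToSync());  // C is still writing
        g.markWritten(32);                   // C finishes
        System.out.println(g.safeToSync());
    }
}
```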
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816935#comment-13816935 ] Vijay commented on CASSANDRA-3578: -- :) {quote} The current implementation never ACKs a message it cannot later replay under batch {quote} We don't guarantee that in PeriodicCommitLogExecutorService; all this time I was trying to optimize for the general case (PeriodicCommitLogExecutorService). For BatchCommitLogExecutorService, my patch (https://github.com/Vijay2win/cassandra/commit/0d982e840145d466b8bcbc863d6218b24b0842ad#diff-05c1e4fd86fea19b8e0552b1f289be85L191) ACKs only after we write (we wait for the sync after that write), and hence the write and sync of that particular write happen before acking back.
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815618#comment-13815618 ] Vijay commented on CASSANDRA-3578: -- Hi Benedict, archiver.maybeArchive(segment.getPath(), segment.getName()) is a blocking call and will need a separate thread, since it might involve user-defined archival. {quote} sync() would mark things as flushed to disk that weren't, which would result in log messages never being persisted {quote} My understanding is that calling force will sync the dirty pages; if we do concurrent writes to the same page, it will be marked dirty again and synced in the next call, so how will we lose log messages? I still like the original approach :) of creating files (it may be just me) because of its simplicity, and we can be aggressive in allocator threads similar to your patch (creating empty files and deleting them).
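The force() semantics being debated can be exercised with a plain MappedByteBuffer. This is a standalone sketch (temp file, invented names), not the patch's code; it only demonstrates the API shape, not the crash-durability argument:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ForceDemo {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("cl-segment", ".log"); // stand-in for a commitlog segment
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("mutation".getBytes());
            // force() writes back the dirtied pages; a page written again after
            // this call is re-marked dirty and picked up by the next force(),
            // which is the behaviour the comment above relies on.
            buf.force();
            System.out.println("synced");
        }
    }
}
```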
[jira] [Updated] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5911: - Attachment: 0001-5911-v3.patch v3 fixes the activateNextArchiveSegment issue in v2. I had to modify the test cases to avoid initializing the commit log before the recover method is called. Hope that's ok.
[jira] [Commented] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811010#comment-13811010 ] Vijay commented on CASSANDRA-5911: -- Hi Jonathan, That was just an oversight... I missed that recover() sets that flag. Let me add another flag for unit tests.
[jira] [Updated] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5911: - Attachment: 0001-5911-v2.patch Attached patch adds the warn message and fixes the test to do a blocking wait until the new segment arrives. Also a little more logic to make sure we really need to switch...
{code}
if (!activeSegment.isUnused() && activeSegment.id == context.segment)
{
    if (allocator.numSegmentsAvailable() > 0 || allocator.createReserveSegments)
        activateNextArchiveSegment();
    else
        logger.warn("no active commitlog to switch, additional mutations might be replayed if the node is restarted immediately. See: CASSANDRA-5911");
}
{code}
[jira] [Commented] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807262#comment-13807262 ] Vijay commented on CASSANDRA-5911: -- [~rcoli] I think the issue was there even before 1.0. Commit logs are not removed after nodetool flush or nodetool drain -- Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Assignee: Vijay Priority: Minor Fix For: 2.0.3 Attachments: 0001-5911-v2.patch, 0001-CASSANDRA-5911.patch, 6528_140171_knwmuqxe9bjv5re_system.log Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min).
Reproduction follows: {code} jblangston:bin jblangston$ ./cassandra /dev/null jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 /dev/null jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool flush jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool drain jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ pkill java jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ ./cassandra -f | grep Replaying INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log INFO 10:03:43,907 Replaying 
/opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log {code} -- This
[jira] [Comment Edited] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807257#comment-13807257 ] Vijay edited comment on CASSANDRA-5911 at 10/28/13 9:36 PM: The attached patch adds a warn message and makes the unit tests do a blocking wait till the new segment arrives. It also adds a little more logic to make sure we really need to switch...
{code}
if (!activeSegment.isUnused() && activeSegment.id == context.segment)
{
    if (allocator.numSegmentsAvailable() > 0 || allocator.createReserveSegments)
        activateNextArchiveSegment();
    else
        logger.warn("No active commitlog segment to switch to; additional mutations might be replayed if the node is restarted immediately. See: CASSANDRA-5911");
}
{code}
was (Author: vijay2...@yahoo.com): The attached patch adds a warn message and fixes the test to do a blocking wait till the new segment arrives. It also adds a little more logic to make sure we really need to switch...
{code}
if (!activeSegment.isUnused() && activeSegment.id == context.segment)
{
    if (allocator.numSegmentsAvailable() > 0 || allocator.createReserveSegments)
        activateNextArchiveSegment();
    else
        logger.warn("No active commitlog segment to switch to; additional mutations might be replayed if the node is restarted immediately. See: CASSANDRA-5911");
}
{code}
Commit logs are not removed after nodetool flush or nodetool drain -- Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Assignee: Vijay Priority: Minor Fix For: 2.0.3 Attachments: 0001-5911-v2.patch, 0001-CASSANDRA-5911.patch, 6528_140171_knwmuqxe9bjv5re_system.log Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8.
Usually this isn't much of an issue but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed which can make startup take a long time (on the order of 20-30 min). Reproduction follows: {code} jblangston:bin jblangston$ ./cassandra /dev/null jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 /dev/null jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool flush jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool drain jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ pkill java jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ ./cassandra -f | grep Replaying INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, 
/opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log INFO 10:03:43,908 Replaying
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806473#comment-13806473 ] Vijay commented on CASSANDRA-3578: -- Another option is to replace recycle with discard: we can always create new segments and never recycle (instead discarding, in the sync thread)... then we can get rid of the header. We still need to skip commit log recovery if there's a corrupted/partial write... Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806193#comment-13806193 ] Vijay commented on CASSANDRA-3578: -- Hi Jonathan, {quote} I must be missing where this gets persisted back to disk {quote} The first 4 bytes at the beginning of the file; maybe we can get rid of it and stop when the size and checksum don't match? But the header is pretty light, and will need only one additional seek every 10 seconds (it just marks the end of the file at the beginning of the file just before fsync). {quote} I think allocate needs to write the length to the segment before returning {quote} The first thing the thread does after allocation is write the size and its checksum. Are we talking about synchronization in the allocation, so only one thread writes the size and the end marker (-1)? Currently the atomic operation is only on an AtomicLong (position). We might be able to do something similar to the current implementation, without headers, using a read-write lock, where the write lock ensures that we write the end (write -1 to mark the end; the lock ensures no one else overwrites the end marker) just before fsync (but the OS can also write before we force the buffers)... that might not be desirable, though, since it might stall the system like the current one. Not sure if the header is that bad. Let me know what you think, thanks!
Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
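The header scheme debated above — marking the end of valid data in the first 4 bytes of the segment, written just before fsync — can be sketched as a standalone illustration. This is a hypothetical toy (class and file names are invented, and a plain `RandomAccessFile` stands in for the mmap'd segment), not code from the patch:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class HeaderMarkDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("commitlog-demo", ".log");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(64);                     // pretend this is the preallocated segment
            raf.seek(4);                           // data starts after the 4-byte header
            raf.write(new byte[]{1, 2, 3, 4, 5});  // some appended mutations
            long end = raf.getFilePointer();

            raf.seek(0);
            raf.writeInt((int) end);               // "mark the end of the file at the beginning"
            raf.getFD().sync();                    // the periodic fsync

            raf.seek(0);                           // replay would read the header first
            System.out.println("valid data ends at " + raf.readInt());
        }
    }
}
```

On replay, everything past the recorded offset is ignored, which is the one extra seek per sync the comment mentions.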
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801536#comment-13801536 ] Vijay commented on CASSANDRA-3578: -- {quote} It slows down the mutation thread by waiting for commitlog writing mutation is done {quote} Well, it depends on where you are bottlenecking; updating the mmap buffer is not that expensive and is usually CPU-intensive. On the other hand, it reduces the variability, as shown in the stress test. Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3578: - Attachment: ComitlogStress.java Micro benchmark code attached; it tries to update the commit log as fast as possible (I chose a small mutation to avoid active segment starvation; we are still creating ~1 CL segment per second). Not sure if this is a valid comparison to the real world at this time, but the good part is that the patch consumes less memory and has fewer swings. http://pastebin.com/WeJ0QL8p Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3578: - Attachment: Multi-Threded-CL.png Current-CL.png Hi Jonathan, Ohhh, you can ignore those; I was experimenting with a few other things (like UUID.random locking, which made the numbers all bad, etc.) and hence added those metrics (didn't mean to confuse). But if you are interested in the GC profile, please see the attached. Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6206) Thrift socket listen backlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800956#comment-13800956 ] Vijay commented on CASSANDRA-6206: -- Nenad, Can you change the default backlog config to be the Java default? Thrift socket listen backlog Key: CASSANDRA-6206 URL: https://issues.apache.org/jira/browse/CASSANDRA-6206 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian Linux, Java 7 Reporter: Nenad Merdanovic Fix For: 2.0.2 Attachments: cassandra.patch Although Thrift is a deprecated method of accessing Cassandra, the default backlog is way too low on that socket. It shouldn't be a problem to implement, and I am including a POC patch for this (sorry, really low on time with limited Java knowledge, so it's just to give an idea). This is an old report which was never addressed and the bug remains to this day, except in my case I have a much larger scale application with 3rd-party software which I cannot modify to include connection pooling: https://issues.apache.org/jira/browse/CASSANDRA-1663 There is also a pending change in Thrift itself which Cassandra should be able to use for the parts using TServerSocket (SSL): https://issues.apache.org/jira/browse/THRIFT-1868 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6206) Thrift socket listen backlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800991#comment-13800991 ] Vijay commented on CASSANDRA-6206: -- Hi Nenad, Yep, thanks! Thrift socket listen backlog Key: CASSANDRA-6206 URL: https://issues.apache.org/jira/browse/CASSANDRA-6206 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian Linux, Java 7 Reporter: Nenad Merdanovic Fix For: 2.0.2 Attachments: cassandra.patch Although Thrift is a deprecated method of accessing Cassandra, the default backlog is way too low on that socket. It shouldn't be a problem to implement, and I am including a POC patch for this (sorry, really low on time with limited Java knowledge, so it's just to give an idea). This is an old report which was never addressed and the bug remains to this day, except in my case I have a much larger scale application with 3rd-party software which I cannot modify to include connection pooling: https://issues.apache.org/jira/browse/CASSANDRA-1663 There is also a pending change in Thrift itself which Cassandra should be able to use for the parts using TServerSocket (SSL): https://issues.apache.org/jira/browse/THRIFT-1868 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801163#comment-13801163 ] Vijay commented on CASSANDRA-3578: -- Yeah, we do CAS instead of queue.take() in http://goo.gl/JbNWM5 , but we do allocate new segments every second; not sure why the dip... will do more profiling on it. Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801496#comment-13801496 ] Vijay commented on CASSANDRA-3578: -- Found the bottleneck in the current implementation! Actually this happens during buffer.force()... the CL.add queue is capped by commitlog_periodic_queue_size {code} public int commitlog_periodic_queue_size = 1024 * FBUtilities.getAvailableProcessors(); {code} Hence, until we flush() (which is called every 10 seconds) the writes to the CL are blocked. Hope that makes sense... Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
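The stall behavior of the capped CL.add queue described in the comment above can be illustrated with a toy sketch. This is hypothetical demo code, not Cassandra's (the real queue is sized at 1024 * available processors; here the cap is shrunk to 4 so the effect is visible):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class QueueBottleneckDemo {
    public static void main(String[] args) {
        // Stand-in for the commit-log add queue, capped like
        // commitlog_periodic_queue_size (4 entries instead of 1024 * cores).
        ArrayBlockingQueue<String> addQueue = new ArrayBlockingQueue<>(4);

        for (int i = 0; i < 4; i++)
            addQueue.offer("mutation-" + i);

        // Once the cap is reached, a non-blocking offer() fails; a blocking
        // put() would stall the mutation thread until the periodic sync
        // thread drains the queue (every 10 seconds under periodic mode).
        boolean accepted = addQueue.offer("mutation-overflow");
        System.out.println("accepted=" + accepted + " size=" + addQueue.size());
    }
}
```

This prints `accepted=false size=4`, which is the backpressure point the comment identifies: writers block on the full queue between syncs.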
[jira] [Comment Edited] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801496#comment-13801496 ] Vijay edited comment on CASSANDRA-3578 at 10/22/13 5:28 AM: Found the bottleneck in the current! Actually this happens during buffer.force()... CL.add queue is capped by commitlog_periodic_queue_size {code} public int commitlog_periodic_queue_size = 1024 * FBUtilities.getAvailableProcessors(); {code} Hence, till we flush() (is called every 10 seconds) the writes to the CL is blocked. Hope that makes sense... was (Author: vijay2...@yahoo.com): Found the bottleneck in the current! Actually this happens during buffer.force()... CL.add queue is capped by commitlog_periodic_queue_size {code} public int commitlog_periodic_queue_size = 1024 * FBUtilities.getAvailableProcessors(); {code} Hence, till we flush() (is called every 10 seconds) the writes to the CL us blocked. Hope that makes sense... Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, Current-CL.png, Multi-Threded-CL.png, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6206) Thrift socket listen backlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801518#comment-13801518 ] Vijay commented on CASSANDRA-6206: -- v2 doesn't apply cleanly (in future, maybe use a git patch). I will also change SSLFactory.getServerSocket to use this configuration. ping [~xedin] for HSHA for THRIFT-1868. Should we leave this ticket open till THRIFT-1868 gets resolved and/or till 2.1 (which changes the yaml configuration)? Thrift socket listen backlog Key: CASSANDRA-6206 URL: https://issues.apache.org/jira/browse/CASSANDRA-6206 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian Linux, Java 7 Reporter: Nenad Merdanovic Fix For: 2.0.2 Attachments: cassandra.patch, cassandra-v2.patch Although Thrift is a deprecated method of accessing Cassandra, the default backlog is way too low on that socket. It shouldn't be a problem to implement, and I am including a POC patch for this (sorry, really low on time with limited Java knowledge, so it's just to give an idea). This is an old report which was never addressed and the bug remains to this day, except in my case I have a much larger scale application with 3rd-party software which I cannot modify to include connection pooling: https://issues.apache.org/jira/browse/CASSANDRA-1663 There is also a pending change in Thrift itself which Cassandra should be able to use for the parts using TServerSocket (SSL): https://issues.apache.org/jira/browse/THRIFT-1868 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-6206) Thrift socket listen backlog
[ https://issues.apache.org/jira/browse/CASSANDRA-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801518#comment-13801518 ] Vijay edited comment on CASSANDRA-6206 at 10/22/13 5:54 AM: v2 doesn't apply cleanly (in future, maybe use a git patch). I will also change SSLFactory.getServerSocket to use this configuration. ping [~xedin] for HSHA change after THRIFT-1868. Should we leave this ticket open till THRIFT-1868 gets resolved and/or till 2.1 (which changes the yaml configuration)? was (Author: vijay2...@yahoo.com): v2 doesn't apply cleanly (in future, maybe use a git patch). I will also change SSLFactory.getServerSocket to use this configuration. ping [~xedin] for HSHA for THRIFT-1868. Should we leave this ticket open till THRIFT-1868 gets resolved and/or till 2.1 (which changes the yaml configuration)? Thrift socket listen backlog Key: CASSANDRA-6206 URL: https://issues.apache.org/jira/browse/CASSANDRA-6206 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian Linux, Java 7 Reporter: Nenad Merdanovic Fix For: 2.0.2 Attachments: cassandra.patch, cassandra-v2.patch Although Thrift is a deprecated method of accessing Cassandra, the default backlog is way too low on that socket. It shouldn't be a problem to implement, and I am including a POC patch for this (sorry, really low on time with limited Java knowledge, so it's just to give an idea). This is an old report which was never addressed and the bug remains to this day, except in my case I have a much larger scale application with 3rd-party software which I cannot modify to include connection pooling: https://issues.apache.org/jira/browse/CASSANDRA-1663 There is also a pending change in Thrift itself which Cassandra should be able to use for the parts using TServerSocket (SSL): https://issues.apache.org/jira/browse/THRIFT-1868 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6218) Reduce WAN traffic while doing repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799818#comment-13799818 ] Vijay commented on CASSANDRA-6218: -- Won't it be simpler to just forward (similar to write forwarding) the difference of (A, B) and (C, D) to each other (after the initial repair) than initiating another repair between (A, B) and (C, D) in step 3? Another possible option: Consider (DC1: A, B, C and DC2: X, Y, Z). Start a Merkle tree comparison between all the nodes; once the differences are identified: stream within the DC and then across the DC using a picked proxy or forwarder node. (A, B, C to X) and then (X, Y, Z to A). Now both DCs have all the inconsistent data, hence they can stream the ranges which were identified as inconsistent. Reduce WAN traffic while doing repairs -- Key: CASSANDRA-6218 URL: https://issues.apache.org/jira/browse/CASSANDRA-6218 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Priority: Minor The way we send out data that does not match over WAN can be improved. Example: Say there are four nodes (A,B,C,D) which are replicas of a range we are repairing. A, B is in DC1 and C,D is in DC2. If A does not have the data which the other replicas have, then we will have the following streams: 1) A to B and back 2) A to C and back (goes over WAN) 3) A to D and back (goes over WAN) One of the ways to reduce WAN traffic is this: 1) Repair A and B only with each other, and C and D with each other, starting at the same time t. 2) Once these repairs have finished, A,B and C,D are in sync with respect to time t. 3) Now run a repair between A and C; the streams which are exchanged as a result of the diff will also be streamed to B and D via A and C (C and D behave like proxies to the streams). For a replication of DC1:2,DC2:2, the WAN traffic will get reduced by 50%, and even more for higher replication factors.
-- This message was sent by Atlassian JIRA (v6.1#6144)
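The 50% figure quoted in the description can be sanity-checked with a toy calculation. This is a hypothetical sketch that only counts diff streams crossing the WAN, ignoring Merkle tree exchange traffic:

```java
public class WanSavingsDemo {
    public static void main(String[] args) {
        // Replicas of the range in the remote DC (DC1:2,DC2:2 -> 2 remote replicas).
        int remoteReplicas = 2;

        // Naive repair: the coordinator exchanges the diff with every remote
        // replica individually, so each remote replica costs one WAN stream.
        int naiveWanStreams = remoteReplicas;

        // Proxy approach (step 3 above): one WAN stream to a proxy node, which
        // then forwards to the other replicas in its DC over the LAN.
        int proxyWanStreams = 1;

        double reduction = 100.0 * (naiveWanStreams - proxyWanStreams) / naiveWanStreams;
        System.out.println("WAN stream reduction: " + reduction + "%");
    }
}
```

With 2 remote replicas this gives 50%; with 3 (e.g. DC2:3) it rises to about 67%, matching "even more for higher replication factors."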
[jira] [Comment Edited] (CASSANDRA-6218) Reduce WAN traffic while doing repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799818#comment-13799818 ] Vijay edited comment on CASSANDRA-6218 at 10/19/13 7:04 AM: Won't it be simpler to just forward (similar to write forwarding) the difference of (A, B) and (C, D) to each other (after the initial repair) than initiating another repair between (A, B) and (C, D) in step 3? Another possible option: Consider (DC1: A, B, C and DC2: X, Y, Z). Start a Merkle tree comparison between all the nodes; once the differences are identified: stream within the DC and then across the DC using a picked proxy or forwarder node. (A, B, C to X) and then (X, Y, Z to A). Now both DCs have consistent data, hence the proxy/forwarder can stream the ranges which were identified as inconsistent in the Merkle comparison was (Author: vijay2...@yahoo.com): Won't it be simpler to just forward (similar to write forwarding) the difference of (A, B) and (C, D) to each other (after the initial repair) than initiating another repair between (A, B) and (C, D) in step 3? Another possible option: Consider (DC1: A, B, C and DC2: X, Y, Z). Start a Merkle tree comparison between all the nodes; once the differences are identified: stream within the DC and then across the DC using a picked proxy or forwarder node. (A, B, C to X) and then (X, Y, Z to A). Now both DCs have all the inconsistent data, hence they can stream the ranges which were identified as inconsistent Reduce WAN traffic while doing repairs -- Key: CASSANDRA-6218 URL: https://issues.apache.org/jira/browse/CASSANDRA-6218 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Priority: Minor The way we send out data that does not match over WAN can be improved. Example: Say there are four nodes (A,B,C,D) which are replicas of a range we are repairing. A, B is in DC1 and C,D is in DC2.
If A does not have the data which other replicas have, then we will have following streams 1) A to B and back 2) A to C and back(Goes over WAN) 3) A to D and back(Goes over WAN) One of the ways of doing it to reduce WAN traffic is this. 1) Repair A and B only with each other and C and D with each other starting at same time t. 2) Once these repairs have finished, A,B and C,D are in sync with respect to time t. 3) Now run a repair between A and C, the streams which are exchanged as a result of the diff will also be streamed to B and D via A and C(C and D behaves like a proxy to the streams). For a replication of DC1:2,DC2:2, the WAN traffic will get reduced by 50% and even more for higher replication factors. -- This message was sent by Atlassian JIRA (v6.1#6144)
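The arithmetic behind the claimed reduction can be sketched as follows (a hypothetical helper with illustrative names, not Cassandra code): with the naive scheme the repair coordinator exchanges streams with every remote-DC replica, while the proxy scheme needs only one cross-DC exchange.

```java
// Hypothetical sketch: WAN stream count for naive repair vs the proxy
// scheme described in the ticket. Names are illustrative only.
public class RepairWanSketch {
    // Naive: the coordinator streams to and from every remote-DC replica.
    public static int naiveWanStreams(int remoteReplicas) {
        return remoteReplicas;
    }

    // Proxy scheme: repair within each DC first, then a single cross-DC
    // exchange between one proxy node per DC, which forwards locally.
    public static int proxyWanStreams() {
        return 1;
    }

    public static void main(String[] args) {
        // DC1:2, DC2:2 -> 2 WAN streams naive vs 1 with a proxy: a 50% cut.
        System.out.println(naiveWanStreams(2) + " vs " + proxyWanStreams());
    }
}
```

For DC2:3 the cut grows to two thirds, matching the ticket's note that higher replication factors save even more.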
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799612#comment-13799612 ] Vijay commented on CASSANDRA-3578: -- Ohhh great idea, will give it a shot... Multithreaded commitlog --- Key: CASSANDRA-3578 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Labels: performance Attachments: 0001-CASSANDRA-3578.patch, parallel_commit_log_2.patch Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate) can improve performance, since you're not bottlenecking on a single thread to do all the copying and CRC computation. Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable. (moved from CASSANDRA-622, which was getting a bit muddled.) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798746#comment-13798746 ] Vijay commented on CASSANDRA-3578: -- Pushed my changes to https://github.com/Vijay2win/cassandra/commits/3578
* The above takes a different approach: we update the commit log as part of the mutation thread, so there are no extra threads to deal with serialization. A CAS operation reserves a block of bytes in the mmapped segment (similar to the slab allocator) and activates segments.
* Sync is still managed in a separate thread.
* There is no end-of-segment marker on each mutation; we just have a header which holds the end.
We could clean up a little more if it looks good. Performance tests show a slight improvement... Maybe once we remove other bottlenecks the improvements will be more visible (also have to test on spinning drives).
-- This message was sent by Atlassian JIRA (v6.1#6144)
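A minimal sketch of the CAS-based reservation described above (hypothetical class and method names; the actual implementation lives in the linked branch): each mutation thread claims a contiguous range of the segment with compareAndSet, so no single writer thread serializes the copying and CRC work.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-mutation CAS reservation in a commit log
// segment. The real code works on an mmapped buffer; a heap buffer is
// used here to keep the example self-contained.
public class SegmentSketch {
    private final ByteBuffer buffer;
    private final AtomicInteger position = new AtomicInteger(0);

    public SegmentSketch(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    /** Reserve size bytes; returns a writable slice, or null when full. */
    public ByteBuffer allocate(int size) {
        while (true) {
            int prev = position.get();
            int next = prev + size;
            if (next > buffer.capacity())
                return null; // caller must activate a fresh segment
            if (position.compareAndSet(prev, next)) {
                ByteBuffer slice = buffer.duplicate();
                slice.position(prev);
                slice.limit(next);
                return slice.slice(); // private window over [prev, next)
            }
        }
    }
}
```

Each thread then serializes its mutation and computes its CRC into its own slice, with only the reservation itself contended.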
[jira] [Assigned] (CASSANDRA-3578) Multithreaded commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay reassigned CASSANDRA-3578: Assignee: Vijay
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5911: - Attachment: 0001-CASSANDRA-5911.patch
Hi Jonathan, yeah, it will currently replay the active segment (2. above). Please see the attached patch, which provides an alternative: on flush we can just switch to a different segment before it is full.
Commit logs are not removed after nodetool flush or nodetool drain
--
Key: CASSANDRA-5911
URL: https://issues.apache.org/jira/browse/CASSANDRA-5911
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: J.B. Langston
Assignee: Vijay
Priority: Minor
Fix For: 2.0.2
Attachments: 0001-CASSANDRA-5911.patch, 6528_140171_knwmuqxe9bjv5re_system.log
Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min).
Reproduction follows: {code} jblangston:bin jblangston$ ./cassandra /dev/null jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 /dev/null jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool flush jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ nodetool drain jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ pkill java jblangston:bin jblangston$ du -h ../commitlog 576M ../commitlog jblangston:bin jblangston$ ./cassandra -f | grep Replaying INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log INFO 10:03:43,907 Replaying 
/opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log INFO
[jira] [Commented] (CASSANDRA-4681) SlabAllocator spends a lot of time in Thread.yield
[ https://issues.apache.org/jira/browse/CASSANDRA-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782230#comment-13782230 ] Vijay commented on CASSANDRA-4681: -- Hi Jonathan, it doesn't show any change in TPS; please see the attached (tried with stress at 50/200 threads and concurrent_writes at 32/256... all the runs are attached). http://pastebin.com/JDFqgcZN
SlabAllocator spends a lot of time in Thread.yield
--
Key: CASSANDRA-4681
URL: https://issues.apache.org/jira/browse/CASSANDRA-4681
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.5
Environment: OEL Linux
Reporter: Oleg Kibirev
Assignee: Jonathan Ellis
Priority: Minor
Labels: performance
Attachments: 4681-v3.txt, 4691-short-circuit.txt, 4691-v3-rebased.txt, SlabAllocator.java, SlabAllocator.java.list, slab-list.patch
When profiling high-volume inserts into Cassandra running on a host with fast SSD and CPU, Thread.yield() invoked by SlabAllocator appeared as the top item in CPU samples. The fix is to return a regular byte buffer if the current slab is being initialized by another thread. So instead of:
{code}
if (oldOffset == UNINITIALIZED)
{
    // The region doesn't have its data allocated yet.
    // Since we found this in currentRegion, we know that whoever
    // CAS-ed it there is allocating it right now. So spin-loop
    // shouldn't spin long!
    Thread.yield();
    continue;
}
{code}
do:
{code}
if (oldOffset == UNINITIALIZED)
    return ByteBuffer.allocate(size);
{code}
I achieved a 4x speedup in my (admittedly specialized) benchmark by using the optimized version of SlabAllocator attached. Since this code is in the critical path, even doing excessive atomic instructions or allocating unneeded extra ByteBuffer instances has a measurable effect on performance.
-- This message was sent by Atlassian JIRA (v6.1#6144)
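A self-contained sketch of the proposed change (hypothetical class and field names; the real code is in the attached SlabAllocator.java): when the region's backing buffer is still being initialized by another thread, fall back to a plain heap allocation instead of spinning in Thread.yield().

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a slab region whose offset starts UNINITIALIZED.
// Instead of yielding while another thread initializes the buffer, the
// allocator returns a plain heap ByteBuffer and moves on.
public class RegionSketch {
    static final int UNINITIALIZED = -1;
    final AtomicInteger nextOffset = new AtomicInteger(UNINITIALIZED);
    volatile ByteBuffer data;

    public void initialize(int capacity) {
        data = ByteBuffer.allocate(capacity); // mmapped in the real code
        nextOffset.set(0);
    }

    public ByteBuffer allocate(int size) {
        while (true) {
            int oldOffset = nextOffset.get();
            if (oldOffset == UNINITIALIZED)
                return ByteBuffer.allocate(size); // fall back, don't yield
            if (oldOffset + size > data.capacity())
                return null; // region full: caller swaps in a new region
            if (nextOffset.compareAndSet(oldOffset, oldOffset + size)) {
                ByteBuffer dup = data.duplicate();
                dup.position(oldOffset);
                dup.limit(oldOffset + size);
                return dup.slice();
            }
        }
    }
}
```

The trade-off the comment debates is visible here: the fallback buffer escapes the slab, so it costs an extra allocation, but it never blocks the writing thread.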
[jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781214#comment-13781214 ] Vijay commented on CASSANDRA-5549: -- Hi Ryan, can you give https://github.com/Vijay2win/cassandra/commits/5549-v2 a shot on at least 10M keys? Rebased [~jbellis]'s branch. I moved the CommitLogAllocator forceFlush back to a separate thread, and removed the isDirty boolean since isClean is called in a separate thread and hence shouldn't help performance on writes; the rest is all Jonathan's... My benchmark on a 32-physical-core machine shows better performance than earlier: ~72 vs ~84. http://pastebin.com/GRPMUcSB
Remove Table.switchLock
---
Key: CASSANDRA-5549
URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
Project: Cassandra
Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay
Labels: performance
Fix For: 2.1
Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches).
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777088#comment-13777088 ] Vijay commented on CASSANDRA-5357: --
{quote} So the cost is quite high vs having live filters {quote}
Some synthetic tests show very low overhead on the filter deserialization: http://pastebin.com/VNREA8fG. IMHO the exists check might not be that bad, since 99% (that's an assumption) of the queries will have the same query filters on them. For those queries which are discrete and present in the cache (survived the LRU), I think it is fair to take a hit rather than letting it live in the JVM. Filters may be big in some cases (like named filters, or filters with long string names), and even in the optimal case of empty strings we still need a minimum of 2 ByteBuffers, a count, and the data structures in memory. Hence compact off-heap storage might be good. One other option, which we were discussing a little earlier: optimizing the filter entries in the cache by merging similar and overlapping queries would help with the above.
{quote} I'm not concerned about that so much as, do we keep within our total memory budget? {quote}
Ahaa, got it, so we need an additional parameter for the cache which says how much memory is available in the JVM for the cached keys... I will add it to the next revision.
Query cache
---
Key: CASSANDRA-5357
URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
Project: Cassandra
Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay
I think that most people expect the row cache to act like a query cache, because that's a reasonable model. Caching the entire partition is, in retrospect, not really reasonable, so it's not surprising that it catches people off guard, especially given the confusion we've inflicted on ourselves as to what a row constitutes. I propose replacing it with a true query cache.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775967#comment-13775967 ] Vijay commented on CASSANDRA-5357: --
{quote} I'm saying just shove the ColumnFamily payload off-heap but leave the rest live. {quote}
Sure, but that can cause more memory pressure in the JVM. IMHO (cost vs benefit) it's not that bad to deserialize the filters, at least in the stress tests I did.
{quote} I'm not sure I understand exactly how the problem happens here. {quote}
The problem is when a whole row (let's say multiple MBs of) column family is cached: instead of deserializing the whole column family at once, we can deserialize it during filtering in CFS.filterColumnFamily, hence the QC should return an iterator instead of a CF... Makes sense?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774291#comment-13774291 ] Vijay commented on CASSANDRA-5357: -- Hi Jonathan, I have pushed a version with a sentinel (might have made it a little hacky, but it works): https://github.com/Vijay2win/cassandra/commits/query_cache_v2.
{quote} Serializing the entire QueryCacheValue for each lookup is going to kill performance on hot partitions. {quote}
It is required because we need to know the query which populated the cache. For example, there can be a named query for columns A, Z which is followed by a slice query from A to Z, and we might not respond with the right response since B to Y is not in the cache. In a separate ticket we can also optimize the above case (and more) for the cached queries stored, if that's OK. Example: if the slice with 250 is stored, why also store the slice with 50 in the same range; we can also merge overlapping slices, etc.
{quote} if there's room, that's fine, but exceeding the configured memory budget is Bad {quote}
Can we do that in a separate ticket? I believe we can achieve this by implementing an Iterator, similar to SSTableIterator, to stream the columns rather than constructing the ColumnFamily at once. Thanks!
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774291#comment-13774291 ] Vijay edited comment on CASSANDRA-5357 at 9/23/13 4:48 AM: ---
Hi Jonathan, I have pushed a version with a sentinel (might have made it a little hacky, but it works): https://github.com/Vijay2win/cassandra/commits/query_cache_v2.
{quote} Serializing the entire QueryCacheValue for each lookup is going to kill performance on hot partitions. {quote}
It is required because we need to know the query which populated the cache. For example, there can be a named query for columns A, Z which is followed by a slice query from A to Z, and we might not respond with the right response since B to Y is not in the cache. In a separate ticket we can also optimize the above case (and more) for the cached queries stored, if that's OK. Example: if the slice with a count of 250 is stored, we might not need to store the slice with a count of 50 for the same range; we can also merge overlapping slices, etc.
{quote} if there's room, that's fine, but exceeding the configured memory budget is Bad {quote}
Can we do that in a separate ticket? I believe we can achieve this by implementing an Iterator, similar to SSTableIterator, to stream the columns rather than constructing the ColumnFamily at once. Thanks!
was (Author: vijay2...@yahoo.com):
Hi Jonathan, I have pushed a version with a sentinel (might have made it a little hacky, but it works): https://github.com/Vijay2win/cassandra/commits/query_cache_v2.
{quote} Serializing the entire QueryCacheValue for each lookup is going to kill performance on hot partitions. {quote}
It is required because we need to know the query which populated the cache. For example, there can be a named query for columns A, Z which is followed by a slice query from A to Z, and we might not respond with the right response since B to Y is not in the cache. In a separate ticket we can also optimize the above case (and more) for the cached queries stored, if that's OK. Example: if the slice with 250 is stored, why also store the slice with 50 in the same range; we can also merge overlapping slices, etc.
{quote} if there's room, that's fine, but exceeding the configured memory budget is Bad {quote}
Can we do that in a separate ticket? I believe we can achieve this by implementing an Iterator, similar to SSTableIterator, to stream the columns rather than constructing the ColumnFamily at once. Thanks!
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
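The slice-merging idea mentioned above can be sketched as follows (a hypothetical representation where a cached slice is an inclusive [start, end] interval over comparable column positions, not the actual QueryCacheValue structures):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: collapse overlapping cached slices into one entry,
// so a cache populated with [0, 5] and [3, 9] keeps a single [0, 9] slice.
public class SliceMergeSketch {
    /** Merge overlapping [start, end] intervals; the input is not modified. */
    public static List<int[]> merge(List<int[]> slices) {
        List<int[]> sorted = new ArrayList<>(slices);
        sorted.sort(Comparator.comparingInt(s -> s[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] s : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last[1] >= s[0])
                last[1] = Math.max(last[1], s[1]); // overlap: extend in place
            else
                merged.add(new int[] { s[0], s[1] });
        }
        return merged;
    }
}
```

The same pass could also drop a slice whose range and count are subsumed by an already-cached one, which is the "250 vs 50" case in the comment.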
[jira] [Resolved] (CASSANDRA-1956) Convert row cache to row+filter cache
[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay resolved CASSANDRA-1956. -- Resolution: Duplicate
Yep, closing this as it is a duplicate of CASSANDRA-5357.
Convert row cache to row+filter cache
-
Key: CASSANDRA-1956
URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
Fix For: 2.1
Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this use case would be perfectly supported by a cache that stored only columns matching the filter. Possible implementations:
* (copout) Cache a single filter per row, and leave the cache key as is
* Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overhead
* Change the cache key to rowkey+filterid: basically ideal, but you need a secondary index to look up cache entries by rowkey so that you can keep them in sync with the memtable
* others?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4165) Generate Digest file for compressed SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773543#comment-13773543 ] Vijay commented on CASSANDRA-4165: -- Hi Jonathan, 3648 actually adds block-level CRCs for uncompressed files, writes them to a separate file (CRC.db), and uses them while streaming parts of the file, to validate before streaming (not during normal reads). Hence we need two checksums during the flush: one for the blocks and the md5 for the whole file.
Generate Digest file for compressed SSTables
Key: CASSANDRA-4165
URL: https://issues.apache.org/jira/browse/CASSANDRA-4165
Project: Cassandra
Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
Priority: Minor
Attachments: 0001-Generate-digest-for-compressed-files-as-well.patch, 4165-rebased.txt
We use the generated *Digest.sha1-files to verify backups; would be nice if they were generated for compressed sstables as well.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
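The "two checksums during flush" idea can be sketched in one pass over the data (hypothetical names, not the actual SSTable writer; the whole-file digest algorithm is a parameter since the comment mentions md5 while the files are named *Digest.sha1):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: per-block CRC32 values (for streaming-time
// validation, as in the CRC.db described above) plus a single digest over
// the whole file (for backup verification), both computed in one pass.
public class ChecksumSketch {
    public final List<Long> blockCrcs = new ArrayList<>(); // -> CRC.db
    public final MessageDigest fileDigest;                  // -> Digest file

    public ChecksumSketch(String algorithm) {
        try {
            fileDigest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    public void append(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        blockCrcs.add(crc.getValue());          // one checksum per block
        fileDigest.update(block, 0, block.length); // rolling whole-file digest
    }

    public byte[] finish() {
        return fileDigest.digest();
    }
}
```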
[jira] [Commented] (CASSANDRA-6031) Remove code to load pre-1.2 caches
[ https://issues.apache.org/jira/browse/CASSANDRA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767119#comment-13767119 ] Vijay commented on CASSANDRA-6031: -- +1 Remove code to load pre-1.2 caches -- Key: CASSANDRA-6031 URL: https://issues.apache.org/jira/browse/CASSANDRA-6031 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Attachments: remove-deprecated-cache-load-method.txt AutoSavingCache has been deprecated since 1.2 and exists to read pre-CASSANDRA-3762 caches. It is thus safe to remove in 2.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5933) 2.0 read performance is slower than 1.2
[ https://issues.apache.org/jira/browse/CASSANDRA-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756323#comment-13756323 ] Vijay commented on CASSANDRA-5933: -- Ryan, do you mind testing the custom setting with 5 to 10 ms... I am thinking we might need enough samples for the percentiles to make sense (if confirmed, we might want to wait till the samples arrive, etc.).
2.0 read performance is slower than 1.2
---
Key: CASSANDRA-5933
URL: https://issues.apache.org/jira/browse/CASSANDRA-5933
Project: Cassandra
Issue Type: Bug
Reporter: Ryan McGuire
Attachments: 1.2-faster-than-2.0.png, 1.2-faster-than-2.0-stats.png
Over the course of several tests I have observed that 2.0 read performance is noticeably slower than 1.2. Example: Blue line is 1.2, the rest are various forms of 2.0 rc1 (I've also seen this on rc2, just don't have a good graph handy) !1.2-faster-than-2.0.png! !1.2-faster-than-2.0-stats.png! [See test data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.eager_retry.node_killed.jsonmetric=interval_op_rateoperation=stress-readsmoothing=1]
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5933) 2.0 read performance is slower than 1.2
[ https://issues.apache.org/jira/browse/CASSANDRA-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756329#comment-13756329 ] Vijay commented on CASSANDRA-5933: -- Hi Ryan, you can set a custom speculative execution like this...
{code}
update column family Standard1 with speculative_retry=10ms;
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5952) report compression ratio via nodetool cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5952: - Reviewer: jbellis report compression ratio via nodetool cfstats - Key: CASSANDRA-5952 URL: https://issues.apache.org/jira/browse/CASSANDRA-5952 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Robert Coli Assignee: Vijay Priority: Trivial Fix For: 1.2.10, 2.0.1 Attachments: 0001-CASSANDRA-5952.patch CASSANDRA-3393 adds a getCompressionRatio JMX call, and was originally supposed to also expose this value per CF via nodetool cfstats. However, the nodetool cfstats part was not done in CASSANDRA-3393. This ticket serves as a request to expose this valuable data about compression via nodetool cfstats. (cc: [~vijay2...@yahoo.com], who did the CASSANDRA-3393 work) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5952) report compression ratio via nodetool cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5952: - Attachment: 0001-CASSANDRA-5952.patch
One-liner change to expose it via nodetool, thanks!
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-5952) report compression ratio via nodetool cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay reassigned CASSANDRA-5952: Assignee: Vijay
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5939) Cache Providers calculate very different row sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754387#comment-13754387 ] Vijay commented on CASSANDRA-5939: --
{quote} While java has overhead, it's not... {quote}
Well, try the following code in CacheProviderTest:
{code}
@Test
public void testCompareSizes() throws IOException
{
    RowCacheKey key = new RowCacheKey(UUID.randomUUID(), ByteBufferUtil.bytes("test"));
    ColumnFamily cf = createCF();
    System.out.println("size:" + (key.memorySize() + cf.memorySize()));
    System.out.println("key size:" + key.memorySize());
    System.out.println("value size:" + cf.memorySize());
    RowCacheSerializer serializer = new RowCacheSerializer();
    DataOutputBuffer out = new DataOutputBuffer();
    serializer.serialize(cf, out);
    System.out.println("ser size:" + out.getLength());
    IRowCacheEntry cf2 = serializer.deserialize(new DataInputStream(new ByteArrayInputStream(out.getData())));
    Assert.assertEquals(cf, cf2);
}
{code}
Output (the value/CF memorySize actually uses JAMM's measureDeep()):
{code}
size:74120
key size:48
value size:74072
ser size:66
{code}
I am just trying to figure out if there is any bug I am missing/overlooking. I agree that we need a configuration for the key size in the JVM heap to contain OOMs etc. We can use this ticket to solve that issue. I do understand we have removed CLHM in 2.0, so we can concentrate on getting a better configuration for SC.
Cache Providers calculate very different row sizes
--
Key: CASSANDRA-5939
URL: https://issues.apache.org/jira/browse/CASSANDRA-5939
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: 1.2.8
Reporter: Chris Burroughs
Assignee: Vijay
Took the same production node and bounced it 4 times comparing version and cache provider. ConcurrentLinkedHashCacheProvider and SerializingCacheProvider produce very different results, resulting in an order of magnitude difference in rows cached. In all cases the row cache size was 2048 MB.
Hit rate is provided for color, but entries and size are the important part.

1.2.8 ConcurrentLinkedHashCacheProvider:
* entries: 23,217
* hit rate: 43%
* size: 2,147,398,344

1.2.8 after about 20 minutes of SerializingCacheProvider:
* entries: 221,709
* hit rate: 68%
* size: 18,417,254

1.2.5 ConcurrentLinkedHashCacheProvider:
* entries: 25,967
* hit rate: ~50%
* size: 2,147,421,704

1.2.5 after about 20 minutes of SerializingCacheProvider:
* entries: 228,457
* hit rate: ~70%
* size: 19,070,315

A related(?) problem is that the ConcurrentLinkedHashCacheProvider sizes seem to be highly variable. Digging up the values for 5 different nodes in the cluster using ConcurrentLinkedHashCacheProvider shows a wide variance in number of entries:
* 12k
* 444k
* 10k
* 25k
* 25k
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
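The entry-count gap in these numbers can be made concrete with a small sketch. The point is only the weighing rule, not Cassandra's actual classes: a CLHM-style cache charges each entry the deep on-heap size of key plus value (what JAMM's measureDeep reports), while a serializing cache charges only the serialized value bytes it keeps off-heap. The method names below are illustrative; the sizes plugged in are the ones measured in the test above.

```java
public class WeighingSketch {
    // CLHM-style weight: deep on-heap size of the key plus the value graph.
    static long clhmWeight(long keyDeepSize, long valueDeepSize) {
        return keyDeepSize + valueDeepSize;
    }
    // SerializingCache-style weight: off-heap value bytes only; the key is not counted.
    static long serializingWeight(long serializedValueBytes) {
        return serializedValueBytes;
    }
    public static void main(String[] args) {
        long budget = 2048L * 1024 * 1024; // the 2048 MB row cache from the report
        // Using the sizes measured above (48-byte key, 74072-byte deep value, 66 serialized):
        System.out.println("CLHM fits ~" + budget / clhmWeight(48, 74072) + " such rows");
        System.out.println("SC fits   ~" + budget / serializingWeight(66) + " such rows");
    }
}
```

Under the same 2048 MB budget the two weighing rules admit wildly different entry counts for the same row, which is consistent with the order-of-magnitude difference reported above.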
[jira] [Commented] (CASSANDRA-5939) Cache Providers calculate very different row sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753061#comment-13753061 ] Vijay commented on CASSANDRA-5939: -- Chris, I am not sure I understand the question/issue right... If the question is what the difference between SC and CLHM is in terms of memory overhead: with CLHM the whole entry's (key and value) weight is calculated, whereas with SC we only weigh the values (which are off-heap) and do not weigh the size of the keys in the heap (since its footprint is a kind of hybrid). CLHM has Java's object overhead (see https://issues.apache.org/jira/browse/CASSANDRA-4860?focusedCommentId=13632991page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13632991), whereas SC encodes the bytes, so the in-memory overhead of the values is considerably lower. Your mileage may also vary depending on the size of the columns.
[jira] [Updated] (CASSANDRA-5909) CommitLogReplayer date time issue
[ https://issues.apache.org/jira/browse/CASSANDRA-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5909: - Attachment: 0001-CASSANDRA-5909.patch Attached a patch and test case as a fix to add precision. Thanks! CommitLogReplayer date time issue -- Key: CASSANDRA-5909 URL: https://issues.apache.org/jira/browse/CASSANDRA-5909 Project: Cassandra Issue Type: Bug Components: Core Reporter: Artur Kronenberg Assignee: Vijay Priority: Minor Fix For: 1.2.10 Attachments: 0001-CASSANDRA-5909.patch Hi, first off I am sorry if the component is not right for this. I am trying to get the point-in-time backup to work, and I ran into the following issues: 1. The documentation in the commitlog_archiving.properties seems to be out of date, as the example date format is no longer valid and can't be parsed. 2. The restore_point_in_time property seems to differ from the actual maxTimestamp. I added additional logging to the codebase in the class CommitLogReplayer like this:
{code}
protected boolean pointInTimeExceeded(RowMutation frm)
{
    long restoreTarget = CommitLog.instance.archiver.restorePointInTime;
    logger.info(String.valueOf(restoreTarget));
    for (ColumnFamily families : frm.getColumnFamilies())
    {
        logger.info(String.valueOf(families.maxTimestamp()));
        if (families.maxTimestamp() > restoreTarget)
            return true;
    }
    return false;
}
{code}
The following output can be seen: the restoreTarget timestamp is 1377015783000. This has been correctly parsed, as I added this date to the properties: 2013:08:20 17:23:03. The value for families.maxTimestamp() is 1377009021033000. This date corresponds to: Mon 45605-09-05 10:50:33 BST (44 millennia from now). It seems like the timestamp has 3 additional zeros. This also means that the code can never return false on the call, as the restoreTarget will always be smaller than maxTimestamp(). Therefore the Replayer can never replay any of my commitlog files.
The timestamp minus the 3 zeros corresponds to Tue 2013-08-20 15:30:21 BST (23 hours ago), which makes more sense and would allow the replay to work. My config: Cassandra 1.2.4, Java 1.6, Ubuntu 12.04 64-bit. If you need any more information let me know and I'll be happy to supply whatever info I can. -- artur
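The "3 additional zeros" above is just a unit mismatch: the parsed restore_point_in_time is epoch milliseconds, while column timestamps default to epoch microseconds. A minimal sketch of the broken and corrected comparison, using the numbers from the report (names are illustrative, not the actual CommitLogReplayer code):

```java
public class PrecisionMismatch {
    // Corrected comparison: bring the millisecond target up to microseconds first.
    static boolean pointInTimeExceeded(long columnTimestampMicros, long restoreTargetMillis) {
        return columnTimestampMicros > restoreTargetMillis * 1000;
    }
    public static void main(String[] args) {
        long restoreTargetMillis = 1_377_015_783_000L;       // parsed "2013:08:20 17:23:03"
        long columnTimestampMicros = 1_377_009_021_033_000L; // the logged maxTimestamp()
        // Naive comparison mixes units, so micros always dwarf millis and
        // every mutation looks like it is past the restore point:
        System.out.println(columnTimestampMicros > restoreTargetMillis);   // buggy: true
        // In a common unit the mutation is correctly seen as before the target:
        System.out.println(pointInTimeExceeded(columnTimestampMicros, restoreTargetMillis));
    }
}
```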
[jira] [Commented] (CASSANDRA-5909) CommitLogReplayer date time issue
[ https://issues.apache.org/jira/browse/CASSANDRA-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750875#comment-13750875 ] Vijay commented on CASSANDRA-5909: -- Ahaaa, looks like we need a configuration for millisecond/microsecond precision. Users should not mix those in a cluster anyway if they want reliable deletes and updates, so it should be fine. The other option is to write an additional long field while storing the RM in the commit log.
[jira] [Commented] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748365#comment-13748365 ] Vijay commented on CASSANDRA-5911: -- 1) Even though the logs show "Replaying", we will not actually replay anything, since END_OF_SEGMENT_MARKER is placed at the beginning of the file. We can improve the logging so that, after reading the first 4 bytes, we don't print commit logs we are going to skip. 2) Only the active segment is replayed even if we flush the CL, since we have not recycled it. One way I can think of to avoid replaying the active segment, with a performance hit, is a metadata file holding info on dirty CF writes, if any (similar to CommitLogSegment#cfLastWrite: write on the first write to a segment, remove on flush). Commit logs are not removed after nodetool flush or nodetool drain -- Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Assignee: Vijay Priority: Minor Fix For: 2.0.1 Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min).
Reproduction follows:
{code}
jblangston:bin jblangston$ ./cassandra > /dev/null
jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 > /dev/null
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool flush
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool drain
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ pkill java
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ ./cassandra -f | grep Replaying
INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log
INFO 10:03:43,911 Replaying
{code}
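The logging improvement suggested in point 1 above (read the first 4 bytes and skip the "Replaying" line for segments that start with the end-of-segment marker) can be sketched like this. The marker value and method name are assumptions for illustration, not Cassandra's actual CommitLog internals:

```java
public class SegmentPeek {
    static final int END_OF_SEGMENT_MARKER = 0; // assumed sentinel value

    // Peek at the first 4 bytes of a segment (big-endian int, the way
    // DataInputStream.readInt() would read them) and report whether the
    // segment holds any replayable data.
    static boolean hasReplayableData(byte[] segmentHead) {
        int first = ((segmentHead[0] & 0xff) << 24) | ((segmentHead[1] & 0xff) << 16)
                  | ((segmentHead[2] & 0xff) << 8)  |  (segmentHead[3] & 0xff);
        return first != END_OF_SEGMENT_MARKER;
    }

    public static void main(String[] args) {
        byte[] empty = {0, 0, 0, 0};  // marker first: nothing to replay, skip the log line
        byte[] dirty = {0, 0, 1, 42}; // real data: worth logging "Replaying ..."
        System.out.println(hasReplayableData(empty)); // false
        System.out.println(hasReplayableData(dirty)); // true
    }
}
```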
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746598#comment-13746598 ] Vijay commented on CASSANDRA-5903: -- Thanks Taylan, I will write up a test case for it... The patch on 1.2 (0002) should handle up to 2GB * 8, over which we might want to serialize and deserialize into long for 2.1. Integer overflow in OffHeapBitSet when bloomfilter > 2GB Key: CASSANDRA-5903 URL: https://issues.apache.org/jira/browse/CASSANDRA-5903 Project: Cassandra Issue Type: Bug Components: Core Reporter: Taylan Develioglu Assignee: Vijay Labels: patch Fix For: 1.2.9 Attachments: 0001-CASSANDRA-5903.patch, 0002-CASSANDRA-5903.patch In org.apache.cassandra.utils.obs.OffHeapBitSet, byteCount overflows and causes an IllegalArgument exception in Memory.allocate when the bloomfilter is > 2GB. Suggest changing byteCount to long.
{code:title=OffHeapBitSet.java}
public OffHeapBitSet(long numBits)
{
    // OpenBitSet.bits2words calculation is there for backward compatibility.
    int byteCount = OpenBitSet.bits2words(numBits) * 8;
    bytes = RefCountedMemory.allocate(byteCount);
    // flush/clear the existing memory.
    clear();
}
{code}
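The overflow in the constructor above is the int multiply: bits2words returns an int, and `* 8` wraps once the result passes Integer.MAX_VALUE bytes. A self-contained sketch of the wrap and the suggested widen-to-long fix; the bits2words body below is the standard one-64-bit-word-per-64-bits formula, written here for illustration rather than copied from Cassandra's source:

```java
public class ByteCountOverflow {
    // One 64-bit word per 64 bits, same shape as OpenBitSet.bits2words.
    static int bits2words(long numBits) {
        return (int) (((numBits - 1) >>> 6) + 1);
    }
    public static void main(String[] args) {
        long numBits = 18_000_000_000L;        // ~18 billion bits
        int broken = bits2words(numBits) * 8;  // int arithmetic: wraps negative
        long fixed = bits2words(numBits) * 8L; // widen before multiplying: correct
        System.out.println(broken + " vs " + fixed); // -2044967296 vs 2250000000
    }
}
```

Widening only the multiplication (`* 8L`) is the minimal fix; declaring byteCount itself as long, as the report suggests, carries the correct value through to Memory.allocate.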
[jira] [Updated] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5903: - Attachment: 0001-CASSANDRA-5903-check.patch Not sure if we still need this patch, attaching it just in case :)
[jira] [Updated] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5903: - Attachment: 0001-CASSANDRA-5903-check.patch
[jira] [Updated] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5903: - Attachment: (was: 0001-CASSANDRA-5903-check.patch)
[jira] [Comment Edited] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746836#comment-13746836 ] Vijay edited comment on CASSANDRA-5903 at 8/21/13 8:59 PM: --- Not sure if we still need this patch, attaching it just in case :) Ignored the test since we need 4 GB to test the function. was (Author: vijay2...@yahoo.com): Not sure if we still need this patch, attaching it just in case :)
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747164#comment-13747164 ] Vijay commented on CASSANDRA-5903: -- Done, Thanks!
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745244#comment-13745244 ] Vijay commented on CASSANDRA-5903: -- I can change the byte count to long. As a side note, I am not sure if we are addressing the right issue: from the stack trace the byteCount should be 228805104, which is 228 MB (OpenBitSet.bits2words(1830440832L) * 8L), which should fit in an integer.
[jira] [Comment Edited] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745244#comment-13745244 ] Vijay edited comment on CASSANDRA-5903 at 8/20/13 6:31 PM: --- I can change the byte count to long. As a side note, I am not sure if we are addressing the right issue: from the stack trace the byteCount should be 228805104, which is 228 MB (OpenBitSet.bits2words(1830440832L) * 8L) / ((1830440832L/64) * 8), which should fit in an integer. was (Author: vijay2...@yahoo.com): I can change the byte count to long, As a side note, i am not sure if we are addressing the right issue. From the stack trace the byteCount should be 228805104 which is 228 MB (OpenBitSet.bits2words(1830440832L) * 8L) which should fit in a integer.
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745320#comment-13745320 ] Vijay commented on CASSANDRA-5903: -- Not sure yet, still trying to figure it out (since I am more curious)... A simple test shows it might run out after 17B to 18B keys in a single SSTable (that's a giant SST) :)
{code}
for (int i = 0; i < 30; i++)
{
    long items = i * 1000000000L;
    System.out.println("Items: " + items + " byteCount: " + (OpenBitSet.bits2words(items) * 8));
}
{code}
{noformat}
Items: 0 byteCount: 0
Items: 1000000000 byteCount: 125000000
Items: 2000000000 byteCount: 250000000
Items: 3000000000 byteCount: 375000000
Items: 4000000000 byteCount: 500000000
Items: 5000000000 byteCount: 625000000
Items: 6000000000 byteCount: 750000000
Items: 7000000000 byteCount: 875000000
Items: 8000000000 byteCount: 1000000000
Items: 9000000000 byteCount: 1125000000
Items: 10000000000 byteCount: 1250000000
Items: 11000000000 byteCount: 1375000000
Items: 12000000000 byteCount: 1500000000
Items: 13000000000 byteCount: 1625000000
Items: 14000000000 byteCount: 1750000000
Items: 15000000000 byteCount: 1875000000
Items: 16000000000 byteCount: 2000000000
Items: 17000000000 byteCount: 2125000000
Items: 18000000000 byteCount: -2044967296
Items: 19000000000 byteCount: -1919967296
Items: 20000000000 byteCount: -1794967296
...
{noformat}
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745328#comment-13745328 ] Vijay commented on CASSANDRA-5903: -- Actually my calculations were wrong; it does use > 2 GB for 1830440832:
{code}
long numElements = 1830440832L;
FilterFactory.getFilter(numElements, 0.01d, true);
{code}
Fixing it.
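A back-of-the-envelope check of that correction: at a 0.01 false-positive rate a bloom filter needs roughly -ln(p)/ln(2)^2 ≈ 9.6 bits per element, so 1,830,440,832 elements do land past the 2 GB int limit. This uses the textbook sizing formula, not Cassandra's actual FilterFactory math:

```java
public class BloomSizeEstimate {
    // Textbook bloom filter sizing: m = -n * ln(p) / ln(2)^2 bits for n elements
    // at false-positive probability p; divide by 8 for bytes.
    static long estimateBytes(long elements, double p) {
        double bitsPerElement = -Math.log(p) / (Math.log(2) * Math.log(2)); // ~9.59 for p = 0.01
        return (long) (elements * bitsPerElement / 8);
    }
    public static void main(String[] args) {
        long bytes = estimateBytes(1_830_440_832L, 0.01);
        System.out.println(bytes + " bytes, i.e. more than 2 GiB"); // ~2.19e9 bytes
    }
}
```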
[jira] [Updated] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5903: - Attachment: 0001-CASSANDRA-5903.patch Simple fix for 1.2; it also catches native OOM (I am neutral, I can also remove that so we fail fast) and throws an RTE to pause the compaction etc.
[jira] [Resolved] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay resolved CASSANDRA-5903. -- Resolution: Fixed Reviewer: jbellis Committed to 1.2 and merged into 2.0.0 -> 2.0 -> trunk. Thanks!
[jira] [Commented] (CASSANDRA-5903) Integer overflow in OffHeapBitSet when bloomfilter > 2GB
[ https://issues.apache.org/jira/browse/CASSANDRA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745622#comment-13745622 ] Vijay commented on CASSANDRA-5903: -- Done! Thanks.
[jira] [Commented] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733151#comment-13733151 ] Vijay commented on CASSANDRA-5357: -- Hi Jonathan, The idea in the current implementation is as follows: the QueryCache<QueryFilter, CF> is implemented on top of SerializedCache. It stores the map's key as a RowCacheKey<RowKey, CFID> (same as the earlier RowCache), and the map's value is a composite QueryCacheValue holding the cached queries (IFilters) and the merged ColumnFamily. For every new query that enters the system, we get the QueryCacheValue after generating the RowCacheKey from the QueryFilter, to check if the IFilter exists. If it does, we return the CF; else we get the QueryCacheValue (if a QCV exists; else create a new one), add the IFilter to the QCV and merge the results with the existing ColumnFamily (also in the QCV), which will in turn be serialized.
Advantages:
1) Queries can overlap; there can be any number of queries, but the data will not be repeated within them.
2) When we want to invalidate, we just invalidate the RowKey and all the cached QueryCacheValues go away (avoids another map for bookkeeping and is hence a little more memory-efficient).
3) There is a property the user can enable to cache the whole row no matter what the query is (though currently the patch adds the overhead of deserializing an identity filter, which can be fixed).
Of course there are disadvantages:
1) The LRU algorithm is no longer really accurate. When a single query is hot we have no way of invalidating the other queries on the same row, since they all share the same hit rates (which is no worse than what we have currently).
2) With multiple types of queries on the same row (which is kind of an edge case) we might pull the whole row's data into memory (which can be mitigated by incrementally loading it, or by holding an index in the filter; this doesn't exist in the current patch).
There could be more which I overlooked...
Query cache --- Key: CASSANDRA-5357 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Vijay I think that most people expect the row cache to act like a query cache, because that's a reasonable model. Caching the entire partition is, in retrospect, not really reasonable, so it's not surprising that it catches people off guard, especially given the confusion we've inflicted on ourselves as to what a row constitutes. I propose replacing it with a true query cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
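The design described in the comment above can be sketched as a toy in-memory version. All names here are invented for illustration (the real patch works on serialized cache entries keyed by RowCacheKey): one cache entry per row key, holding the set of filters already answered plus the merged column data, so overlapping queries never store the same data twice and invalidating the row key drops everything at once.

```java
import java.util.*;

public class QueryCacheSketch {
    static class CacheValue {
        final Set<String> filters = new HashSet<>();              // queries seen for this row
        final SortedMap<String, String> merged = new TreeMap<>(); // merged column data
    }

    final Map<String, CacheValue> cache = new HashMap<>();        // key stands in for RowCacheKey

    // Returns cached columns if this exact filter was seen before; otherwise
    // records the filter and merges the freshly read columns into the entry.
    Map<String, String> readThrough(String rowKey, String filter,
                                    Map<String, String> freshRead) {
        CacheValue v = cache.computeIfAbsent(rowKey, k -> new CacheValue());
        if (v.filters.contains(filter))
            return v.merged;          // cache hit: serve from the shared merged copy
        v.filters.add(filter);
        v.merged.putAll(freshRead);   // overlapping queries share one copy of the data
        return v.merged;
    }

    // Invalidating the row key drops every cached query for that row at once,
    // so no extra bookkeeping map is needed.
    void invalidate(String rowKey) { cache.remove(rowKey); }
}
```

This also makes the stated disadvantage visible: hits are tracked per row, so one hot filter keeps every other filter's data for that row alive in the cache.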
[jira] [Commented] (CASSANDRA-5357) Query cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729343#comment-13729343 ] Vijay commented on CASSANDRA-5357: -- Hi Jonathan, I pushed a basic version of the query cache to https://github.com/Vijay2win/cassandra/commits/query_cache . I am not sure if we still need RowCacheSentinel, but the attached removes it. The attached patch also has an option query_cache: true (if set to false, the whole row will always be cached). It will be nice to have a fully off-heap Map/Cache (including the keys), but I am thinking of addressing that with a separate github project/patch (though IMHO, CHM may have contention in the segments for big caches). Let me know what you think about the patch; it might need some more cleanup. Query cache --- Key: CASSANDRA-5357 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Vijay I think that most people expect the row cache to act like a query cache, because that's a reasonable model. Caching the entire partition is, in retrospect, not really reasonable, so it's not surprising that it catches people off guard, especially given the confusion we've inflicted on ourselves as to what a row constitutes. I propose replacing it with a true query cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729782#comment-13729782 ] Vijay commented on CASSANDRA-5826: -- Hi Brandon, Oops... isn't the directory found in conf? I can remove the RTE and make it log if not found. Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5826: - Attachment: 0001-handle-trigger-non-existance.patch Hi Brandon, Attached; it handles an unreachable trigger directory. Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch, 0001-handle-trigger-non-existance.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729907#comment-13729907 ] Vijay commented on CASSANDRA-5826: -- Hi Brandon, did the patch apply cleanly?
{code}
File triggerDirectory = FBUtilities.cassandraTriggerDir();
if (triggerDirectory == null)
    return;
{code}
This should avoid an NPE; I did test it and it worked fine for me. Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch, 0001-handle-trigger-non-existance.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5826: - Attachment: 0001-handle-trigger-non-existance-v2.patch Hi Brandon, fixed in v2 Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch, 0001-handle-trigger-non-existance.patch, 0001-handle-trigger-non-existance-v2.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay resolved CASSANDRA-5826. -- Resolution: Fixed Committed, with nit Thanks! Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch, 0001-handle-trigger-non-existance.patch, 0001-handle-trigger-non-existance-v2.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730309#comment-13730309 ] Vijay commented on CASSANDRA-5826: -- Done, sorry for all the mess on a simple patch. Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch, 0001-handle-trigger-non-existance.patch, 0001-handle-trigger-non-existance-v2.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727960#comment-13727960 ] Vijay commented on CASSANDRA-5826: -- {quote} As long as we are not trying to isolate classloaders or anything {quote} Actually we do it with triggers, similar to what Solr does for Tokenizer code etc. (but not the same). For the record: you can place all of your dependencies in the trigger directory except anything that Cassandra itself depends on. If the user uses maven for building, all he needs to do is declare the dependency below and place the jars in the trigger directory.
{code}
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>2.0.0-beta2</version>
    <scope>provided</scope>
</dependency>
{code}
My understanding is that Java doesn't do nested classpath scanning on subdirectories, hence the conf directory was OK to use. But I understand it is kind of scary if someone places jars in conf instead. Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5826: - Attachment: 0001-5826.patch Attached a small patch that moves the trigger directory into the conf directory; hope that is fine. That way we can just search for the triggers directory in the classpath (which is conf). Thanks! Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers Fix For: 2.0 rc1 Attachments: 0001-5826.patch At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5826) Fix trigger directory detection code
[ https://issues.apache.org/jira/browse/CASSANDRA-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724240#comment-13724240 ] Vijay commented on CASSANDRA-5826: -- We probably have to change build.xml to copy the trigger directory to build, like what we do with the conf directory? I will add the above and maybe also add it to the Debian package (in addition to adding a property to override the trigger absolute path). Fix trigger directory detection code Key: CASSANDRA-5826 URL: https://issues.apache.org/jira/browse/CASSANDRA-5826 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 2.0 beta 2 Environment: OS X Reporter: Aleksey Yeschenko Assignee: Vijay Labels: triggers At least when building from source, Cassandra determines the trigger directory wrong. C* calculates the trigger directory as 'build/triggers' instead of 'triggers'. FBUtilities.cassandraHomeDir() is to blame, and should be replaced with something more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5175) Unbounded (?) thread growth connecting to a removed node
[ https://issues.apache.org/jira/browse/CASSANDRA-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721497#comment-13721497 ] Vijay commented on CASSANDRA-5175: -- Yes, there was another commit on top of the attached patch to fix the test cases; yes, the logic has changed, since calling close() is the only time we need to stop the thread. Current code in the repo:
{code}
if (m == CLOSE_SENTINEL)
{
    disconnect();
    if (isStopped)
        break;
    continue;
}
{code}
Unbounded (?) thread growth connecting to a removed node - Key: CASSANDRA-5175 URL: https://issues.apache.org/jira/browse/CASSANDRA-5175 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.8 Environment: EC2, JDK 7u9, Ubuntu 12.04.1 LTS Reporter: Janne Jalkanen Assignee: Vijay Priority: Minor Fix For: 1.1.10, 1.2.1 Attachments: 0001-CASSANDRA-5175.patch The following lines started repeating every minute in the log file
{noformat}
INFO [GossipStage:1] 2013-01-19 19:35:43,929 Gossiper.java (line 831) InetAddress /10.238.x.y is now dead.
INFO [GossipStage:1] 2013-01-19 19:35:43,930 StorageService.java (line 1291) Removing token 170141183460469231731687303715884105718 for /10.238.x.y
{noformat}
Also, I got about 3000 threads which all look like this:
{noformat}
Name: WRITE-/10.238.x.y
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1bb65c0f
Total blocked: 0 Total waited: 3
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:104)
{noformat}
A new thread seems to be created every minute, and they never go away.
The endpoint in question had been a part of the cluster weeks ago, and the node exhibiting the thread growth was added yesterday. Anyway, assassinating the endpoint in question stopped thread growth (but kept the existing threads running), so this isn't a huge issue. But I don't think the thread count is supposed to be increasing like this... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
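The fix discussed above hinges on a sentinel pattern: the per-endpoint writer thread blocks on its message queue (the `LinkedBlockingQueue.take` visible in the stack trace) and only exits when it dequeues a close sentinel with the stop flag set, so closed connections no longer leave parked threads behind. A minimal standalone sketch of that pattern follows; class and field names are invented and this is not the actual OutboundTcpConnection code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SentinelWriter implements Runnable {
    static final Object CLOSE_SENTINEL = new Object();

    final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    volatile boolean stopped = false;

    public void close() {
        stopped = true;
        queue.offer(CLOSE_SENTINEL); // wake up the blocked take()
    }

    @Override
    public void run() {
        try {
            while (true) {
                Object m = queue.take();
                if (m == CLOSE_SENTINEL) {
                    // disconnect() would go here
                    if (stopped)
                        break;       // terminate instead of parking forever
                    continue;
                }
                // ... write message m to the socket ...
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Without the sentinel the thread stays parked in `take()` indefinitely; enqueueing a distinguished object is the standard way to unblock and retire a consumer thread without interrupting it mid-write.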
[jira] [Commented] (CASSANDRA-4573) HSHA doesn't handle large messages gracefully
[ https://issues.apache.org/jira/browse/CASSANDRA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714546#comment-13714546 ] Vijay commented on CASSANDRA-4573: -- Peter, it looks like your issue is a client timeout: you didn't receive a response within 10 sec. Time to tune the heap or add more nodes. Tyler, is this ticket still valid? HSHA doesn't handle large messages gracefully - Key: CASSANDRA-4573 URL: https://issues.apache.org/jira/browse/CASSANDRA-4573 Project: Cassandra Issue Type: Bug Components: Core Reporter: Tyler Hobbs Assignee: Vijay Attachments: repro.py HSHA doesn't seem to enforce any kind of max message length, and when messages are too large, it doesn't fail gracefully. With debug logs enabled, you'll see this: {{DEBUG 13:13:31,805 Unexpected state 16}} Which seems to mean that there's a SelectionKey that's valid, but isn't ready for reading, writing, or accepting. Client-side, you'll get this thrift error (while trying to read a frame as part of {{recv_batch_mutate}}): {{TTransportException: TSocket read 0 bytes}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5574) Add trigger examples
[ https://issues.apache.org/jira/browse/CASSANDRA-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5574: - Affects Version/s: 2.0 beta 1 Fix Version/s: 2.0 Add trigger examples - Key: CASSANDRA-5574 URL: https://issues.apache.org/jira/browse/CASSANDRA-5574 Project: Cassandra Issue Type: Test Affects Versions: 2.0 beta 1 Reporter: Vijay Assignee: Vijay Priority: Trivial Fix For: 2.0 Attachments: 0001-CASSANDRA-5574.patch Since 1311 is committed we need some example code to show the power and usage of triggers. Similar to the ones in examples directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5171) Save EC2Snitch topology information in system table
[ https://issues.apache.org/jira/browse/CASSANDRA-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704261#comment-13704261 ] Vijay commented on CASSANDRA-5171: -- PS: I only committed to 2.0 to be safe; let me know if you think otherwise. Thanks! Save EC2Snitch topology information in system table --- Key: CASSANDRA-5171 URL: https://issues.apache.org/jira/browse/CASSANDRA-5171 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.1 Environment: EC2 Reporter: Vijay Assignee: Vijay Priority: Critical Fix For: 2.0 Attachments: 0001-CASSANDRA-5171.patch, 0001-CASSANDRA-5171-v2.patch EC2Snitch currently waits for the Gossip information to understand the cluster information every time we restart. It will be nice to use already available system table info similar to GPFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5171) Save EC2Snitch topology information in system table
[ https://issues.apache.org/jira/browse/CASSANDRA-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-5171: - Fix Version/s: (was: 1.2.7) 2.0 Save EC2Snitch topology information in system table --- Key: CASSANDRA-5171 URL: https://issues.apache.org/jira/browse/CASSANDRA-5171 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.1 Environment: EC2 Reporter: Vijay Assignee: Vijay Priority: Critical Fix For: 2.0 Attachments: 0001-CASSANDRA-5171.patch, 0001-CASSANDRA-5171-v2.patch EC2Snitch currently waits for the Gossip information to understand the cluster information every time we restart. It will be nice to use already available system table info similar to GPFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5171) Save EC2Snitch topology information in system table
[ https://issues.apache.org/jira/browse/CASSANDRA-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704784#comment-13704784 ] Vijay commented on CASSANDRA-5171: -- Thanks Jason! Save EC2Snitch topology information in system table --- Key: CASSANDRA-5171 URL: https://issues.apache.org/jira/browse/CASSANDRA-5171 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.1 Environment: EC2 Reporter: Vijay Assignee: Vijay Priority: Critical Fix For: 2.0 Attachments: 0001-CASSANDRA-5171.patch, 0001-CASSANDRA-5171-v2.patch EC2Snitch currently waits for the Gossip information to understand the cluster information every time we restart. It will be nice to use already available system table info similar to GPFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira