[jira] [Commented] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

2019-09-25 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937547#comment-16937547
 ] 

Benedict commented on CASSANDRA-15172:
--

Hi [~Sagges], I'm on holiday (again!) but please file a new bug report for this 
and assign it to me so that I remember when I return, as it is presumably a 
different (but very similar) bug.

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> 
>
> Key: CASSANDRA-15172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Shalom
>Assignee: Benedict
>Priority: Normal
> Fix For: 3.0.19, 3.11.5
>
>
> Hi All,
> This is the first time I've opened an issue, so apologies if I'm not following the 
> rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a 
> lot of AbstractLocalAwareExecutorService exceptions. This happened right 
> after the node successfully started up with the new 3.11.4 binaries. 
> {noformat}
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; 
> proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for 
> CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting 
> RPC server as requested. Use JMX (StorageService->startRPCServer()) or 
> nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 
> AuthCache.java:161 - (Re)initializing PermissionsCache (validity 
> period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 
> - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 
> OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at 
> org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at 
> org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at 
> org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at 
> org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at 
> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>     at 
> org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>     at 
> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
>     at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
>     at 
> 

[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2019-09-06 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924124#comment-16924124
 ] 

Benedict commented on CASSANDRA-14825:
--

There is a legitimate case to be made to support both approaches, in my opinion.

{{DESCRIBE}}'s main advantage, to my mind, is that it could support versioning 
of CQL grammar, so that you can ask for a string compatible with a given CQL 
version. This is difficult to do cleanly with a virtual table.

The presented factors in its favour don't seem relevant to me, and please try 
to avoid characterising others' disagreement with your points as 
misunderstanding or ignoring them:
 # {{cqlsh}} having it already does not seem particularly important, and should 
not bind our future decisions. It is a single tool, even if it is bundled.
 # Virtual tables are also very capable of surfacing the necessary information 
to produce dependent types, for instance as a collection column of the names of 
those types.

Further to this second point, virtual tables permit surfacing a whole lot more 
information in a structured _searchable_ manner.  Want to find which tables use 
a given type?  Search on that collection column.
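
As a purely illustrative example (the virtual table and column names here are 
hypothetical; no such table exists yet), that search might look like 
{{SELECT table_name FROM system_views.tables WHERE types CONTAINS 'my_type'}}.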

From a pure UX, personal preference perspective, I can say that I hate 
features like {{DESCRIBE}} because I have to go and google the manual.  With a 
virtual table interface, I just {{SELECT}} and progressively narrow my 
criteria with {{WHERE}} clauses. I preferred this approach in SQL Server to 
today's {{DESCRIBE}} in {{cqlsh}}.

So, I can see a case for both.  But if I had to choose, I would prefer virtual 
tables, and I haven't seen an argument that unequivocally seals the deal one 
way or the other.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that it's only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14825) Expose table schema for drivers

2019-09-05 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923747#comment-16923747
 ] 

Benedict edited comment on CASSANDRA-14825 at 9/5/19 9:34 PM:
--

Perhaps we should break this deadlock with a non-binding vote?  Possibly on the 
mailing list?  Since there's no obvious consensus, except that we should pick 
one of the two, it looks set to go round in circles if we don't just pick one 
and go with it.


was (Author: benedict):
Perhaps we should break this deadlock with a non-binding vote?  Possibly on the 
mailing list?  Since nobody feels strongly, it looks set to go round in circles 
if we don't just pick an approach and go with it.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that it's only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2019-09-05 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923747#comment-16923747
 ] 

Benedict commented on CASSANDRA-14825:
--

Perhaps we should break this deadlock with a non-binding vote?  Possibly on the 
mailing list?  Since nobody feels strongly, it looks set to go round in circles 
if we don't just pick an approach and go with it.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that it's only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15216) Cross node message creation times are disabled by default

2019-09-02 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920755#comment-16920755
 ] 

Benedict commented on CASSANDRA-15216:
--

Thanks for the offer.  This is a two-word change, though (from {{false}} to  
{{true}}, in {{cassandra.yaml}} and {{Config}}), and probably too trivial to 
even bother with submission and review; it can probably be ninja'd in.  The 
only requirement is a brief discussion to confirm nobody disagrees this should 
happen.
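
For reference, the whole change amounts to something like the following 
(assuming the property involved is {{cross_node_timeout}}; the exact name may 
differ):

{noformat}
# cassandra.yaml -- enable use of cross-node message creation times
cross_node_timeout: true    # previously: false

// Config.java -- matching field default (field name assumed)
public boolean cross_node_timeout = true;   // previously: false
{noformat}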

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15228) CommitLogReplayer should replay past final sync marker

2019-09-02 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920747#comment-16920747
 ] 

Benedict edited comment on CASSANDRA-15228 at 9/2/19 10:11 AM:
---

\[This comment refers to the prior goal of avoiding writing the sync markers\]

It had been a long time since I looked at the commit log code, and taking a 
look now, this has obviously become tightly interwoven with other extended 
behaviours (encryption and compression) that I would prefer not to unpick 
anytime soon.  This isn't the simplifying change I had hoped it would be.  I 
think it would be preferable to defer any refactoring until we can address 
CASSANDRA-9834.

We can instead, at any time during 4.0 (or later if we choose) modify the 
replayer to simply ignore the lack of a final sync marker, and to attempt to 
replay the contents in-between, to restore earlier persistence behaviour 
without any user-visible file format changes.
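
A minimal sketch of the replay side of that idea (all names here are invented; 
the real {{CommitLogReplayer}} is considerably more involved):

{noformat}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.function.Consumer;
import java.util.zip.CRC32;

// Sketch: after the last valid sync marker, keep reading
// [length][payload][crc] records, replaying each mutation that passes its
// CRC check, and stop at the first torn or truncated record.
class ReplayTailSketch
{
    static void replayTail(DataInputStream in, Consumer<byte[]> replay) throws IOException
    {
        while (true)
        {
            int size;
            try { size = in.readInt(); }
            catch (EOFException e) { return; }  // clean end of segment
            if (size <= 0)
                return;                         // zeroed region: nothing further written
            byte[] payload = new byte[size];
            int storedCrc;
            try
            {
                in.readFully(payload);
                storedCrc = in.readInt();
            }
            catch (EOFException e) { return; }  // truncated by the crash
            CRC32 crc = new CRC32();
            crc.update(payload);
            if (storedCrc != (int) crc.getValue())
                return;                         // torn write: stop replaying
            replay.accept(payload);             // intact mutation: replay it
        }
    }
}
{noformat}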


was (Author: benedict):
\[This comment refers to the prior goal of avoiding writing the sync markers\]

It had been a long time since I looked at the commit log code, and taking a 
look now, this has obviously become tightly interwoven with other extended 
behaviours (encryption and compression) that I would prefer not to unpick 
anytime soon.  This isn't the simplifying change I had hoped it would be.  I 
think it would be preferable to defer any refactoring until we can address 
CASSANDRA-9834.

We can instead, at any time during 4.0 (or later if we choose) modify the 
replayer to simply ignore the lack of a final sync marker, and to attempt to 
replay the contents in-between, to restore earlier persistence behaviour 
without any user-visible file format changes.

> CommitLogReplayer should replay past final sync marker
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0.x
>
>
> Under default commit log configuration, the sync markers have no purpose, 
> only serving to reduce persistence by preventing replay of any mutations 
> serialised between syncs.  Refactoring the commit log to prevent this would 
> be painful, given their utility for encrypted and compressed segments, so we 
> should instead ignore the lack of a final sync marker when replaying a raw 
> commit log segment, and attempt to replay any mutations we encounter after 
> the last sync marker.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15228) CommitLogReplayer should replay past final sync marker

2019-09-02 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15228:
-
Description: Under default commit log configuration, the sync markers have 
no purpose, only serving to reduce persistence by preventing replay of any 
mutations serialised between syncs.  Refactoring the commit log to prevent this 
would be painful, given their utility for encrypted and compressed segments, so 
we should instead ignore the lack of a final sync marker when replaying a raw 
commit log segment, and attempt to replay any mutations we encounter after the 
last sync marker.  (was: The sync markers existed to permit file re-use.  Since 
we no longer re-use files, they no longer provide any value.  However, they 
_can_ corrupt the commit log for replay in the event of a process crash.  
Before we release 4.0, we should ideally remove the sync markers entirely.
)

> CommitLogReplayer should replay past final sync marker
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0.x
>
>
> Under default commit log configuration, the sync markers have no purpose, 
> only serving to reduce persistence by preventing replay of any mutations 
> serialised between syncs.  Refactoring the commit log to prevent this would 
> be painful, given their utility for encrypted and compressed segments, so we 
> should instead ignore the lack of a final sync marker when replaying a raw 
> commit log segment, and attempt to replay any mutations we encounter after 
> the last sync marker.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15228) CommitLogReplayer should replay past final sync marker

2019-09-02 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920747#comment-16920747
 ] 

Benedict edited comment on CASSANDRA-15228 at 9/2/19 10:10 AM:
---

\[This comment refers to the prior goal of avoiding writing the sync markers\]

It had been a long time since I looked at the commit log code, and taking a 
look now, this has obviously become tightly interwoven with other extended 
behaviours (encryption and compression) that I would prefer not to unpick 
anytime soon.  This isn't the simplifying change I had hoped it would be.  I 
think it would be preferable to defer any refactoring until we can address 
CASSANDRA-9834.

We can instead, at any time during 4.0 (or later if we choose) modify the 
replayer to simply ignore the lack of a final sync marker, and to attempt to 
replay the contents in-between, to restore earlier persistence behaviour 
without any user-visible file format changes.


was (Author: benedict):
It had been a long time since I looked at the commit log code, and taking a 
look now, this has obviously become tightly interwoven with other extended 
behaviours (encryption and compression) that I would prefer not to unpick 
anytime soon.  This isn't the simplifying change I had hoped it would be.  I 
think it would be preferable to defer any refactoring until we can address 
CASSANDRA-9834.

We can instead, at any time during 4.0 (or later if we choose) modify the 
replayer to simply ignore the lack of a final sync marker, and to attempt to 
replay the contents in-between, to restore earlier persistence behaviour 
without any user-visible file format changes.

> CommitLogReplayer should replay past final sync marker
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0.x
>
>
> Under default commit log configuration, the sync markers have no purpose, 
> only serving to reduce persistence by preventing replay of any mutations 
> serialised between syncs.  Refactoring the commit log to prevent this would 
> be painful, given their utility for encrypted and compressed segments, so we 
> should instead ignore the lack of a final sync marker when replaying a raw 
> commit log segment, and attempt to replay any mutations we encounter after 
> the last sync marker.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15228) CommitLogReplayer should replay past final sync marker

2019-09-02 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15228:
-
Fix Version/s: (was: 4.0-alpha)
   (was: 4.0)
   4.0.x

> CommitLogReplayer should replay past final sync marker
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0.x
>
>
> The sync markers existed to permit file re-use.  Since we no longer re-use 
> files, they no longer provide any value.  However, they _can_ corrupt the 
> commit log for replay in the event of a process crash.  Before we release 
> 4.0, we should ideally remove the sync markers entirely.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15228) CommitLogReplayer should replay past final sync marker

2019-09-02 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15228:
-
Summary: CommitLogReplayer should replay past final sync marker  (was: 
Commit Log should not use sync markers)

> CommitLogReplayer should replay past final sync marker
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> The sync markers existed to permit file re-use.  Since we no longer re-use 
> files, they no longer provide any value.  However, they _can_ corrupt the 
> commit log for replay in the event of a process crash.  Before we release 
> 4.0, we should ideally remove the sync markers entirely.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15228) Commit Log should not use sync markers

2019-09-02 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920747#comment-16920747
 ] 

Benedict commented on CASSANDRA-15228:
--

It had been a long time since I looked at the commit log code, and taking a 
look now, this has obviously become tightly interwoven with other extended 
behaviours (encryption and compression) that I would prefer not to unpick 
anytime soon.  This isn't the simplifying change I had hoped it would be.  I 
think it would be preferable to defer any refactoring until we can address 
CASSANDRA-9834.

We can instead, at any time during 4.0 (or later if we choose) modify the 
replayer to simply ignore the lack of a final sync marker, and to attempt to 
replay the contents in-between, to restore earlier persistence behaviour 
without any user-visible file format changes.

> Commit Log should not use sync markers
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> The sync markers existed to permit file re-use.  Since we no longer re-use 
> files, they no longer provide any value.  However, they _can_ corrupt the 
> commit log for replay in the event of a process crash.  Before we release 
> 4.0, we should ideally remove the sync markers entirely.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14772) Fix issues in audit / full query log interactions

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14772:
-
Fix Version/s: 4.0-rc

> Fix issues in audit / full query log interactions
> -
>
> Key: CASSANDRA-14772
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14772
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL, Legacy/Tools
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
>
> There are some problems with the audit + full query log code that need to be 
> resolved before 4.0 is released:
> * Fix the performance regression in FQL that makes it less usable than it 
> should be.
> * Move full query log specific code to a separate package.
> * Do some audit log class renames (I keep misreading {{BinLogAuditLogger}} vs 
> {{BinAuditLogger}}, for example).
> * Avoid parsing the CQL queries twice in {{QueryMessage}} when audit log is 
> enabled.
> * Add a new tool to dump audit logs (i.e. let fqltool be full query log 
> specific); fqltool currently crashes when pointed at them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-13994:
-
Fix Version/s: (was: 4.x)
   4.0-rc

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Low
> Fix For: 4.0-rc
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now, ask users to migrate off those, and completely remove 
> it in a future release. It's just a couple of classes though.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14801:
-
Fix Version/s: 4.0

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> Correctness depended upon narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15214:
-
Fix Version/s: 4.0

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
> Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest, most consistent way to do this would be to have a 
> single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future-proof approach, it may be worth paying the cost of a single thread.
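
A minimal sketch of that single-thread idea (illustrative only; every name 
here is invented):

{noformat}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One thread, started at boot, whose sole job is to rethrow reported errors
// with no catch block above it, so they reach the uncaught-exception handler
// and can trigger a crash/heap dump.
class FatalErrorPropagatorSketch
{
    private static final BlockingQueue<Throwable> FATAL = new LinkedBlockingQueue<>();

    static void start()
    {
        Thread t = new Thread(() -> {
            try
            {
                Throwable error = FATAL.take();  // block until something is reported
                if (error instanceof Error)
                    throw (Error) error;         // escapes uncaught
                throw new RuntimeException(error);
            }
            catch (InterruptedException ignore) {}
        }, "fatal-error-propagator");
        t.start();
    }

    // Called from executor/Netty exception handlers that would otherwise
    // swallow the error.
    static void report(Throwable t)
    {
        if (t instanceof OutOfMemoryError)
            FATAL.offer(t);
    }
}
{noformat}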



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14888:
-
Fix Version/s: 4.0

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Ariel Weisberg
>Assignee: Alex Deparvu
>Priority: Urgent
>  Labels: patch-available
> Fix For: 4.0, 4.0-rc
>
> Attachments: CASSANDRA-14888.patch
>
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, 
> ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, 
> PartitionsValidated, RepairPrepareTime, RepairSyncTime, 
> RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, 
> WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the 
> entire thing is kind of obscure. Fix it and also add a dtest that detects if 
> any mbeans are left behind after dropping a table and keyspace.
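
A sketch of the detection side of such a dtest (illustrative only; a real test 
would drive an actual node and CQL session):

{noformat}
import java.lang.management.ManagementFactory;
import java.util.HashSet;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MBeanLeakCheckSketch
{
    public static void main(String[] args) throws Exception
    {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern = new ObjectName("org.apache.cassandra.metrics:*");

        // Snapshot the registered mbeans before creating the keyspace/table.
        Set<ObjectName> before = server.queryNames(pattern, null);

        // ... create keyspace and table, exercise them, then drop both ...

        // Anything still registered afterwards is a leak.
        Set<ObjectName> leaked = new HashSet<>(server.queryNames(pattern, null));
        leaked.removeAll(before);
        if (!leaked.isEmpty())
            throw new AssertionError("leaked mbeans: " + leaked);
    }
}
{noformat}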



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15234) Standardise config and JVM parameters

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15234:
-
Fix Version/s: 4.0

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yaml and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps, to avoid ambiguity, we could refuse to accept bauds ({{bs, Mbps}}) or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition (e.g. to mean {{KiB/s}}), or we could support them 
> and simply log the value in bytes/s.
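
A sketch of what parsing those suffixes might look like (the class, method and 
exact regex are illustrative, not a committed design; months are omitted since 
they need calendar context):

{noformat}
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DurationSuffixSketch
{
    // u|micros(econds?)?, s(econds?)?, m(inutes?)?, h(ours?)?, d(ays?)?
    private static final Pattern DURATION = Pattern.compile(
        "(\\d+)\\s*(micros(?:econds?)?|u|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?|s(?:econds?)?)");

    static long toMicros(String value)
    {
        Matcher m = DURATION.matcher(value.trim().toLowerCase());
        if (!m.matches())
            throw new IllegalArgumentException("unparseable duration: " + value);
        long quantity = Long.parseLong(m.group(1));
        String unit = m.group(2);
        if (unit.startsWith("u") || unit.startsWith("micro")) return quantity;
        if (unit.startsWith("s")) return TimeUnit.SECONDS.toMicros(quantity);
        if (unit.startsWith("m")) return TimeUnit.MINUTES.toMicros(quantity);
        if (unit.startsWith("h")) return TimeUnit.HOURS.toMicros(quantity);
        return TimeUnit.DAYS.toMicros(quantity);
    }
}
{noformat}

With this, {{toMicros("90s")}} and {{toMicros("90 seconds")}} both yield 
90,000,000.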



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-10190) Python 3 support for cqlsh

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-10190:
-
Fix Version/s: 4.0

> Python 3 support for cqlsh
> --
>
> Key: CASSANDRA-10190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Tools
>Reporter: Andrew Pennebaker
>Assignee: Patrick Bannister
>Priority: Normal
>  Labels: cqlsh
> Fix For: 4.0, 4.0-alpha
>
> Attachments: coverage_notes.txt
>
>
> Users who operate in a Python 3 environment may have trouble launching cqlsh. 
> Could we please update cqlsh's syntax to run in Python 3?
> As a workaround, users can set up pyenv, and cd to a directory with a 
> .python-version containing "2.7". But it would be nice if cqlsh supported 
> modern Python versions out of the box.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-13994:
-
Fix Version/s: 4.0

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Low
> Fix For: 4.0, 4.0-rc
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now, ask users to migrate off those, and completely remove 
> it in a future release. It's just a couple of classes though.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15216:
-
Fix Version/s: 4.0

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14973) Bring v5 driver out of beta, introduce v6 before 4.0 release is cut

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14973:
-
Fix Version/s: 4.0

> Bring v5 driver out of beta, introduce v6 before 4.0 release is cut
> ---
>
> Key: CASSANDRA-14973
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14973
> Project: Cassandra
>  Issue Type: Task
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Urgent
> Fix For: 4.0, 4.0-rc
>
>
> In http://issues.apache.org/jira/browse/CASSANDRA-12142, we’ve introduced 
> a beta flag for the v5 protocol. However, up till now, v5 is in beta both in 
> [Cassandra|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/ProtocolVersion.java#L46]
>  and in 
> [java-driver|https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/ProtocolVersion.java#L35].
>  
> Before the final 4.0 release is cut, we need to bring v5 out of beta and 
> finalise native protocol spec, and start bringing all new changes to v6 
> protocol, which will be in beta.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15229) BufferPool Regression

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15229:
-
Fix Version/s: 4.0

> BufferPool Regression
> -
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
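
A sketch of the re-circulation idea (all names invented; not the real 
{{BufferPool}}):

{noformat}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: when no fully-free chunk remains, fall back to a partially-freed
// chunk rather than allocating outside the pool.
class RecirculatingPoolSketch
{
    static final class Chunk { int freeBytes; }

    private final Deque<Chunk> fullyFree = new ArrayDeque<>();
    private final Deque<Chunk> partiallyFreed = new ArrayDeque<>();
    private final int chunkSize = 128 * 1024;

    Chunk takeChunk()
    {
        Chunk c = fullyFree.poll();
        if (c == null)
            c = partiallyFreed.poll();  // re-circulate reclaimed space
        if (c == null)
            c = new Chunk();            // pool exhausted: allocate fresh
        return c;
    }

    void onFree(Chunk c)
    {
        // As regions are freed, route the chunk to the appropriate list.
        if (c.freeBytes == chunkSize) fullyFree.add(c);
        else partiallyFreed.add(c);
    }
}
{noformat}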



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14973) Bring v5 driver out of beta, introduce v6 before 4.0 release is cut

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14973:
-
Fix Version/s: (was: 4.0)
   4.0-rc

> Bring v5 driver out of beta, introduce v6 before 4.0 release is cut
> ---
>
> Key: CASSANDRA-14973
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14973
> Project: Cassandra
>  Issue Type: Task
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Urgent
> Fix For: 4.0-rc
>
>
> In http://issues.apache.org/jira/browse/CASSANDRA-12142, we’ve introduced 
> a beta flag for the v5 protocol. However, up till now, v5 is in beta both in 
> [Cassandra|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/ProtocolVersion.java#L46]
>  and in 
> [java-driver|https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/ProtocolVersion.java#L35].
>  
> Before the final 4.0 release is cut, we need to bring v5 out of beta and 
> finalise native protocol spec, and start bringing all new changes to v6 
> protocol, which will be in beta.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14775) StreamingTombstoneHistogramBuilder overflows if > 2B in a single bucket/sstable

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14775:
-
Fix Version/s: 4.0

> StreamingTombstoneHistogramBuilder overflows if > 2B in a single 
> bucket/sstable
> ---
>
> Key: CASSANDRA-14775
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14775
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Benedict
>Assignee: Alex Deparvu
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> This may be unlikely, but is certainly not impossible.  In this event, the 
> count for the bucket will be reset to zero, and the time distorted to 1s in 
> the future.  If MAX_DELETION_TIME were encountered through overflow, this 
> might result in a bucket with NO_DELETION_TIME.
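
A small, self-contained demonstration of the arithmetic described above (the 
packed time/count layout is assumed for illustration; the builder's real 
representation may differ):

{noformat}
// If a bucket is packed as (time << 32) | count, overflowing the 32-bit
// count carries into the time half: the count wraps to zero and the
// timestamp moves 1 second into the future.
class BucketOverflowSketch
{
    public static void main(String[] args)
    {
        long time = 1_559_709_697L;                 // bucket timestamp (seconds)
        long bucket = (time << 32) | 0xFFFF_FFFFL;  // count at its 32-bit maximum
        bucket += 1;                                // one more tombstone: the carry
        System.out.println(bucket >>> 32);          // 1559709698 -- time + 1s
        System.out.println(bucket & 0xFFFF_FFFFL);  // 0          -- count reset
    }
}
{noformat}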



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14773:
-
Fix Version/s: 4.0

> Overflow of 32-bit integer during compaction.
> -
>
> Key: CASSANDRA-14773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14773
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Vladimir Bukhtoyarov
>Assignee: Vladimir Bukhtoyarov
>Priority: Urgent
> Fix For: 4.0, 4.0-beta
>
>
> In scope of CASSANDRA-13444 the compaction was significantly improved from 
> CPU and memory perspective. Hovewer this improvement introduces the bug in 
> rounding. When rounding the expriration time which is close to  
> *Cell.MAX_DELETION_TIME*(it is just *Integer.MAX_VALUE*) the math overflow 
> happens(because in scope of -CASSANDRA-13444-) data type for point was 
> changed from Long to Integer in order to reduce memory footprint), as result 
> point became negative and acts as silent poison for internal structures of 
> StreamingTombstoneHistogramBuilder like *DistanceHolder* and *DataHolder*. 
> Then depending of point intervals:
>  * The TombstoneHistogram produces wrong values when interval of points is 
> less then binSize, it is not critical.
>  * Compaction crashes with ArrayIndexOutOfBoundsException if amount of point 
> intervals is great then  binSize, this case is very critical.
>  
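A tiny illustration of the overflow described above (the rounding formula is a 
stand-in, not the builder's actual code):

{noformat}
// Rounding a point near Integer.MAX_VALUE up to the next multiple of the
// rounding factor overflows 32-bit arithmetic and yields a negative point.
class RoundingOverflowSketch
{
    public static void main(String[] args)
    {
        int point = Integer.MAX_VALUE - 2;  // close to Cell.MAX_DELETION_TIME
        int roundTo = 60;
        int rounded = ((point / roundTo) + 1) * roundTo;
        System.out.println(rounded);        // -2147483596: wrapped negative
    }
}
{noformat}
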
> This is pull request [https://github.com/apache/cassandra/pull/273] that 
> reproduces the issue and provides the fix. 
>  
> The stacktrace when running(on codebase without fix) 
> *testMathOverflowDuringRoundingOfLargeTimestamp* without -ea JVM flag
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:159)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {noformat}
>  
> The stacktrace when running(on codebase without fix)  
> *testMathOverflowDuringRoundingOfLargeTimestamp* with 

[jira] [Updated] (CASSANDRA-14748) Recycler$WeakOrderQueue occupies Heap

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14748:
-
Fix Version/s: 4.0

> Recycler$WeakOrderQueue occupies Heap
> -
>
> Key: CASSANDRA-14748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14748
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
> Environment: The Netty version Cassandra is using is netty-all-4.0.39.Final.jar
>Reporter: HX
>Priority: Normal
> Fix For: 4.0, 3.11.x, 4.0-rc
>
>
> Heap usage is constantly high on some of the nodes in the cluster. I dumped 
> the heap and opened it in Eclipse Memory Analyzer; it looks like 
> Recycler$WeakOrderQueue occupies most of the heap. 
>  
> ||Package||Retained Heap||Retained Heap, %||# Top Dominators||
> |(total)|7,078,140,136|100.00%|379,627|
> |io|5,665,035,800|80.04%|13,306|
> |netty|5,665,035,800|80.04%|13,306|
> |util|5,568,107,344|78.67%|2,965|
> |Recycler$WeakOrderQueue|4,950,021,544|69.93%|2,169|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14834) Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the whole compaction

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14834:
-
Fix Version/s: 4.0

> Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the 
> whole compaction
> 
>
> Key: CASSANDRA-14834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>
> Since CASSANDRA-13444 {{StreamingTombstoneHistogramBuilder.Spool}} is 
> allocated to keep around an array with 131072 * 2 * 2 integers *per written 
> sstable* during the whole compaction. With LCS at times creating 1000s of 
> sstables during a compaction it kills the node.
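
For scale: 131072 × 2 × 2 four-byte ints is 2 MiB per in-flight sstable, so a 
compaction writing 1000 sstables pins roughly 2 GiB of heap in Spools alone.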



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15229) BufferPool Regression

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15229:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> BufferPool Regression
> -
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14775) StreamingTombstoneHistogramBuilder overflows if > 2B in a single bucket/sstable

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14775:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> StreamingTombstoneHistogramBuilder overflows if > 2B in a single 
> bucket/sstable
> ---
>
> Key: CASSANDRA-14775
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14775
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Benedict
>Assignee: Alex Deparvu
>Priority: Normal
> Fix For: 4.0-beta
>
>
> This may be unlikely, but is certainly not impossible.  In this event, the 
> count for the bucket will be reset to zero, and the time distorted to 1s in 
> the future.  If MAX_DELETION_TIME were encountered through overflow, this 
> might result in a bucket with NO_DELETION_TIME.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14834) Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the whole compaction

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14834:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the 
> whole compaction
> 
>
> Key: CASSANDRA-14834
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14834
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 4.0-beta
>
>
> Since CASSANDRA-13444 {{StreamingTombstoneHistogramBuilder.Spool}} is 
> allocated to keep around an array with 131072 * 2 * 2 integers *per written 
> sstable* during the whole compaction. With LCS at times creating 1000s of 
> sstables during a compaction it kills the node.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14748) Recycler$WeakOrderQueue occupies Heap

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14748:
-
Fix Version/s: 4.0-rc

> Recycler$WeakOrderQueue occupies Heap
> -
>
> Key: CASSANDRA-14748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14748
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
> Environment: The Netty version Cassandra is using is netty-all-4.0.39.Final.jar
>Reporter: HX
>Priority: Normal
> Fix For: 3.11.x, 4.0-rc
>
>
> Heap usage is constantly high on some of the nodes in the cluster. I dumped 
> the heap and opened it in Eclipse Memory Analyzer; it looks like 
> Recycler$WeakOrderQueue occupies most of the heap. 
>  
> ||Package||Retained Heap||Retained Heap, %||# Top Dominators||
> |(total)|7,078,140,136|100.00%|379,627|
> |io|5,665,035,800|80.04%|13,306|
> |netty|5,665,035,800|80.04%|13,306|
> |util|5,568,107,344|78.67%|2,965|
> |Recycler$WeakOrderQueue|4,950,021,544|69.93%|2,169|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14773:
-
Fix Version/s: (was: 4.x)
   4.0-beta

> Overflow of 32-bit integer during compaction.
> -
>
> Key: CASSANDRA-14773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14773
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Vladimir Bukhtoyarov
>Assignee: Vladimir Bukhtoyarov
>Priority: Urgent
> Fix For: 4.0-beta
>
>
> In the scope of CASSANDRA-13444 compaction was significantly improved from a 
> CPU and memory perspective. However, this improvement introduced a bug in 
> rounding. When rounding an expiration time that is close to 
> *Cell.MAX_DELETION_TIME* (which is just *Integer.MAX_VALUE*), a math overflow 
> happens (because in the scope of -CASSANDRA-13444- the data type for a point 
> was changed from Long to Integer in order to reduce memory footprint). As a 
> result the point becomes negative and acts as silent poison for internal 
> structures of StreamingTombstoneHistogramBuilder like *DistanceHolder* and 
> *DataHolder*. Then, depending on the point intervals:
>  * The TombstoneHistogram produces wrong values when the interval of points 
> is less than binSize; this is not critical.
>  * Compaction crashes with ArrayIndexOutOfBoundsException if the number of 
> point intervals is greater than binSize; this case is very critical.
>  
> Pull request [https://github.com/apache/cassandra/pull/273] reproduces the 
> issue and provides the fix. 
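>
> A minimal illustration of the overflow (hypothetical values, not the actual 
> rounding code):
> {code:java}
> int roundSeconds = 60;
> int point = Integer.MAX_VALUE - 30;  // close to Cell.MAX_DELETION_TIME
> // the int addition wraps around before the division, so the result is negative
> int rounded = (point + roundSeconds - 1) / roundSeconds * roundSeconds;
> System.out.println(rounded);         // prints a negative "poison" point
> {code}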
>  
> The stacktrace when running (on the codebase without the fix) 
> *testMathOverflowDuringRoundingOfLargeTimestamp* without the -ea JVM flag:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown
>  Source)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184)
> at 
> org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:159)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {noformat}
>  
> The stacktrace when running (on the codebase without the fix) 
> 

[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15214:
-
Fix Version/s: (was: 4.0-rc)
   4.0-beta

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest, most consistent way to do this would be to have 
> a single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future-proof approach, it may be worth paying the cost of a single thread.
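>
> A minimal sketch of that propagation thread (all names hypothetical, not a 
> patch):
> {code:java}
> import java.util.concurrent.SynchronousQueue;
>
> final class ErrorPropagator
> {
>     private static final SynchronousQueue<Error> QUEUE = new SynchronousQueue<>();
>
>     // Called from catch-all blocks that would otherwise swallow an OOM;
>     // best-effort handoff to the dedicated rethrow thread
>     static void propagate(Error e)
>     {
>         QUEUE.offer(e);
>     }
>
>     static void startRethrowThread()
>     {
>         Thread t = new Thread(() -> {
>             try
>             {
>                 // rethrow on a thread with no catch-all above it, so the
>                 // JVM-level handling (crash/heapdump) finally sees the Error
>                 throw QUEUE.take();
>             }
>             catch (InterruptedException ie)
>             {
>                 Thread.currentThread().interrupt();
>             }
>         }, "error-propagator");
>         t.setDaemon(true);
>         t.start();
>     }
> }
> {code}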



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15214:
-
Fix Version/s: (was: 4.0-beta)
   4.0-rc

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-rc
>
> Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest, most consistent way to do this would be to have 
> a single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future-proof approach, it may be worth paying the cost of a single thread.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14888:
-
Fix Version/s: (was: 4.0.x)
   4.0-rc

> Several mbeans are not unregistered when dropping a keyspace and table
> --
>
> Key: CASSANDRA-14888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14888
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Ariel Weisberg
>Assignee: Alex Deparvu
>Priority: Urgent
>  Labels: patch-available
> Fix For: 4.0-rc
>
> Attachments: CASSANDRA-14888.patch
>
>
> CasCommit, CasPrepare, CasPropose, ReadRepairRequests, 
> ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, 
> PartitionsValidated, RepairPrepareTime, RepairSyncTime, 
> RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, 
> WriteFailedIdealCL
> Basically for 3 years people haven't known what they are doing because the 
> entire thing is kind of obscure. Fix it and also add a dtest that detects if 
> any mbeans are left behind after dropping a table and keyspace.
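>
> A sketch of the kind of check such a dtest might perform (the helper name is 
> hypothetical; the object name pattern assumes the layout used by Cassandra's 
> table metrics):
> {code:java}
> import java.lang.management.ManagementFactory;
> import java.util.Set;
> import javax.management.MBeanServer;
> import javax.management.ObjectName;
>
> static Set<ObjectName> leftoverTableBeans(String keyspace, String table) throws Exception
> {
>     MBeanServer server = ManagementFactory.getPlatformMBeanServer();
>     // table-scoped metric mbeans registered under org.apache.cassandra.metrics
>     ObjectName pattern = new ObjectName(
>         "org.apache.cassandra.metrics:type=Table,keyspace=" + keyspace + ",scope=" + table + ",*");
>     return server.queryNames(pattern, null); // should be empty after the drop
> }
> {code}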



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15214:
-
Fix Version/s: (was: 4.0)
   4.0-rc

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-rc
>
> Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest, most consistent way to do this would be to have 
> a single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future-proof approach, it may be worth paying the cost of a single thread.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15234) Standardise config and JVM parameters

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15234:
-
Fix Version/s: 4.0-beta

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both in the yaml and in JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps, to avoid ambiguity, we could decline to accept bit rates ({{b/s, 
> Mbps}}) or powers of 1000 such as {{KB/s}}, given these are regularly used 
> for either their old or new definition (e.g. as {{KiB/s}}); or we could 
> support them and simply log the value in bytes/s.
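>
> A minimal sketch of temporal-suffix parsing along these lines (the regex and 
> class name are illustrative only, not a proposed implementation):
> {code:java}
> import java.util.concurrent.TimeUnit;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> final class DurationParser
> {
>     private static final Pattern PATTERN = Pattern.compile(
>         "(\\d+)\\s*(u|micros(?:econds?)?|mo(?:nths?)?|s(?:econds?)?|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?)");
>
>     static long toMicros(String value)
>     {
>         Matcher m = PATTERN.matcher(value.trim().toLowerCase());
>         if (!m.matches())
>             throw new IllegalArgumentException("unparseable duration: " + value);
>         long count = Long.parseLong(m.group(1));
>         String unit = m.group(2);
>         // order matters: check "u"/"micro" and "mo" before the bare "m"
>         if (unit.equals("u") || unit.startsWith("micro")) return count;
>         if (unit.startsWith("mo")) return TimeUnit.DAYS.toMicros(30 * count); // month ~ 30 days
>         if (unit.startsWith("s"))  return TimeUnit.SECONDS.toMicros(count);
>         if (unit.startsWith("m"))  return TimeUnit.MINUTES.toMicros(count);
>         if (unit.startsWith("h"))  return TimeUnit.HOURS.toMicros(count);
>         return TimeUnit.DAYS.toMicros(count); // "d"
>     }
> }
> {code}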



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15216:
-
Fix Version/s: (was: 4.0)
   4.0-alpha

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for the general correctness of last-write-wins.
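>
> Enabling them is a one-line yaml change; in 3.x the setting is 
> {{cross_node_timeout: true}} (assuming the name is unchanged on trunk).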



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2019-08-30 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14801:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Correctness depended upon the narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2019-08-29 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15213:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly, and a measurable share of the new 
> networking work (without filtering it was > 10% of the CPU used overall).  
> We can compute the approximation floor(log2 n / log2 1.2) extremely cheaply, 
> saving the random memory access costs; see the sketch below.
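>
> A sketch of such a cheap approximation (constants illustrative; the real 
> bucket math would need verifying against the reservoir's offsets):
> {code:java}
> // Approximate log1.2(n) as floor(log2 n) / log2(1.2), avoiding binary search.
> static int approxBucketIndex(long n)
> {
>     int log2 = 63 - Long.numberOfLeadingZeros(n | 1); // floor(log2 n) for n >= 1
>     return (int) (log2 / 0.2630344058d);              // log2(1.2)
> }
> {code}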



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-13938:
-
Fix Version/s: (was: 4.0)
   4.0-alpha

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
> population: uniform(1..5000) # 50 million records available
>   - name: ts
> cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
> population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
> cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
> cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with 
> repair options (parallelism: parallel, primary range: false, incremental: 
> true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: 
> [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: 
> false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,-3074457345618258603], 
> (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 
> 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message: 
> [2017-10-05 14:32:07,048] null
> at 

[jira] [Commented] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

2019-08-27 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916995#comment-16916995
 ] 

Benedict commented on CASSANDRA-15172:
--

Hi [~Sagges], I'm afraid I was on holiday so was unable to respond, and am now 
otherwise engaged for the next few weeks, but [~michaelsembwever] is mostly 
correct.  However, to clarify: the likely cause of this bug, as far as I have 
established, is not related to thrift or legacy tables (though atypical range 
tombstone use with thrift could cause it), but to communication from a 3.0 node 
to a 2.2 or 2.1 node, in the face of range tombstones that cover a primary key 
prefix.

That is to say, a schema of the form (pk, c1, c2, v), with a deletion on 
(pk, c1).
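
For instance (illustrative CQL, not taken from the report): a statement like 
{{DELETE FROM ks.tbl WHERE pk = ? AND c1 = ?}} produces exactly such a range 
tombstone covering the (pk, c1) prefix.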

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> 
>
> Key: CASSANDRA-15172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Shalom
>Assignee: Benedict
>Priority: Normal
> Fix For: 3.0.19, 3.11.5
>
>
> Hi All,
> This is the first time I open an issue, so apologies if I'm not following the 
> rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a 
> lot of AbstractLocalAwareExecutorService exceptions. This happened right 
> after the node successfully started up with the new 3.11.4 binaries. 
> {noformat}
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; 
> proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for 
> CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting 
> RPC server as requested. Use JMX (StorageService->startRPCServer()) or 
> nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 
> AuthCache.java:161 - (Re)initializing PermissionsCache (validity 
> period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 
> - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 
> OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at 
> org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at 
> org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at 
> org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at 
> org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at 
> 

[jira] [Commented] (CASSANDRA-15289) bad merge reverted CASSANDRA-14993

2019-08-27 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916877#comment-16916877
 ] 

Benedict commented on CASSANDRA-15289:
--

+1, good catch, thanks for fixing - really confused as to what happened here, 
given this doesn't match either parent commit.

> bad merge reverted CASSANDRA-14993
> --
>
> Key: CASSANDRA-15289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15289
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15285) Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15285:
-
  Fix Version/s: (was: 4.x)
 4.0
Source Control Link: 
[db2ad0f7c59d9deb1f8755858cd630d640c5baa9|https://github.com/apache/cassandra/commit/db2ad0f7c59d9deb1f8755858cd630d640c5baa9]
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount
> ---
>
> Key: CASSANDRA-15285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15285
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/SSTable
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0
>
>
> This renaming was done previously in MetadataCollector but not in 
> StatsMetadata. The proposed name is more descriptive of what the field now 
> represents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15285) Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15285:
-
Status: Ready to Commit  (was: Review In Progress)

> Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount
> ---
>
> Key: CASSANDRA-15285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15285
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/SSTable
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.x
>
>
> This renaming was done previously in MetadataCollector but not in 
> StatsMetadata. The proposed name is more descriptive of what the field now 
> represents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15285) Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount

2019-08-20 Thread Benedict (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911450#comment-16911450
 ] 

Benedict commented on CASSANDRA-15285:
--

Thanks.  I've pushed 
[here|https://github.com/belliottsmith/cassandra/tree/15285] a branch with the 
mentioned method, and also {{ColumnFamilyStore.getMeanColumns}} renamed.

There's also a reasonable question around if we should rename 
{{TableMetrics.estimatedColumnCountHistogram}}, but given this has 
compatibility implications (with its JMX name) I think it's OK to leave it.

If you want to +1 the follow-up, I'll commit to trunk.

> Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount
> ---
>
> Key: CASSANDRA-15285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15285
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/SSTable
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.x
>
>
> This renaming was done previously in MetadataCollector but not in 
> StatsMetadata. The proposed name is more descriptive of what the field now 
> represents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15170:
-
  Fix Version/s: 4.0
Source Control Link: 
[8dcaa12baa97ce870f23ff9045f968f2fa28b2cc|https://github.com/apache/cassandra/commit/8dcaa12baa97ce870f23ff9045f968f2fa28b2cc]
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.0
>
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, but it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.
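>
> A sketch of the shape of the HintsCatalog.load fix (illustrative only; the 
> real method does more than list files, and the ".hints" filter is assumed):
> {code:java}
> import java.io.IOException;
> import java.io.UncheckedIOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.List;
> import java.util.stream.Collectors;
> import java.util.stream.Stream;
>
> static List<Path> listHintFiles(Path hintsDirectory)
> {
>     // Files.list returns a Stream backed by an open directory handle;
>     // try-with-resources closes the stream and releases the handle.
>     try (Stream<Path> files = Files.list(hintsDirectory))
>     {
>         return files.filter(p -> p.getFileName().toString().endsWith(".hints"))
>                     .collect(Collectors.toList());
>     }
>     catch (IOException e)
>     {
>         throw new UncheckedIOException(e);
>     }
> }
> {code}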



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15170:
-
Status: Patch Available  (was: In Progress)

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, but it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15170:
-
Reviewers: Benedict, Benedict  (was: Benedict)
   Benedict, Benedict  (was: Benedict)
   Status: Review In Progress  (was: Patch Available)

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, but it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15170:
-
Status: Ready to Commit  (was: Review In Progress)

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, but it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14740) BlockingReadRepair does not maintain monotonicity during range movements

2019-08-20 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14740:
-
Status: Open  (was: Patch Available)

> BlockingReadRepair does not maintain monotonicity during range movements
> 
>
> Key: CASSANDRA-14740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14740
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Benedict
>Assignee: Benedict
>Priority: Urgent
>  Labels: correctness
> Fix For: 4.0
>
>
> The BlockingReadRepair code introduced by CASSANDRA-10726 requires that each 
> of the queried nodes are written to, but pending nodes are not considered.  
> If there is a pending range movement, one of these writes may be ‘lost’ when 
> the range movement completes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15285) Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount

2019-08-19 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15285:
-
Reviewers: Benedict, Benedict  (was: Benedict)
   Benedict, Benedict
   Status: Review In Progress  (was: Patch Available)

> Rename StatsMetadata estimatedColumnCount to estimatedCellPerPartitionCount
> ---
>
> Key: CASSANDRA-15285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15285
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/SSTable
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.x
>
>
> This renaming was done previously in MetadataCollector but not in 
> StatsMetadata. The proposed name is more descriptive of what the field now 
> represents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-16 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908901#comment-16908901
 ] 

Benedict commented on CASSANDRA-15274:
--

You may need to modify the code in {{SSTableExport}} to include the line 
{{metadata.compressionParameters.setCrcCheckChance(0);}} at the start of 
{{export}}.
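
(Setting the CRC check chance to 0 disables checksum verification when reading 
compressed sstables, so the export can read past blocks that would otherwise 
fail their checksum.)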

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper nodes that we use for repair: one Reaper node in each 
> datacenter, each running with its own Cassandra back end, in a cluster 
> together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions reported in 
> the log files on a lot of instances.  There seems to be no pattern to the 
> corruption.  It seems that the repair job is finding all the corrupted files 
> for us.  The repair will hang on the node where the corrupted file is found.  
> To fix this we remove/rename the datafile and bounce the Cassandra instance.  
> Our hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair that is causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: 

[jira] [Commented] (CASSANDRA-14415) Performance regression in queries for distinct keys

2019-08-16 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908884#comment-16908884
 ] 

Benedict commented on CASSANDRA-14415:
--

Late to the party, but I agree with Kurt that we should simply {{return 0}} for 
{{n < 0}}, and we should probably let {{seek}} handle the {{null}} buffer.

Looks like a good simple patch.  I don't see any blockers to this.
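
A sketch of the seek-based override under discussion (shape only, assuming 
{{RandomAccessReader}}'s {{seek}}/{{getFilePointer}}/{{length}} methods; not 
the committed patch):
{code:java}
@Override
public int skipBytes(int n) throws IOException
{
    if (n <= 0)
        return 0;
    long current = getFilePointer();
    long skipped = Math.min(n, length() - current); // don't seek past EOF
    seek(current + skipped);                        // skip without materialising the bytes
    return (int) skipped;
}
{code}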

> Performance regression in queries for distinct keys
> ---
>
> Key: CASSANDRA-14415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14415
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Samuel Klock
>Assignee: Samuel Klock
>Priority: Normal
>  Labels: performance
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Running Cassandra 3.0.16, we observed a major performance regression 
> affecting {{SELECT DISTINCT keys}}-style queries against certain tables.  
> Based on some investigation (guided by some helpful feedback from Benjamin on 
> the dev list), we tracked the regression down to two problems.
>  * One is that Cassandra was reading more data from disk than was necessary 
> to satisfy the query.  This was fixed under CASSANDRA-10657 in a later 3.x 
> release.
>  * If the fix for CASSANDRA-10657 is incorporated, the other is this code 
> snippet in {{RebufferingInputStream}}:
> {code:java}
> @Override
> public int skipBytes(int n) throws IOException
> {
>     if (n < 0)
>         return 0;
>     int requested = n;
>     int position = buffer.position(), limit = buffer.limit(), remaining;
>     while ((remaining = limit - position) < n)
>     {
>         n -= remaining;
>         buffer.position(limit);
>         reBuffer();
>         position = buffer.position();
>         limit = buffer.limit();
>         if (position == limit)
>             return requested - n;
>     }
>     buffer.position(position + n);
>     return requested;
> }
> {code}
> The gist of it is that to skip bytes, the stream needs to read those bytes 
> into memory then throw them away.  In our tests, we were spending a lot of 
> time in this method, so it looked like the chief drag on performance.
> We noticed that the subclass of {{RebufferingInputStream}} in use for our 
> queries, {{RandomAccessReader}} (over compressed sstables), implements a 
> {{seek()}} method.  Overriding {{skipBytes()}} in it to use {{seek()}} 
> instead was sufficient to fix the performance regression.
> The performance difference is significant for tables with large values.  It's 
> straightforward to evaluate with very simple key-value tables, e.g.:
> {{CREATE TABLE testtable (key TEXT PRIMARY KEY, value BLOB);}}
> We did some basic experimentation with the following variations (all in a 
> single-node 3.11.2 cluster with off-the-shelf settings running on a dev 
> workstation):
>  * small values (1 KB, 100,000 entries), somewhat larger values (25 KB, 
> 10,000 entries), and much larger values (1 MB, 10,000 entries);
>  * compressible data (a single byte repeated) and uncompressible data (output 
> from {{openssl rand $bytes}}); and
>  * with and without sstable compression.  (With compression, we use 
> Cassandra's defaults.)
> The difference is most conspicuous for tables with large, uncompressible data 
> and sstable decompression (which happens to describe the use case that 
> triggered our investigation).  It is smaller but still readily apparent for 
> tables with effective compression.  For uncompressible data without 
> compression enabled, there is no appreciable difference.
> Here's what the performance looks like without our patch for the 1-MB entries 
> (times in seconds, five consecutive runs for each data set, all exhausting 
> the results from a {{SELECT DISTINCT key FROM ...}} query with a page size of 
> 24):
> {noformat}
> working on compressible
> 5.21180510521
> 5.10270500183
> 5.22311806679
> 4.6732840538
> 4.84219098091
> working on uncompressible_uncompressed
> 55.0423607826
> 0.769015073776
> 0.850513935089
> 0.713396072388
> 0.62596988678
> working on uncompressible
> 413.292617083
> 231.345913887
> 449.524993896
> 425.135111094
> 243.469946861
> {noformat}
> and with the fix:
> {noformat}
> working on compressible
> 2.86733293533
> 1.24895811081
> 1.108907938
> 1.12742400169
> 1.04647302628
> working on uncompressible_uncompressed
> 56.4146180153
> 0.895509958267
> 0.922824144363
> 0.772884130478
> 0.731923818588
> working on uncompressible
> 64.4587619305
> 1.81325793266
> 1.52577018738
> 1.41769099236
> 1.60442209244
> {noformat}
> The long initial runs for the uncompressible data presumably come from 
> repeatedly hitting the disk.  In contrast to the runs without the fix, the 
> 

[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908089#comment-16908089
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks.  Just a couple of tiny nits/questions:

The {{synchronizedList}} wrapper: while its use is probably unnecessary (since 
the global event executor is single threaded), it is probably clearer and more 
future-proof.  But perhaps it makes sense to only wrap the list for the time it 
is submitted as a consumer, e.g. {{synchronizedList(inboundExecutors)::add}}?

Should {{IsolatedExecutor.shutdown}} invoke {{ExecutorUtils.awaitTermination}} 
to report a {{TimeoutException}}?


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, but it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-15 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15232:
-
Source Control Link: 
[d60e7988736ed4358595e9c781b110a5bbb5f812|https://github.com/apache/cassandra/commit/d60e7988736ed4358595e9c781b110a5bbb5f812]
 Status: Resolved  (was: Ready to Commit)
 Resolution: Fixed

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-14 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907229#comment-16907229
 ] 

Benedict commented on CASSANDRA-15274:
--

bq. if they print their entire contents successfully there's already a 
reasonable chance that the data is not corrupted

This comment was alluding to that likelihood - but that we would instead fail 
to parse the data because of corruption of the stream, long before we printed 
any garbage out.  If we manage to print out, and we do this for every 
"corrupted" block (and there are many of them), it becomes very likely the 
files aren't truly corrupted.

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper nodes that we use for repair: one Reaper node in each 
> datacenter, each running with its own Cassandra back end, in a cluster 
> together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Comment Edited] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-14 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907229#comment-16907229
 ] 

Benedict edited comment on CASSANDRA-15274 at 8/14/19 12:30 PM:


bq. if they print their entire contents successfully there's already a 
reasonable chance that the data is not corrupted

This comment was alluding to that likelihood - but noting that we would instead 
fail to parse the data because of corruption of the stream, long before we 
printed any garbage out.  If we manage to print it out, and we do this for every 
"corrupted" block (and there are many of them), it becomes very likely (but not 
certain) that the files aren't truly corrupted.


was (Author: benedict):
bq. if they print their entire contents successfully there's already a 
reasonable chance that the data is not corrupted

This comment was alluding to that likelihood - but noting that we would instead 
fail to parse the data because of corruption of the stream, long before we 
printed any garbage out.  If we manage to print it out, and we do this for every 
"corrupted" block (and there are many of them), it becomes very likely the 
files aren't truly corrupted.

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906354#comment-16906354
 ] 

Benedict commented on CASSANDRA-15274:
--

{{sstableexport}} / {{sstabledump}} are your friends here - pick a corrupted 
sstable, and print its contents.  I'm pretty sure that by default these tools 
do not verify the checksum, so if they print their entire contents successfully 
there's already a reasonable chance that the data is not corrupted.  But to be 
sure, exporting data for the same partition keys from sstables on other nodes, 
and comparing that the same data is produced, gives a high confidence that the 
data in the files is still valid.

This isn't quite as simple as it sounds, as there could be many records, many 
of which are not contained in corrupt blocks, so it would be easier to modify 
{{sstableexport}} to detect the specifically corrupted blocks and only print 
the data contained within them.  There's also the problem that compaction can 
lead to different data on each node.  But picking a large and old sstable may 
give you a chance of fairly similar data residing on each node in comparable 
sstables.
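Roughly, the manual comparison might look like this (paths and generation 
numbers are illustrative; on 2.2, the equivalent of {{sstabledump}} is 
{{sstable2json}}):

{code}
# node A: dump one of the sstables reported as corrupted
sstabledump /data/ssd2/data/MyKeyspace/mytable-<id>/lb-123-big-Data.db > nodeA.json

# node B: dump the sstable(s) covering the same partitions
sstabledump /data/ssd1/data/MyKeyspace/mytable-<id>/lb-456-big-Data.db > nodeB.json

# if the same partitions produce the same data, the files are very likely intact
diff nodeA.json nodeB.json
{code}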

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment

2019-08-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906288#comment-16906288
 ] 

Benedict commented on CASSANDRA-15274:
--

This error is _very_ suggestive of actual data file corruption, independent of 
C*.  This exception is thrown only when the raw data for a block, whose 
checksum was computed on write, no longer produces the same checksum.  C* never 
modifies a file once written, so in particular if these errors are being 
encountered for the first time against sstables that are older than your last 
successful repair, we can essentially guarantee that the problem is with your 
system and not C*.  

How certain are you that your disks are reliable?

You can try to rule out actual corruption by comparing the contents of data 
written to files reporting these failures to the same data as it exists on 
other nodes in the cluster (whether or not the files on the other nodes report 
these errors).
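Conceptually - this is a sketch, not Cassandra's actual classes - the check 
that is failing has this shape:

{code}
import java.io.IOException;
import java.util.zip.CRC32;

final class BlockChecksumSketch
{
    // the checksum stored alongside each block was computed at write time;
    // if the same bytes no longer produce it, the bytes changed after writing
    static void verify(byte[] block, long storedChecksum) throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        if (crc.getValue() != storedChecksum)
            throw new IOException("checksum mismatch - block corrupted on disk");
    }
}
{code}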

> Multiple Corrupt datafiles across entire environment 
> -
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Phil O Conduin
>Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> 

[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-13 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906036#comment-16906036
 ] 

Benedict commented on CASSANDRA-15263:
--

Sorry [~ferozshaik...@gmail.com], I should have been clearer.  This may have a 
user-visible impact, with the affected queries not responding from 3.0 nodes 
contacted by a 2.1 node when this error occurs.  This will have an availability 
impact, which might result in failed queries.  Once fully upgraded, the problem 
will resolve, and there will be no lasting impact.  Depending on your cluster 
topology, you might be able to upgrade without any queries failing, only having 
reduced tolerance to node failures until the upgrade completes.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, 
> sstabledump_sal_purge_d03.json, sstablemetadata_sal_purge_d03, 
> stack_trace.txt, system.log, system.log, system.log, system.log, 
> system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15194) Improve readability of Table metrics Virtual tables units

2019-08-12 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905230#comment-16905230
 ] 

Benedict commented on CASSANDRA-15194:
--

bq. While we're here, should we consider renaming median to 50th, so it sorts 
correctly wrt 99th? For consistency I'd love to see 100th, but this would mess 
with order. It might be clearer to name them p50, p99, though, so we can also 
introduce p999 and maintain sort order.

I realise this was really unclear, but I ended up suggesting p50, p99 etc. as 
names, so that if we introduce p999 it makes sense (though I guess we could 
always call it 99.9th, and this should still sort correctly).

Not essential, just making sure my lack of clarity wasn't obscuring the 
discussion.

LGTM, +1 either way

> Improve readability of Table metrics Virtual tables units
> -
>
> Key: CASSANDRA-15194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15194
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> I just noticed this strange output in the coordinator_reads table:
> {code}
> cqlsh:system_views> select * from coordinator_reads ;
>  count | keyspace_name | table_name      | 99th | max | median | per_second
> -------+---------------+-----------------+------+-----+--------+------------
>   7573 | tlp_stress    | keyvalue        |    0 |   0 |      0 | 2.2375e-16
>   6076 | tlp_stress    | random_access   |    0 |   0 |      0 | 7.4126e-12
>    390 | tlp_stress    | sensor_data_udt |    0 |   0 |      0 | 1.7721e-64
>     30 | system        | local           |    0 |   0 |      0 |   0.006406
>     11 | system_schema | columns         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | indexes         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | tables          |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | views           |    0 |   0 |      0 | 1.1192e-16
> {code}
> cc [~cnlwsu]
> btw I realize the output is technically correct, but it's not very readable.  
> For practical purposes this should just say 0.






[jira] [Updated] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-12 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15232:
-
Status: Ready to Commit  (was: Review In Progress)

Thanks [~Override], the patch looks great to me.  I'm just running it through 
our CI [here|https://circleci.com/gh/belliottsmith/cassandra/2796] before 
committing.

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128-bit precision for their computations.  
> A precision probably needs to be configured or decided somehow, but it’s not 
> clear why 128-bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  For division, we should probably also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Updated] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

2019-08-12 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15172:
-
Test and Documentation Plan: unit test included
 Status: Patch Available  (was: Open)

This bug appears to be similar to CASSANDRA-15263, in that a reverse query with 
the RTBoundCloser is the likely source of asymmetric range tombstone bounds.  
However, in this case the problem is much easier to solve; we simply have to not 
assume the bounds have the same length.

I have pushed a patch 
[here|https://github.com/belliottsmith/cassandra/tree/15172-3.0]
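Roughly, the shape of the fix (hypothetical names and types; see the branch 
above for the actual patch):

{code}
import java.nio.ByteBuffer;
import java.util.List;

final class BoundsSizeSketch
{
    // hypothetical stand-ins for the clustering bounds of a range tombstone;
    // after a reverse query with RTBoundCloser they may have different lengths
    static long serializedSize(List<ByteBuffer> start, List<ByteBuffer> end)
    {
        long size = 0;
        // walk each bound to its own length, rather than assuming
        // start.size() == end.size() and indexing both with the same index
        for (ByteBuffer component : start)
            size += 2 + component.remaining();  // short length + bytes
        for (ByteBuffer component : end)
            size += 2 + component.remaining();
        return size;
    }
}
{code}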

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> 
>
> Key: CASSANDRA-15172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Shalom
>Assignee: Benedict
>Priority: Normal
>
> Hi All,
> This is the first time I open an issue, so apologies if I'm not following the 
> rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a 
> lot of AbstractLocalAwareExecutorService exceptions. This happened right 
> after the node successfully started up with the new 3.11.4 binaries. 
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; 
> proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for 
> CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting 
> RPC server as requested. Use JMX (StorageService->startRPCServer()) or 
> nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 
> AuthCache.java:161 - (Re)initializing PermissionsCache (validity 
> period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 
> - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 
> OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at 
> org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at 
> org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at 
> org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at 
> org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at 
> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>     at 
> org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>     at 
> 

[jira] [Updated] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

2019-08-12 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15172:
-
 Severity: Normal
   Complexity: Low Hanging Fruit
Discovered By: User Report
 Bug Category: Parent values: Availability(12983)Level 1 values: Response 
Crash(12991)
  Component/s: Local/Other
   Status: Open  (was: Triage Needed)

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> 
>
> Key: CASSANDRA-15172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Shalom
>Assignee: Benedict
>Priority: Normal
>
> Hi All,
> This is the first time I open an issue, so apologies if I'm not following the 
> rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a 
> lot of AbstractLocalAwareExecutorService exceptions. This happened right 
> after the node successfully started up with the new 3.11.4 binaries. 
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; 
> proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for 
> CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting 
> RPC server as requested. Use JMX (StorageService->startRPCServer()) or 
> nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 
> AuthCache.java:161 - (Re)initializing PermissionsCache (validity 
> period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 
> - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 
> OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 
> OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at 
> org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at 
> org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at 
> org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at 
> org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at 
> org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at 
> org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at 
> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>     at 
> org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>     at 
> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
>     at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     

[jira] [Updated] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-12 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15263:
-
 Severity: Normal
   Complexity: Challenging
Discovered By: User Report
 Bug Category: Parent values: Availability(12983)Level 1 values: Response 
Crash(12991)
   Status: Open  (was: Triage Needed)

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, 
> sstabledump_sal_purge_d03.json, sstablemetadata_sal_purge_d03, 
> stack_trace.txt, system.log, system.log, system.log, system.log, 
> system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-12 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904954#comment-16904954
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com].  I can see what's happening now, and it 
looks benign.  It should resolve when you finish upgrading the nodes in your 
cluster.

The error is caused by the rare scenario of the rows not using all of the 
declared clustering columns, which inserts a {{null}} clustering value for 
{{column2}}.  The author of the legacy converter did not anticipate that an 
RT clustering component could be {{null}}, and would ordinarily have been 
correct, as row deletions are no longer stored as range tombstones in 3.0.  
However, synthetic range tombstone bounds can be built from row clusterings, 
and since the row has a null component, the synthetic RT does also.  

The fix for 3.0 would be simple, namely to ignore the {{null}} value when 
computing a digest.  However, it looks like this {{null}} is also incompatible 
with 2.1, since it could never legitimately arise there without the new 
machinery of 3.0 that synthesises it.  So sending this synthetic clustering 
to a 2.1 node could be more harmful than throwing this exception.

I will have to think about the best recourse to address this in 3.0 without 
adversely impacting a 2.1 node.
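To illustrate the shape involved (the schema is mine, not the reporter's; note 
that the problematic null component comes from legacy 2.1-era data and cannot 
be produced directly in CQL):

{code}
-- illustrative schema only
CREATE TABLE ks.t (
    pk int,
    column1 int,
    column2 int,
    value int,
    PRIMARY KEY (pk, column1, column2)
);

-- a delete by clustering prefix legitimately yields a range tombstone whose
-- bounds name column1 only; the problem case here is different: a row written
-- under 2.1 with no column2 component, from which 3.0 synthesises a range
-- tombstone bound that carries a null
DELETE FROM ks.t WHERE pk = 1 AND column1 = 10;
{code}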

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, 
> sstabledump_sal_purge_d03.json, sstablemetadata_sal_purge_d03, 
> stack_trace.txt, system.log, system.log, system.log, system.log, 
> system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-11 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904700#comment-16904700
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com].  Surprisingly the new message I introduced 
hasn't shown up, suggesting a {{ClusteringBoundary}} rather than 
{{ClusteringBound}} is being constructed.  I've made a slight tweak, and 
confirmed that there's no other plausible way this bound could be created.  If 
you could run the latest patch version when you get a chance, that would be 
great.

If you get around to running sstableexport for one of the affected keys on the 
affected table that would also be great, though hopefully this next debug run 
will give me something more to work with.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt, 
> system.log, system.log, system.log, system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-10 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904412#comment-16904412
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com] - it looks like this may have been run with 
the first version of the debug build though, as it seems to be lacking the 
table name in the output.  I've pushed another version that will help us be 
certain we're seeing the correct version of debug output.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt, 
> system.log, system.log, system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904026#comment-16904026
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks, looks good.

bq. The shutdown->shutdownNow change on the InfiniteLoopExecutor was just to 
match trunk naming, there's no functional change.

It was an unrelated point that I noticed: at the places you made the 
modification, we were invoking {{Executor.awaitTermination}} instead of 
{{ExecutorUtils.awaitTermination}}, the latter of which actually reports a 
failure to terminate as an exception.  We probably have other places in which 
we are inconsistent; I simply noticed these places while looking at the changes.

It looks like there's a related bad merge to trunk for 
{{shutdownReferenceReaper}}.  The {{EXEC}} and {{STRONG_LEAK_DETECTOR}} are 
being shutdown/awaited separately?

bq.  Sadly I can't reproduce it now and have lost the heap dump showing an 
example

The problem is that there's no good reason for this to have any impact.  If the 
shutdown task exits, the thread pool will be shut down, so there's no apparent 
reason to need to depend on the instant core thread timeout.  It's not terribly 
important, since the change should only confer a very minor difference in code 
clarity, but it may cause people to scratch their heads later.  Anyway, it's 
probably not worth further investigation.

bq. The ColumnFamilyStore.shutdownExecutorsAndWait uses the builder so it can 
add the perDiskflushExecutors

My mistake, thanks for clarifying.

bq. We could introduce the shutdown and wait, but would mean 4 variants of it, 
I don't mind it as it is at the moment.

I don't mind terribly either way, either.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.
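For reference, a sketch of the executor shape described above (assumptions 
mine, not the committed change):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PromptExitPoolSketch
{
    // core size 0 with a zero keep-alive: a worker thread is created on
    // demand and exits as soon as the queue drains, so nothing pins the
    // instance class loader after the last task completes
    static ExecutorService create()
    {
        return new ThreadPoolExecutor(0, 1, 0L, TimeUnit.MILLISECONDS,
                                      new LinkedBlockingQueue<Runnable>());
    }
}
{code}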






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903932#comment-16903932
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks.  It's looking good, just a few more minor(ish) items:

* {{NanoTimeToCurrentTimeMillis}}: {{updater}} should be {{final}} to ensure 
visibility, but better still would probably be to use {{InfiniteLoopExecutor}}. 
 I have a number of question marks around the pre-existing code there, though 
(like why it’s using an {{Object}} to wait on, for instance), so up to you 
exactly what you do there.
* {{blockingIOThread}} probably also needs to be {{volatile}}
* I may have lost track of changes here, but any reason why you got rid of the 
{{Feature}} enum in preference to a long?
* {{ColumnFamilyStore.shutdownExecutorsAndWait}}: use {{ImmutableList.of}}?
* Should we introduce {{ExecutorUtils.shutdownAndWait}} since we do it 
repeatedly? (see the sketch below)
* In some of the places you’ve switched {{shutdown}} to {{shutdownNow}}, it 
might also be good to switch to {{ExecutorUtils.awaitTermination}} so that we 
can be informed if we haven’t shut down.
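A sketch of what that helper might look like (names assumed, not the committed 
API):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ExecutorUtilsSketch
{
    // shut down all executors first, then wait on each against a shared
    // deadline, reporting a failure to terminate as an exception rather
    // than silently returning
    static void shutdownAndWait(long timeout, TimeUnit unit, ExecutorService... executors)
        throws InterruptedException, TimeoutException
    {
        for (ExecutorService executor : executors)
            executor.shutdown();

        long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (ExecutorService executor : executors)
        {
            long remaining = deadline - System.nanoTime();
            if (!executor.awaitTermination(Math.max(0L, remaining), TimeUnit.NANOSECONDS))
                throw new TimeoutException(executor + " failed to terminate");
        }
    }
}
{code}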

Could you elaborate on what you saw with the single threaded executor?  It 
shouldn’t have had any impact on how long the threads were around for, and it 
might be indicative of some other problem?

For future reference, it helps for review to update existing branches with new 
commits, without rebasing, instead of creating new ones with a clean commit 
history.  We tend to rebase and clean-up just prior to commit, so that the 
reviewer can easily see what has changed between versions.  It's admittedly 
painful working with multiple branches, for all involved.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903851#comment-16903851
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com].  If you could download the latest version 
and run again, I can get some further information about where the problematic 
range tombstone bound is being instantiated.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt, 
> system.log, system_latest.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903746#comment-16903746
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com], this has the complete messages.  
Unfortunately I just noticed I didn't push my most recent version for you to 
use, and so we do not have the affected table name in the message.  If you pull 
the branch again and rebuild, these messages should print the table name as 
well, so you can more easily source the affected sstables (and I can take a 
look at the relevant schema definition).

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt, 
> system.log
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Comment Edited] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903727#comment-16903727
 ] 

Benedict edited comment on CASSANDRA-15263 at 8/9/19 9:06 AM:
--

Thanks.  The sample system log appears to have had the message truncated though?

I'm not sure if the ASF has a secure system to upload to - perhaps somebody in 
the community knows better?  

[~ferozshaik...@gmail.com]: if you have somewhere you can upload it, and 
provide me access, that might be easier though.

Otherwise I'll see if I can find some other mechanism.


was (Author: benedict):
Thanks [~ferozshaik...@gmail.com].  The sample system log appears to have had 
the message truncated?

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native 
> port 9042 was similar to other nodes. The exceptions were scary enough that 
> we had to call off the change. Any help and insights into this problem from 
> the community is appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903727#comment-16903727
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com].  The sample system log appears to have had 
the message truncated?

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: sample.system.log, schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native port 
> 9042 was similar to other nodes. The exceptions were scary enough that we had 
> to call off the change. Any help and insights into this problem from the 
> community are appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903676#comment-16903676
 ] 

Benedict commented on CASSANDRA-15263:
--

Yes, the branch is definitely 3.11.4; however, that message should not be 
showing.  Are you perhaps building trunk, and not 15263-debug?  {{ant realclean 
&& ant}} should produce {{build/apache-cassandra-3.11.4-SNAPSHOT.jar}}

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native port 
> 9042 was similar to other nodes. The exceptions were scary enough that we had 
> to call off the change. Any help and insights into this problem from the 
> community are appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-08 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903177#comment-16903177
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks for the extra info [~ferozshaik...@gmail.com].  I've pushed a slightly 
tweaked branch 
[here|https://github.com/belliottsmith/cassandra/tree/15263-debug], that is 
based on 3.11.4.  If you build and run this on your upgraded node, the errors 
should print some useful information.  This will include the range tombstones 
that are having their digest computed and, more importantly, the affected 
partition key.  If you could then run sstableexport for the partition key on 
all sstables, for all nodes, and upload the per-node output somewhere I can 
access it that would be great.  Hopefully that should be enough to at least pin 
down the problematic data, and we can work from there.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native port 
> 9042 was similar to other nodes. The exceptions were scary enough that we had 
> to call off the change. Any help and insights into this problem from the 
> community are appreciated.






[jira] [Updated] (CASSANDRA-15246) Add more information around commit message format expected for a patch

2019-08-08 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15246:
-
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Add more information around commit message format expected for a patch
> --
>
> Key: CASSANDRA-15246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15246
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
> Attachments: patch_commit_message.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is primarily from the suggestion 
> https://issues.apache.org/jira/browse/CASSANDRA-15013?focusedCommentId=16885255=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16885255,
>  to have the expected commit message format documented.






[jira] [Updated] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-08 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15232:
-
Status: Changes Suggested  (was: Review In Progress)

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Updated] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-08 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15232:
-
Status: Review In Progress  (was: Patch Available)

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Commented] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-08 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902898#comment-16902898
 ] 

Benedict commented on CASSANDRA-15232:
--

Thanks [~Override], I agree that's the best way forward.

The patch is almost ready to commit.  I have a couple of nits for you to 
consider:

# {{CONTEXT_WITH_MAX_PRECISION}} and {{MAX_PRECISION}} could be merged into 
simply {{MAX_PRECISION}} (as a {{MathContext}}) for brevity, since 
{{MAX_PRECISION}} isn't used independently?
# {{leftOperandFirstDigitPos}} and {{rightOperandFirstDigitPos}} both subtract 
1, which is cancelled out by the subtraction of one from the other, and are no 
longer used for any other calculation.  Perhaps rename the variables to 
something that indicates we're just getting the normalised scale wrt each 
other's decimal point?  I haven't come up with an immediately good name, but 
I'm sure it's possible.  Of course, I hope the compiler eliminates the extra 
calculation for us, so it's not strictly necessary - but it helps to avoid 
triggering any human linters in future, and more clearly defines the purpose of 
the variables.

Otherwise the patch looks ready to commit to me, thanks again for another great 
submission.

Reviewing this ticket highlights another follow-up ticket, or discussion, 
around the mod operator: I'm not entirely sure it makes a lot of sense over 
decimal values.  Is this definitely a function we want to provide?  I've only 
ever understood it to be defined over the integer domain.  I note that 
{{BigDecimal}} names it {{remainder}}, and in fact the Java Language Spec does 
the same, so it is not modulus we are offering here, and we should rename it 
internally and in any docs anyway.

We should probably discuss this for all value types, particularly floating 
point types.  I honestly was unaware Java supported remainder for floating 
point, and the behaviour is unintuitive to say the least ({{0.4 % 0.25 == 
0.15}} ??)
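
For concreteness, a tiny self-contained illustration of the behaviour in 
question (nothing Cassandra-specific here):

{code}
import java.math.BigDecimal;

public class RemainderDemo
{
    public static void main(String[] args)
    {
        // Java's % over doubles is the remainder after truncating division,
        // plus the usual binary floating point representation error:
        System.out.println(0.4 % 0.25);  // prints 0.15000000000000002

        // BigDecimal spells the same operation "remainder":
        BigDecimal r = new BigDecimal("0.4").remainder(new BigDecimal("0.25"));
        System.out.println(r);           // prints 0.15
    }
}
{code}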

I've also noted your other follow-up, it's good we're shaking out these minor 
issues before we release the feature, so your help here is greatly appreciated!

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Updated] (CASSANDRA-15194) Improve readability of Table metrics Virtual tables units

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15194:
-
Reviewers: Benedict
   Status: Review In Progress  (was: Patch Available)

> Improve readability of Table metrics Virtual tables units
> -
>
> Key: CASSANDRA-15194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15194
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> I just noticed this strange output in the coordinator_reads output:
> {code}
> cqlsh:system_views> select * from coordinator_reads ;
>  count | keyspace_name | table_name      | 99th | max | median | per_second
> -------+---------------+-----------------+------+-----+--------+------------
>   7573 | tlp_stress    | keyvalue        |    0 |   0 |      0 | 2.2375e-16
>   6076 | tlp_stress    | random_access   |    0 |   0 |      0 | 7.4126e-12
>    390 | tlp_stress    | sensor_data_udt |    0 |   0 |      0 | 1.7721e-64
>     30 | system        | local           |    0 |   0 |      0 |   0.006406
>     11 | system_schema | columns         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | indexes         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | tables          |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | views           |    0 |   0 |      0 | 1.1192e-16
> {code}
> cc [~cnlwsu]
> btw I realize the output is technically correct, but it's not very readable.  
> For practical purposes this should just say 0.






[jira] [Updated] (CASSANDRA-15241) Virtual table to expose current running queries

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15241:
-
Status: Review In Progress  (was: Patch Available)

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+---------------------------------------------------------------------------------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}






[jira] [Updated] (CASSANDRA-15241) Virtual table to expose current running queries

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15241:
-
Status: Changes Suggested  (was: Review In Progress)

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+---------------------------------------------------------------------------------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}






[jira] [Updated] (CASSANDRA-15194) Improve readability of Table metrics Virtual tables units

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15194:
-
Status: Changes Suggested  (was: Review In Progress)

> Improve readability of Table metrics Virtual tables units
> -
>
> Key: CASSANDRA-15194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15194
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> I just noticed this strange output in the coordinator_reads output:
> {code}
> cqlsh:system_views> select * from coordinator_reads ;
>  count | keyspace_name | table_name      | 99th | max | median | per_second
> -------+---------------+-----------------+------+-----+--------+------------
>   7573 | tlp_stress    | keyvalue        |    0 |   0 |      0 | 2.2375e-16
>   6076 | tlp_stress    | random_access   |    0 |   0 |      0 | 7.4126e-12
>    390 | tlp_stress    | sensor_data_udt |    0 |   0 |      0 | 1.7721e-64
>     30 | system        | local           |    0 |   0 |      0 |   0.006406
>     11 | system_schema | columns         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | indexes         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | tables          |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | views           |    0 |   0 |      0 | 1.1192e-16
> {code}
> cc [~cnlwsu]
> btw I realize the output is technically correct, but it's not very readable.  
> For practical purposes this should just say 0.






[jira] [Commented] (CASSANDRA-15194) Improve readability of Table metrics Virtual tables units

2019-08-07 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902110#comment-16902110
 ] 

Benedict commented on CASSANDRA-15194:
--

Thanks, the patch looks good.

I've pushed some minor suggestions 
[here|https://github.com/belliottsmith/cassandra/tree/15194-suggest], solely to 
introduce static compilation checks of the types of the metric we're supplying 
with some fancy generics.

Some other suggestions for discussion, that I haven't implemented, but are easy 
to do so:

In {{LatencyTableMetric.add}} it is probably sufficient to test 
{{column.endsWith(suffix)}}, given we statically define all of the regular 
columns?  If we wanted to, we could impose a runtime check when building the 
metadata that only the expected columns end with this suffix (see the sketch 
below), but this is probably unnecessary.
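
If we did want that check, a minimal sketch (names here are hypothetical, not 
the actual patch):

{code}
import java.util.Set;

final class LatencySuffixCheck
{
    // sketch with hypothetical names: when building the virtual table
    // metadata, verify that only the known latency columns carry the
    // reserved suffix, so a plain endsWith() test at read time cannot misfire
    static void validate(Set<String> allColumns, Set<String> latencyColumns, String suffix)
    {
        for (String column : allColumns)
        {
            if (column.endsWith(suffix) && !latencyColumns.contains(column))
                throw new AssertionError("unexpected column with suffix '" + suffix + "': " + column);
        }
    }
}
{code}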

While we're here, should we consider renaming median to 50th, so it sorts 
correctly wrt 99th?  For consistency I'd love to see 100th, but this would mess 
with order.  It might be clearer to name them p50, p99, though, so we can also 
introduce p999 and maintain sort order.

> Improve readability of Table metrics Virtual tables units
> -
>
> Key: CASSANDRA-15194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15194
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> I just noticed this strange output in the coordinator_reads output:
> {code}
> cqlsh:system_views> select * from coordinator_reads ;
>  count | keyspace_name | table_name      | 99th | max | median | per_second
> -------+---------------+-----------------+------+-----+--------+------------
>   7573 | tlp_stress    | keyvalue        |    0 |   0 |      0 | 2.2375e-16
>   6076 | tlp_stress    | random_access   |    0 |   0 |      0 | 7.4126e-12
>    390 | tlp_stress    | sensor_data_udt |    0 |   0 |      0 | 1.7721e-64
>     30 | system        | local           |    0 |   0 |      0 |   0.006406
>     11 | system_schema | columns         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | indexes         |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | tables          |    0 |   0 |      0 | 1.1192e-16
>     11 | system_schema | views           |    0 |   0 |      0 | 1.1192e-16
> {code}
> cc [~cnlwsu]
> btw I realize the output is technically correct, but it's not very readable.  
> For practical purposes this should just say 0.






[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries

2019-08-07 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902084#comment-16902084
 ] 

Benedict commented on CASSANDRA-15241:
--

bq. One problem was that, under the JMM, other threads were not guaranteed to 
ever see modifications to the task field

So, thinking on this more - this is probably not actually the case.  We always 
invoke a virtual method on executing the task (a virtual method on the task 
itself), which should require that the state of the class is fully visible (not 
to other threads, but we can depend on implementation details here).  If the 
task were to inspect this field, it should expect to see the value prior to its 
invocation.  So this is probably fine, and we can revert the changes I 
suggested here. 

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+---------------------------------------------------------------------------------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}






[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries

2019-08-07 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902037#comment-16902037
 ] 

Benedict commented on CASSANDRA-15241:
--

The patch looks good, thanks.  It's a low cost, low risk and valuable insight 
into what the system is doing, so I think it will be really helpful.

I've pushed some minor suggestions 
[here|https://github.com/belliottsmith/cassandra/tree/15241-suggest].

One problem was that, under the JMM, other threads were not guaranteed to ever 
see modifications to the {{task}} field.  It may be that we typically would see 
them, but we may have to incur the cost of a lazySet to absolutely guarantee 
it.  Unfortunately I don't know of another mechanism for guaranteeing that a 
write made from within a loop eventually becomes visible.
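
For illustration, a minimal sketch (hypothetical names, not the patch itself) 
of the lazySet shape described above:

{code}
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

final class TaskHolder
{
    // written on every task hand-off; read by the virtual table snapshot
    private volatile Runnable task;

    private static final AtomicReferenceFieldUpdater<TaskHolder, Runnable> TASK =
        AtomicReferenceFieldUpdater.newUpdater(TaskHolder.class, Runnable.class, "task");

    void runTask(Runnable next)
    {
        // lazySet is an ordered store: cheaper than a full volatile write in a
        // hot loop, but still guaranteed to become visible to readers eventually
        TASK.lazySet(this, next);
        try
        {
            next.run();
        }
        finally
        {
            TASK.lazySet(this, null);
        }
    }

    Runnable currentTask()
    {
        return task; // a volatile read observes the most recently published task
    }
}
{code}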

There was also a visibility issue with {{nowNanos}} in 
{{LocalMutationRunnable}}, since the task can be assigned to a worker before it 
runs, at which time the nowNanos will be zero and the running time would look 
absurd.

Finally, there remains an issue with the semantics of the start time - in some 
cases it's when the task starts running, in others when the task was created, 
so including any time spent queued.  If we want, we could quite easily expose 
_both_ of these items, and perhaps we might like to.  But otherwise we should 
pick one and stick to it for all of the implementations.  I would presume we 
are slightly _more_ interested in actual running time, as otherwise tasks 
queued up behind long running tasks might appear long running themselves - 
though this would require the executor to be entirely saturated.

We might also want to put a bit of time into improving the descriptions, to 
include e.g. the CQL being executed.  Right now, for executing a prepared 
statement, we only get the statementId and the options it is invoked with.

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+---------------------------------------------------------------------------------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}






[jira] [Updated] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15225:
-
  Fix Version/s: (was: 4.0.x)
 (was: 3.11.x)
 (was: 3.0.x)
 4.0
 3.11.5
 3.0.19
 2.2.15
Source Control Link: 
[f21106fcd0e5d870cf9d85b2d396eab9fe4515cd|https://github.com/apache/cassandra/commit/f21106fcd0e5d870cf9d85b2d396eab9fe4515cd]
  Since Version: 1.0.0
 Status: Resolved  (was: Ready to Commit)
 Resolution: Fixed

> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 2.2.15, 3.0.19, 3.11.5, 4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Updated] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15225:
-
Status: Review In Progress  (was: Patch Available)

> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Updated] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15225:
-
Status: Ready to Commit  (was: Review In Progress)

> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Updated] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15225:
-
Test and Documentation Plan: unnecessary
 Status: Patch Available  (was: Open)

> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Commented] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-07 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901969#comment-16901969
 ] 

Benedict commented on CASSANDRA-15225:
--

Thanks [~Override].  One tiny nit: {{maybeFail}} doesn't require the null 
check; you can simply invoke it, and if the parameter is {{null}} nothing will 
happen.

If you want to prepare the final patch for commit, by squashing, adding a 
CHANGES.txt entry, and adding a commit message of the form described on 
CASSANDRA-15246, I'll commit it to trunk.

Welcome to the contributor community!

> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Assigned] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-07 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-15263:


Assignee: Benedict

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Assignee: Benedict
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native port 
> 9042 was similar to other nodes. The exceptions were scary enough that we had 
> to call off the change. Any help and insights into this problem from the 
> community are appreciated.






[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null

2019-08-07 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901912#comment-16901912
 ] 

Benedict commented on CASSANDRA-15263:
--

Thanks [~ferozshaik...@gmail.com].  When you say this is one of the oldest and 
very first Cassandra implementations, and that legacy mode was thrift, do you 
mean that this schema was converted from a thrift schema, and that in the past 
it was written to using the thrift data model?

If so, this could very well explain the problem.

The problem appears to relate to range tombstones, i.e. those that cover a 
range of CQL rows within a partition.  These were intended to permit row 
deletions in the CQL data model, but it _was_ possible to construct them 
yourself using the thrift data model API, and it was possible to create range 
tombstones that could cause problems.  

It's _possible_ this is what is happening, but more information would help us 
figure out whether it is, and if so, in what way and how it might be mitigated 
for you.

More information about how you delete data, particularly if you delete multiple 
rows (essentially range deletions), and preferably access to some affected 
sstables would be really helpful.  It may be hard to establish which sstables 
are affected, but we could potentially supply you with a build that would 
report this information in the error, although this would require you to 
upgrade again, perhaps in a development environment.

> LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
> ---
>
> Key: CASSANDRA-15263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15263
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: feroz shaik
>Priority: Normal
>  Labels: 2.1.16, 3.11.4
> Attachments: schema.txt, stack_trace.txt
>
>
> We have hit a problem today while upgrading from 2.1.16 to 3.11.4.
> We encountered this as soon as the first node started up with 3.11.4.
> The full error stack is attached - [^stack_trace.txt] 
>  
> The below errors continued in the log file as long as the process was up.
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 
> ErrorMessage.java:384 - Unexpected exception during request
>  java.lang.NullPointerException: null
>  ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 
>  
> The nodetool version says 3.11.4 and the number of connections on native port 
> 9042 was similar to other nodes. The exceptions were scary enough that we had 
> to call off the change. Any help and insights into this problem from the 
> community are appreciated.






[jira] [Updated] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11

2019-08-06 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15262:
-
Reviewers: Benedict

> server_encryption_options is not backwards compatible with 3.11
> ---
>
> Key: CASSANDRA-15262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15262
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
>
> The current `server_encryption_options` configuration options are as follows:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and
>     # unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on
>     # ssl_storage_port. Should be used during upgrade to 4.0; otherwise,
>     # set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely
>     # connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
>     # More advanced defaults below:
>     # protocol: TLS
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     # require_client_auth: false
>     # require_endpoint_verification: false
> {noformat}
> A couple of issues here:
> 1. optional defaults to false, which will break existing TLS configurations 
> for (from what I can tell) no particularly good reason
> 2. The provided protocol and cipher suites are not good ideas (in particular, 
> encouraging anyone to use CBC ciphers is a bad plan)
> I propose that before the 4.0 cut we fix up server_encryption_options and even 
> client_encryption_options:
> # Change the default {{optional}} setting to true. As the new Netty code 
> intelligently decides to open a TLS connection or not this is the more 
> sensible default (saves operators a step while transitioning to TLS as well)
> # Update the defaults to what netty actually defaults to






[jira] [Commented] (CASSANDRA-15225) FileUtils.close() does not handle non-IOException

2019-08-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901269#comment-16901269
 ] 

Benedict commented on CASSANDRA-15225:
--

Thanks for the patch [~Override].

There's a third option, namely accumulating the throwable in a {{Throwable}} 
variable, and maintaining the single catch clause.  We have a utility method, 
{{Throwables.maybeFail}} that takes a checked exception class, and rethrows the 
exception as its checked-type if possible, or as an unchecked type if possible, 
and otherwise wraps it in a {{RuntimeException}}.

Does that sound reasonable to you?
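
A minimal sketch of that shape (assuming {{Throwables.merge}} and 
{{Throwables.maybeFail}} behave as described above; not the final patch):

{code}
import java.io.IOException;
import org.apache.cassandra.utils.Throwables;

public final class CloseAllExample
{
    // sketch: attempt to close every item even if an earlier one throws,
    // accumulating failures and rethrowing only once at the end
    public static void closeAll(Iterable<? extends AutoCloseable> closeables) throws IOException
    {
        Throwable accumulated = null;
        for (AutoCloseable c : closeables)
        {
            try
            {
                if (c != null)
                    c.close();
            }
            catch (Throwable t) // not just IOException
            {
                accumulated = Throwables.merge(accumulated, t);
            }
        }
        // no-op when accumulated is null; rethrows as IOException where
        // possible, unchecked where possible, else wrapped in a RuntimeException
        Throwables.maybeFail(accumulated, IOException.class);
    }
}
{code}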



> FileUtils.close() does not handle non-IOException
> -
>
> Key: CASSANDRA-15225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This can lead to {{close}} not being invoked on remaining items






[jira] [Commented] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901240#comment-16901240
 ] 

Benedict commented on CASSANDRA-15232:
--

Oh, also, it helps if you "Submit Patch", so I can "Start Review" etc :)

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Commented] (CASSANDRA-15232) Arithmetic operators over decimal truncate results

2019-08-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901239#comment-16901239
 ] 

Benedict commented on CASSANDRA-15232:
--

Thanks for the insight [~blerer] :)

[~Override] thanks for the patch.  It looks good.  I have only one concern, 
around division, namely how the scale calculation is now potentially costlier 
than the division itself.  This may not be a huge problem, but it is a shame, 
and it would be nice to mitigate it.

The costliest part is calculating the first digit of each operand - I 
personally think it would be acceptable to ignore this part of Postgres' 
calculation to avoid introducing the extra garbage and computation.

However, if we do want to retain this part, we should at least avoid performing 
it when we know it is of no value - i.e. when the scale of the result is such 
that we must ignore the contribution of these digits, because we know for sure 
the scale will be overridden by min or max; or when the scales are equal, we 
can perform a comparison of the two decimals themselves without extracting the 
first digit.

We can _probably_ also make this computation less costly by using 
{{BigDecimal.scaleByPowerOfTen}}, if we keep it.
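
For instance, something along these lines (a sketch, not the submitted code; 
one of several possible shapes) locates the first significant digit without 
any string conversion or intermediate garbage:

{code}
import java.math.BigDecimal;

final class DecimalScaleSketch
{
    // sketch: position of the first significant digit relative to the decimal
    // point, i.e. the count of integer digits (zero or negative for |v| < 1)
    static int firstDigitPosition(BigDecimal v)
    {
        return v.precision() - v.scale(); // 123.45 -> 3, 0.0012 -> -2
    }
}
{code}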

What do you think?

> Arithmetic operators over decimal truncate results
> --
>
> Key: CASSANDRA-15232
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15232
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Benedict
>Assignee: Liudmila Kornilova
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The decimal operators hard-code a 128 bit precision for their computations.  
> Probably a precision needs to be configured or decided somehow, but it’s not 
> clear why 128bit was chosen.  Particularly for multiplication and addition, 
> it’s very unclear why we truncate, which is different to our behaviour for 
> e.g. sum() aggregates.  Probably for division we should also ensure that we 
> do not reduce the precision of the two operands.  A minimum of decimal128 
> seems reasonable, but a maximum does not.






[jira] [Commented] (CASSANDRA-15246) Add more information around commit message format expected for a patch

2019-08-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901195#comment-16901195
 ] 

Benedict commented on CASSANDRA-15246:
--

Looks reasonable, but I'm not sure where the docs and website stuff actually 
lives?  The patch doesn't apply to trunk, so after that I'm flummoxed.

I would have assumed it would be a modification to 
docs/source/development/how_to_commit.rst, but honestly I haven't a clue about 
this side of things.

> Add more information around commit message format expected for a patch
> --
>
> Key: CASSANDRA-15246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15246
> Project: Cassandra
>  Issue Type: Task
>  Components: Documentation/Website
>Reporter: Sumanth Pasupuleti
>Assignee: Sumanth Pasupuleti
>Priority: Normal
> Fix For: 4.0
>
> Attachments: patch_commit_message.patch
>
>
> This is primarily from the suggestion 
> https://issues.apache.org/jira/browse/CASSANDRA-15013?focusedCommentId=16885255=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16885255,
>  to have the expected commit message format documented.





