[jira] [Created] (CASSANDRA-13293) MV read-before-write can be omitted for some operations

2017-03-02 Thread Benjamin Roth (JIRA)
Benjamin Roth created CASSANDRA-13293:
-

 Summary: MV read-before-write can be omitted for some operations
 Key: CASSANDRA-13293
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13293
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benjamin Roth


A view that has the same fields in the primary key as its base table (I call it 
a congruent key) does not require read-before-writes, except for:

- Range deletes
- Partition deletes

If the view uses filters on non-pk columns, either a rbw is required or a write 
that does not match the filter has to be turned into a delete. When in doubt, 
I'd stay with the current behaviour and do a rbw.
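
The rule proposed above can be sketched as a small decision function. This is only an illustration of the ticket's reasoning: the class, enum, and method names are hypothetical and do not exist in Cassandra, and the sketch takes the conservative option of always doing a rbw when the view filters on non-pk columns.

```java
/** Hypothetical sketch of the proposed rule; none of these names are Cassandra APIs. */
final class ViewWritePlanner {
    /** Kinds of base-table mutation considered by the ticket. */
    enum Operation { INSERT, UPDATE, ROW_DELETE, RANGE_DELETE, PARTITION_DELETE }

    /**
     * Returns true when the view update needs a read-before-write (rbw).
     *
     * @param congruentKey          view primary key has the same fields as the base table's
     * @param filtersOnNonPkColumns view definition filters on non-pk columns
     * @param op                    the kind of mutation applied to the base table
     */
    static boolean needsReadBeforeWrite(boolean congruentKey,
                                        boolean filtersOnNonPkColumns,
                                        Operation op) {
        if (!congruentKey)
            return true;   // non-congruent keys keep the current behaviour
        if (filtersOnNonPkColumns)
            return true;   // conservative choice: when in doubt, do a rbw
        // With a congruent key, only range and partition deletes still need a rbw.
        return op == Operation.RANGE_DELETE || op == Operation.PARTITION_DELETE;
    }

    public static void main(String[] args) {
        for (Operation op : Operation.values())
            System.out.println(op + " needs rbw: " + needsReadBeforeWrite(true, false, op));
    }
}
```

The alternative mentioned in the ticket, turning a non-matching write into a delete instead of reading first, is deliberately left out of this sketch.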



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-13067) Integer overflows with file system size reported by Amazon Elastic File System (EFS)

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa reassigned CASSANDRA-13067:
--

Assignee: Matt Wringe

> Integer overflows with file system size reported by Amazon Elastic File 
> System (EFS)
> 
>
> Key: CASSANDRA-13067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13067
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra in OpenShift running on Amazon EC2 instance 
> with EFS mounted for data
>Reporter: Michael Hanselmann
>Assignee: Matt Wringe
> Attachments: 0001-Handle-exabyte-sized-filesystems.patch
>
>
> When not explicitly configured Cassandra uses 
> [{{nio.FileStore.getTotalSpace}}|https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileStore.html]
>  to determine the total amount of available space in order to [calculate the 
> preferred commit log 
> size|https://github.com/apache/cassandra/blob/cassandra-3.9/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L553].
>  [Amazon EFS|https://aws.amazon.com/efs/] instances report a filesystem size 
> of 8 EiB when empty. [{{getTotalSpace}} causes an integer overflow 
> (JDK-8162520)|https://bugs.openjdk.java.net/browse/JDK-8162520] and returns a 
> negative number, resulting in a negative preferred size and causing the 
> checked integer to throw.
> Overriding {{commitlog_total_space_in_mb}} is not sufficient as 
> [{{DataDirectory.getAvailableSpace}}|https://github.com/apache/cassandra/blob/cassandra-3.9/src/java/org/apache/cassandra/db/Directories.java#L550]
>  makes use of {{nio.FileStore.getUsableSpace}}.
> [AMQ-6441] is a comparable issue in ActiveMQ.
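
One possible way to guard against the overflow described above is to treat a negative size as "larger than a long can hold" and clamp it before any arithmetic. The sketch below is an assumption about how such a guard could look, not the content of the attached patch. The class and method names are hypothetical, and the cap and the one-quarter fraction are illustrative parameters rather than values taken from this ticket.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper; not part of Cassandra and not the attached patch. */
final class SafeFileStoreSize {
    /** A negative size is assumed to be an overflowed value, so clamp it to Long.MAX_VALUE. */
    static long clampOverflow(long reportedBytes) {
        return reportedBytes < 0 ? Long.MAX_VALUE : reportedBytes;
    }

    /** Overflow-safe wrapper around FileStore.getTotalSpace(). */
    static long totalSpace(FileStore store) throws IOException {
        return clampOverflow(store.getTotalSpace());
    }

    /** Illustrative sizing rule: the smaller of a cap and one quarter of the volume, in MB. */
    static int preferredSizeInMb(long reportedTotalBytes, int capInMb) {
        long quarterInMb = clampOverflow(reportedTotalBytes) / (4L * 1024 * 1024);
        return (int) Math.min(capInMb, quarterInMb);
    }

    public static void main(String[] args) throws IOException {
        FileStore store = Files.getFileStore(Paths.get("."));
        System.out.println("clamped total bytes: " + totalSpace(store));
        // Long.MIN_VALUE stands in for the negative value an overflowing call might return.
        System.out.println("preferred MB on overflow: " + preferredSizeInMb(Long.MIN_VALUE, 8192));
    }
}
```

The same clamp would have to be applied wherever {{getUsableSpace}} is read, since the description notes that overriding the commit log size alone is not sufficient.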





[jira] [Updated] (CASSANDRA-13234) Add histogram for delay to deliver hints

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13234:
---
Reviewer: Stefan Podkowinski

> Add histogram for delay to deliver hints
> 
>
> Key: CASSANDRA-13234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13234
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
>Priority: Minor
> Fix For: 3.0.x, 3.11.x
>
>
> There is very little visibility into hint delivery in general - having 
> histograms available to understand how long it takes to deliver hints is 
> useful for operators to better identify problems. 





[jira] [Commented] (CASSANDRA-12676) Message coalescing regression

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893769#comment-15893769
 ] 

Jeff Jirsa commented on CASSANDRA-12676:


Also [ninja'd in a supplemental 
commit|https://github.com/apache/cassandra/commit/ff170af2a2879e1b195f0ea9df540b83b07caad8]
 to update cassandra.yaml since this parameter was actually added/exposed in 
CASSANDRA-13090



> Message coalescing regression
> -
>
> Key: CASSANDRA-12676
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12676
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Jeff Jirsa
>  Labels: docs-impacting
> Fix For: 3.0.12, 3.11.0, 4.0
>
> Attachments: 12676.diff, coalescing_disabled.png, result.html
>
>
> The default in 2.2+ was to enable TIMEHORIZON message coalescing.  After 
> reports of performance regressions after upgrading from 2.1 to 2.2/3.0 we 
> have discovered the issue to be this default.
> We need to re-test our assumptions on this feature but in the meantime we 
> should default back to disabled.
> Here is a performance run [with and without message 
> coalescing|http://cstar.datastax.com/graph?command=one_job&stats=9a26b5f2-7f48-11e6-92e7-0256e416528f&metric=op_rate&operation=2_user&smoothing=1&show_aggregates=true&xmin=0&xmax=508.86&ymin=0&ymax=91223]
>  





[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a49cf2c6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a49cf2c6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a49cf2c6

Branch: refs/heads/trunk
Commit: a49cf2c6ccc8f269858958c86e2437429cd5ca59
Parents: b300cc4 ff170af
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:37:39 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:38:14 2017 -0800

--
 conf/cassandra.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a49cf2c6/conf/cassandra.yaml
--



[1/3] cassandra git commit: Supplemental Ninja: Address message coalescing regression - Update yaml

2017-03-02 Thread jjirsa
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.11 10154272e -> ff170af2a
  refs/heads/trunk b300cc4d5 -> a49cf2c6c


Supplemental Ninja: Address message coalescing regression - Update yaml

Patch by Jeff Jirsa following CASSANDRA-12676


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ff170af2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ff170af2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ff170af2

Branch: refs/heads/cassandra-3.11
Commit: ff170af2a2879e1b195f0ea9df540b83b07caad8
Parents: 1015427
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:36:34 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:36:34 2017 -0800

--
 conf/cassandra.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ff170af2/conf/cassandra.yaml
--
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 90e28b2..7e1c761 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -1209,9 +1209,9 @@ back_pressure_strategy:
 # See CASSANDRA-8692 for details.
 
 # Strategy to use for coalescing messages in OutboundTcpConnection.
-# Can be fixed, movingaverage, timehorizon (default), disabled.
+# Can be fixed, movingaverage, timehorizon, disabled (default).
 # You can also specify a subclass of CoalescingStrategies.CoalescingStrategy by name.
-# otc_coalescing_strategy: TIMEHORIZON
+# otc_coalescing_strategy: DISABLED
 
 # How many microseconds to wait for coalescing. For fixed strategy this is the amount of time after the first
 # message is received before it will be sent with any accompanying messages. For moving average this is the



[2/3] cassandra git commit: Supplemental Ninja: Address message coalescing regression - Update yaml

2017-03-02 Thread jjirsa
Supplemental Ninja: Address message coalescing regression - Update yaml

Patch by Jeff Jirsa following CASSANDRA-12676


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ff170af2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ff170af2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ff170af2

Branch: refs/heads/trunk
Commit: ff170af2a2879e1b195f0ea9df540b83b07caad8
Parents: 1015427
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:36:34 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:36:34 2017 -0800

--
 conf/cassandra.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ff170af2/conf/cassandra.yaml
--



[jira] [Updated] (CASSANDRA-12676) Message coalescing regression

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-12676:
---
   Resolution: Fixed
Fix Version/s: (was: 3.11.x)
   3.11.0
   3.0.12
   Status: Resolved  (was: Patch Available)

Committed in {{e11f75081727e650353ac12ec50f508fa9387d60}} and merged up through 
trunk.

I've only changed NEWS.txt on 3.0:

{quote}
+   - In 2.1, the default for otc_coalescing_strategy was 'DISABLED'.
 + In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown
 + to be a performance regression. The default for 3.11.0 and newer has
 + been reverted to 'DISABLED'. Users upgrading to Cassandra 3.0 should
 + consider setting otc_coalescing_strategy to 'DISABLED'.
{quote}

And used a slightly different wording on 3.11 and trunk (in addition to 
actually changing the default):

{quote}
+   - In 2.1, the default for otc_coalescing_strategy was 'DISABLED'.
 + In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown
 + to be a performance regression. The default for 3.11.0 and newer has
 + been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should
 + be aware that the default has changed.
{quote}




[2/6] cassandra git commit: Address message coalescing regression

2017-03-02 Thread jjirsa
Address message coalescing regression

Patch by Jeff Jirsa; Reviewed by T Jake Luciani for CASSANDRA-12676


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e11f7508
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e11f7508
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e11f7508

Branch: refs/heads/cassandra-3.11
Commit: e11f75081727e650353ac12ec50f508fa9387d60
Parents: adbe2cc
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:23:59 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:23:59 2017 -0800

--
 NEWS.txt | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e11f7508/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 6088dcf..faba342 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -18,8 +18,11 @@ using the provided 'sstableupgrade' tool.
 
 Upgrading
 -
-   - Nothing specific to this release, but please see previous versions upgrading section,
- especially if you are upgrading from 2.2.
+   - In 2.1, the default for otc_coalescing_strategy was 'DISABLED'.
+ In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown
+ to be a performance regression. The default for 3.11.0 and newer has
+ been reverted to 'DISABLED'. Users upgrading to Cassandra 3.0 should
+ consider setting otc_coalescing_strategy to 'DISABLED'.
 
 3.0.11
 ==



[3/6] cassandra git commit: Address message coalescing regression

2017-03-02 Thread jjirsa
Address message coalescing regression

Patch by Jeff Jirsa; Reviewed by T Jake Luciani for CASSANDRA-12676


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e11f7508
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e11f7508
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e11f7508

Branch: refs/heads/trunk
Commit: e11f75081727e650353ac12ec50f508fa9387d60
Parents: adbe2cc
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:23:59 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:23:59 2017 -0800

--
 NEWS.txt | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e11f7508/NEWS.txt
--



[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/10154272
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/10154272
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/10154272

Branch: refs/heads/trunk
Commit: 10154272ede9b520ee12414aca9e150fa0f250b5
Parents: 6f9610d e11f750
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:24:45 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:26:34 2017 -0800

--
 CHANGES.txt  | 1 +
 NEWS.txt | 5 +
 src/java/org/apache/cassandra/config/Config.java | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/10154272/CHANGES.txt
--
diff --cc CHANGES.txt
index eeb2215,1c3869f..5de9ece
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,16 -1,6 +1,17 @@@
 -3.0.12
 +3.11.0
 + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174)
 + * Obfuscate password in stress-graphs (CASSANDRA-12233)
 + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034)
 + * nodetool stopdaemon errors out (CASSANDRA-13030)
 + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954)
 + * Fix primary index calculation for SASI (CASSANDRA-12910)
 + * More fixes to the TokenAllocator (CASSANDRA-12990)
 + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983)
++ * Address message coalescing regression (CASSANDRA-12676)
 +Merged from 3.0:
   * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071)
 - * Update c.yaml doc for offheap memtables (CASSANDRA-13179)
 + * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
 + * Coalescing strategy sleeps too much (CASSANDRA-13090)
   * Faster StreamingHistogram (CASSANDRA-13038)
   * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237)
   * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/10154272/NEWS.txt
--
diff --cc NEWS.txt
index fc27526,faba342..4c2e217
--- a/NEWS.txt
+++ b/NEWS.txt
@@@ -19,139 -29,32 +19,144 @@@ using the provided 'sstableupgrade' too
  
  Upgrading
  -
 -   - Support for alter types of already defined tables and of UDTs fields has been disabled.
 - If it is necessary to return a different type, please use casting instead. See
 - CASSANDRA-12443 for more details.
 -   - Specifying the default_time_to_live option when creating or altering a
 - materialized view was erroneously accepted (and ignored). It is now
 - properly rejected.
 -   - Only Java and JavaScript are now supported UDF languages.
 - The sandbox in 3.0 already prevented the use of script languages except Java
 - and JavaScript.
 -   - Compaction now correctly drops sstables out of CompactionTask when there
 - isn't enough disk space to perform the full compaction.  This should reduce
 - pending compaction tasks on systems with little remaining disk space.
 - Primary ranges in the system.size_estimates table are now based on the keyspace
   replication settings and adjacent ranges are no longer merged (CASSANDRA-9639).
++   - In 2.1, the default for otc_coalescing_strategy was 'DISABLED'.
++ In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown
++ to be a performance regression. The default for 3.11.0 and newer has
++ been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should
++ be aware that the default has changed.
  
 -3.0.10
 -==
 +3.10
 +
  
 -Upgrading
 --
 -   - memtable_allocation_type: offheap_buffers is no longer allowed to be specified in the 3.0 series.
 - This was an oversight that can cause segfaults. Offheap was re-introduced in 3.4 see CASSANDRA-11039
 - and CASSANDRA-9472 for details.
 +New features
 +
 +   - New `DurationType` (cql duration). See CASSANDRA-11873
 +   - Runtime modification of concurrent_compactors is now available via nodetool
 +   - Support for the assignment operators +=/-= has been added for update queries.
 +   - An Index implementation may now provide a task which runs prior to joining
 + the ring. See CASSANDRA-12039
 +   - Filtering on partition key columns is now also supported for queries without
 + secondary indexes.
 +   - A slow query log has been added: slow queries will be logged at DEBUG level.
 + For more details refer to CASSANDRA-12403 and slow_query_l

[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b300cc4d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b300cc4d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b300cc4d

Branch: refs/heads/trunk
Commit: b300cc4d5fa11606b96d28b4311dcadd22f44080
Parents: 387ba4f 1015427
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:26:47 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:28:02 2017 -0800

--
 CHANGES.txt  | 1 +
 NEWS.txt | 5 +
 src/java/org/apache/cassandra/config/Config.java | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b300cc4d/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b300cc4d/NEWS.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b300cc4d/src/java/org/apache/cassandra/config/Config.java
--



[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/10154272
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/10154272
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/10154272

Branch: refs/heads/cassandra-3.11
Commit: 10154272ede9b520ee12414aca9e150fa0f250b5
Parents: 6f9610d e11f750
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:24:45 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:26:34 2017 -0800

--
 CHANGES.txt  | 1 +
 NEWS.txt | 5 +
 src/java/org/apache/cassandra/config/Config.java | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/10154272/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/10154272/NEWS.txt
--

[1/6] cassandra git commit: Address message coalescing regression

2017-03-02 Thread jjirsa
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 adbe2cc4d -> e11f75081
  refs/heads/cassandra-3.11 6f9610d47 -> 10154272e
  refs/heads/trunk 387ba4f30 -> b300cc4d5


Address message coalescing regression

Patch by Jeff Jirsa; Reviewed by T Jake Luciani for CASSANDRA-12676


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e11f7508
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e11f7508
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e11f7508

Branch: refs/heads/cassandra-3.0
Commit: e11f75081727e650353ac12ec50f508fa9387d60
Parents: adbe2cc
Author: Jeff Jirsa 
Authored: Thu Mar 2 21:23:59 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 21:23:59 2017 -0800

--
 NEWS.txt | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e11f7508/NEWS.txt
--



cassandra git commit: close streams

2017-03-02 Thread dbrosius
Repository: cassandra
Updated Branches:
  refs/heads/trunk a7c9fa0f1 -> 387ba4f30


close streams


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/387ba4f3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/387ba4f3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/387ba4f3

Branch: refs/heads/trunk
Commit: 387ba4f309d38223769a3d9501f6baada4ded8bf
Parents: a7c9fa0
Author: Dave Brosius 
Authored: Thu Mar 2 21:22:11 2017 -0500
Committer: Dave Brosius 
Committed: Thu Mar 2 21:22:11 2017 -0500

--
 tools/stress/src/org/apache/cassandra/stress/StressGraph.java | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/387ba4f3/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
--
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressGraph.java b/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
index 663bde6..6729a28 100644
--- a/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
+++ b/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
@@ -110,17 +110,14 @@ public class StressGraph
 
     private String getGraphHTML()
     {
-        InputStream graphHTMLRes = StressGraph.class.getClassLoader().getResourceAsStream("org/apache/cassandra/stress/graph/graph.html");
-        String graphHTML;
-        try
+        try (InputStream graphHTMLRes = StressGraph.class.getClassLoader().getResourceAsStream("org/apache/cassandra/stress/graph/graph.html"))
         {
-            graphHTML = new String(ByteStreams.toByteArray(graphHTMLRes));
+            return new String(ByteStreams.toByteArray(graphHTMLRes));
         }
         catch (IOException e)
         {
             throw new RuntimeException(e);
         }
-        return graphHTML;
     }
 
 /** Parse log and append to stats array */
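
The pattern in this commit, moving the stream into a try-with-resources header so it is closed on every exit path, can be tried in isolation with the sketch below. It is an illustration and not the StressGraph code: the class names are hypothetical, a plain read loop stands in for Guava's ByteStreams.toByteArray, and an explicit UTF-8 charset is an assumption that the original code does not make.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone demo of closing a stream via try-with-resources. */
final class CloseStreamsSketch {
    /** Wrapper that records whether close() was called, for demonstration only. */
    static final class TrackingStream extends FilterInputStream {
        boolean closed;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the whole stream as UTF-8 text; the stream is closed even if reading fails. */
    static String readFully(InputStream source) {
        try (InputStream in = source) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1)
                out.write(buffer, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html></html>".getBytes(StandardCharsets.UTF_8);
        TrackingStream stream = new TrackingStream(new ByteArrayInputStream(html));
        System.out.println(readFully(stream));
        System.out.println("closed: " + stream.closed);
    }
}
```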



[jira] [Comment Edited] (CASSANDRA-13234) Add histogram for delay to deliver hints

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893556#comment-15893556
 ] 

Jeff Jirsa edited comment on CASSANDRA-13234 at 3/3/17 1:49 AM:


Force pushed to all three branches with the changes requested from both 
[~spo...@gmail.com] and [~iamaleksey] - Aleksey, I've pushed the metrics into 
{{HintsServiceMetrics}} for 3.11 and trunk only, but not 3.0. If you feel 
strongly that they need to go into HintsServiceMetrics for 3.0, I'm happy to 
move it, but it's pretty stark in 3.0.




was (Author: jjirsa):
Force pushed to all three branches with the changes requested from both 
[~spo...@gmail.com] and [~iamaleksey] - Aleksey, I've pushed the metrics into 
{{HintsServiceMetrics}} for 3.11 and trunk only, but not 3.0.




[jira] [Commented] (CASSANDRA-13234) Add histogram for delay to deliver hints

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893556#comment-15893556
 ] 

Jeff Jirsa commented on CASSANDRA-13234:


Force pushed to all three branches with the changes requested from both 
[~spo...@gmail.com] and [~iamaleksey] - Aleksey, I've pushed the metrics into 
{{HintsServiceMetrics}} for 3.11 and trunk only, but not 3.0.




[jira] [Updated] (CASSANDRA-13289) Make it possible to monitor an ideal consistency level separate from actual consistency level

2017-03-02 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-13289:
---
Description: 
As an operator there are several issues related to multi-datacenter replication 
and consistency you may want to have more information on from your production 
database.

For instance, if your application writes at LOCAL_QUORUM, how often are those 
writes failing to achieve EACH_QUORUM at other data centers? If you failed your 
application over to one of those data centers, roughly how inconsistent might 
it be given the number of writes that didn't propagate since the last 
incremental repair?

You might also want to know roughly what the latency of writes would be if you 
switched to a different consistency level. For instance, you are writing at 
LOCAL_QUORUM and want to know what would happen if you switched to EACH_QUORUM.

The proposed change is to allow an ideal_consistency_level to be specified in 
cassandra.yaml as well as get/set via JMX. If no ideal consistency level is 
specified no additional tracking is done.

if an ideal consistency level is specified then the 
{{AbstractWriteResponesHandler}} will contain a delegate WriteResponseHandler 
that tracks whether the ideal consistency level is met before a write times 
out. It also tracks the latency for achieving the ideal CL  of successful 
writes.

These two metrics would be reported on a per keyspace basis.

  was:
As an operator there are several issues related to multi-datacenter replication 
and consistency you may want to have more information on from your production 
database.

For instance. If your application writes at LOCAL_QUORUM how often are those 
writes failing to achieve EACH_QUORUM at other data centers. If you failed your 
application over to one of those data centers roughly how inconsistent might it 
be given the number of writes that didn't propagate since the last incremental 
repair?

You might also want to know roughly what the latency of writes would be if 
switch to a different consistency level. For instance you are writing at 
LOCAL_QUORUM and want to know what would happen if you switched to EACH_QUORUM.

The proposed change is to allow an ideal_consistency_level to be specified in 
cassandra.yaml as well as get/set via JMX. If no ideal consistency level is 
specified no additional tracking is done.

if an ideal consistency level is specified then the 
{{AbstractWriteResponesHandler}} will contain a delegate WriteResponseHandler 
that tracks whether the ideal consistency level is met before a write times 
out. It also tracks the latency for achieving the ideal CL  of successful 
writes on.

These two metrics would be reported on a per Keyspace basis.


> Make it possible to monitor an ideal consistency level separate from actual 
> consistency level
> -
>
> Key: CASSANDRA-13289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13289
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>
> As an operator there are several issues related to multi-datacenter 
> replication and consistency you may want to have more information on from 
> your production database.
> For instance, if your application writes at LOCAL_QUORUM, how often are those 
> writes failing to achieve EACH_QUORUM at other data centers? If you failed 
> your application over to one of those data centers, roughly how inconsistent 
> might it be given the number of writes that didn't propagate since the last 
> incremental repair?
> You might also want to know roughly what the latency of writes would be if 
> you switched to a different consistency level. For instance, you are writing 
> at LOCAL_QUORUM and want to know what would happen if you switched to 
> EACH_QUORUM.
> The proposed change is to allow an ideal_consistency_level to be specified in 
> cassandra.yaml as well as get/set via JMX. If no ideal consistency level is 
> specified, no additional tracking is done.
> If an ideal consistency level is specified, then the 
> {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler 
> that tracks whether the ideal consistency level is met before a write times 
> out. It also tracks the latency of achieving the ideal CL for successful 
> writes.
> These two metrics would be reported on a per keyspace basis.
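
The delegate described above could be sketched roughly as follows. This is only an illustration of the idea, not the patch: the class names IdealConsistencySketch, KeyspaceMetrics and IdealTracker are hypothetical, plain counters stand in for whatever metric types the real change would use, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class IdealConsistencySketch {

    // Per-keyspace counters (assumption: plain counters instead of real metric types).
    static final class KeyspaceMetrics {
        final AtomicLong writesMeetingIdealCl = new AtomicLong();
        final AtomicLong writesFailingIdealCl = new AtomicLong();
        final AtomicLong idealClLatencyNanosTotal = new AtomicLong();
    }

    // Delegate that sees every replica response in addition to the real handler.
    static final class IdealTracker {
        private final AtomicInteger remainingAcks;
        private final long startNanos;
        private final KeyspaceMetrics metrics;

        IdealTracker(int acksRequiredForIdealCl, long startNanos, KeyspaceMetrics metrics) {
            this.remainingAcks = new AtomicInteger(acksRequiredForIdealCl);
            this.startNanos = startNanos;
            this.metrics = metrics;
        }

        // Called once per replica acknowledgement.
        void onResponse(long nowNanos) {
            // Only the ack that completes the ideal CL records the latency.
            if (remainingAcks.decrementAndGet() == 0) {
                metrics.writesMeetingIdealCl.incrementAndGet();
                metrics.idealClLatencyNanosTotal.addAndGet(nowNanos - startNanos);
            }
        }

        // Called when the write timeout elapses.
        void onTimeout() {
            if (remainingAcks.get() > 0) {
                metrics.writesFailingIdealCl.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        KeyspaceMetrics metrics = new KeyspaceMetrics();
        IdealTracker tracker = new IdealTracker(2, 0L, metrics);
        tracker.onResponse(5L);
        tracker.onResponse(9L);
        tracker.onTimeout();
        System.out.println("met=" + metrics.writesMeetingIdealCl.get()
                + " failed=" + metrics.writesFailingIdealCl.get());
    }
}
```

When no ideal consistency level is configured, no tracker would be created at all, which matches the "no additional tracking" behaviour described in the ticket.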





[jira] [Commented] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893232#comment-15893232
 ] 

Jason Brown commented on CASSANDRA-13233:
-

Alright cool, looks like those changes came in CASSANDRA-12342. As I'm planning 
to fix this for 3.0+, in 3.0 we'll leave it as is, with the functions fetching 
the {{Field}} ref on every invocation, and in 3.11/trunk we'll use the class 
member constants. I'll update branches for cassci and rerun the tests shortly.


[jira] [Updated] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-13233:
-
Attachment: CASSANDRA-13233-trunk-v2.diff

> Improve testing on macOS by eliminating sigar logging
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff, CASSANDRA-13233-trunk-v2.diff
>
>
> The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): 
> "Warn user when OS settings are poor / integrate sigar" are not Mac friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing 
> SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no 
> libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in 
> java.library.path
> at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) 
> ~[sigar-1.6.4.jar:?]
> at org.hyperic.sigar.Sigar.(Sigar.java:100) 
> [sigar-1.6.4.jar:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
> at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) 
> [main/:?]
> at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) 
> [main/:?]
> at 
> org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) 
> [main/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) 
> [classes/:?]
> at 
> org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) 
> [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_66]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>  [junit-4.6.jar:?]
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) 
> [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) 
> [junit-4.6.jar:?]
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220) 
> [junit-4.6.jar:?]
> at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) 
> [junit-4.6.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
>  [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not 
> initialize SIGAR library 
> org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create platform aware (Windows, Darwin, Linux) implementations of CLibrary 
> (for instance CLibrary today assumes all platforms have support for 
> posix_fadvise but this doesn't exist in the Darwin kernel). If methods are 
> defined with the "native" JNI keyword in Java, then when the class is loaded 
> our JNA check fails incorrectly, making all of CLibrary "disabled" because 
> jnaAvailable = false, even though on a platform like Darwin all of the native 
> methods except posix_fadvise are supported.
> # Replace sigar usage to get current pid with calls to CLibrary/native 
> equivalent -- and fall back to Sigar for platforms like Windows that don't 
> have that support with JDK8 (and without a CLibrary equivalent)




[jira] [Commented] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893204#comment-15893204
 ] 

Michael Kjellman commented on CASSANDRA-13233:
--

Attaching an updated patch to add the class member constants back. Looks like 
this was changed to use a static constant vs calling getField each time in some 
3.x version -- which was after I did the initial patch back on 2.1 and I just 
didn't catch it when I rebased the patch for trunk.


[jira] [Commented] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893189#comment-15893189
 ] 

Michael Kjellman commented on CASSANDRA-13233:
--

No, I don't know why I did that looking at it now. Weird I'd do that though... 
Testing it now...

> Improve testing on macOS by eliminating sigar logging
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff
>
>
> The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): 
> "Warn user when OS settings are poor / integrate sigar" are not Mac friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing 
> SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no 
> libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in 
> java.library.path
> at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) 
> ~[sigar-1.6.4.jar:?]
> at org.hyperic.sigar.Sigar.(Sigar.java:100) 
> [sigar-1.6.4.jar:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
> at 
> org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
> at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) 
> [main/:?]
> at 
> org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) 
> [main/:?]
> at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) 
> [main/:?]
> at 
> org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) 
> [main/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) 
> [classes/:?]
> at 
> org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) 
> [classes/:?]
> at 
> org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) 
> [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_66]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_66]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>  [junit-4.6.jar:?]
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>  [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) 
> [junit-4.6.jar:?]
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) 
> [junit-4.6.jar:?]
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220) 
> [junit-4.6.jar:?]
> at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) 
> [junit-4.6.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
>  [ant-junit.jar:?]
> at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
>  [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not 
> initialize SIGAR library 
> org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create platform aware (Windows, Darwin, Linux) implementations of CLibrary 
> (for instance CLibrary today assumes all platforms have support for 
> posix_fadvise but this doesn't exist in the Darwin kernel). If methods are 
> defined with the "native" JNI keyword in Java, then when the class is loaded 
> our JNA check fails incorrectly, making all of CLibrary "disabled" because 
> jnaAvailable = false, even though on a platform like Darwin all of the native 
> methods except posix_fadvise are supported.
> # Replace sigar usage to get current pid with calls to CLibrary/native 
> equivalent -- and fall back to Sigar for platforms like Windows that don't 
> have that support with JDK8

[jira] [Commented] (CASSANDRA-13271) Reduce lock contention on instance factories of ListType and SetType

2017-03-02 Thread vincent royer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893183#comment-15893183
 ] 

vincent royer commented on CASSANDRA-13271:
---

Here is the branch including the patch; I hope that's good ;-)
https://github.com/strapdata/cassandra/tree/13271-trunk
I noticed contention in UntypedResultSet.Row.getList() when concurrent 
threads were reading rows in Elassandra.

> Reduce lock contention on instance factories of ListType and SetType
> 
>
> Key: CASSANDRA-13271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13271
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: performance
> Fix For: 4.x
>
> Attachments: 0001-CASSANDRA-13271-computeIfAbsent.patch, 
> 0001-CASSANDRA-13271-singleton-factory-concurrency-opimiz.patch
>
>
> While running some performance tests, I noticed that getInstance() in 
> org.apache.cassandra.db.marshal.ListType and SetType could suffer from lock 
> contention, because this singleton factory method is synchronized. Here is a 
> proposal to reduce lock contention by using a ConcurrentMap and the 
> putIfAbsent method rather than a synchronized method.
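
As a rough illustration of the proposal above, a factory backed by a ConcurrentMap might look like the sketch below. ExampleListType is a hypothetical stand-in, not the Cassandra ListType, and the string cache key is an assumption made only to keep the example short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a parameterized type such as ListType.
final class ExampleListType {
    // One cached instance per (element type, multi-cell flag) key.
    private static final ConcurrentMap<String, ExampleListType> INSTANCES =
            new ConcurrentHashMap<>();

    final String elementType;
    final boolean multiCell;

    private ExampleListType(String elementType, boolean multiCell) {
        this.elementType = elementType;
        this.multiCell = multiCell;
    }

    // No monitor is taken on the common path: a plain get() first,
    // and putIfAbsent() only when the instance is missing.
    static ExampleListType getInstance(String elementType, boolean multiCell) {
        String key = elementType + ':' + multiCell;
        ExampleListType existing = INSTANCES.get(key);
        if (existing != null) {
            return existing;
        }
        ExampleListType created = new ExampleListType(elementType, multiCell);
        // If another thread won the race, reuse its instance and drop ours.
        ExampleListType raced = INSTANCES.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }

    public static void main(String[] args) {
        ExampleListType a = getInstance("int", true);
        ExampleListType b = getInstance("int", true);
        System.out.println(a == b);
    }
}
```

The trade-off is that two racing threads may each construct an instance and one is thrown away, which is harmless as long as construction is cheap and free of side effects.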





[jira] [Commented] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893167#comment-15893167
 ] 

Jason Brown commented on CASSANDRA-13233:
-

On the whole, this is pretty good, but why did you remove the class member 
constants {{FILE_DESCRIPTOR_FD_FIELD}} and {{FILE_CHANNEL_FD_FIELD}} from 
{{CLibrary}}? The code now needs to get the protected field on every invocation 
of {{getfd(FileChannel)}} and {{getfd(FileDescriptor)}}. Is there something 
inherently safer in fetching the reference to the {{Field}} on every 
invocation?
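
The difference being asked about can be sketched as follows. This is a minimal illustration under stated assumptions: Handle is a hypothetical class standing in for the JDK types, and getfdCached/getfdPerCall are made-up names, not the CLibrary methods.

```java
import java.lang.reflect.Field;

// Hypothetical sketch; Handle stands in for a JDK class with a hidden fd field.
final class FieldLookupSketch {

    static final class Handle {
        private final int fd;

        Handle(int fd) {
            this.fd = fd;
        }
    }

    // Resolved once at class initialization and reused by every call.
    private static final Field HANDLE_FD_FIELD = lookup(Handle.class, "fd");

    private static Field lookup(Class<?> cls, String name) {
        try {
            Field field = cls.getDeclaredField(name);
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Variant with a class member constant: only the field read happens per call.
    static int getfdCached(Handle handle) {
        try {
            return HANDLE_FD_FIELD.getInt(handle);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Variant without the constant: the reflective lookup is repeated on every call.
    static int getfdPerCall(Handle handle) {
        try {
            return lookup(Handle.class, "fd").getInt(handle);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Handle handle = new Handle(7);
        System.out.println(getfdCached(handle) + " " + getfdPerCall(handle));
    }
}
```

Both variants return the same value; the cached one simply avoids repeating getDeclaredField and setAccessible on a hot path.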


[jira] [Commented] (CASSANDRA-13291) Replace usages of MessageDigest with Guava's Hasher

2017-03-02 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893149#comment-15893149
 ] 

Michael Kjellman commented on CASSANDRA-13291:
--

Created CASSANDRA-13292 to track the actual switch from MD5 to something 
else, pending this ticket.

> Replace usages of MessageDigest with Guava's Hasher
> ---
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, because we use MessageDigest everywhere, switching the 
> hashing function to something like adler128 or murmur3_128 (neither of 
> which ships with the JDK) wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest 
> directly in favor of Hasher from Guava. This means going forward we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).
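
The "change a single line" goal can be illustrated without Guava on the classpath. The sketch below is JDK-only and deliberately does not use Guava's API: SimpleHasher, Md5Hasher, Crc32Hasher and digestOf are hypothetical names, and CRC32 is just an arbitrary second algorithm that happens to ship with the JDK, not a recommendation from the ticket.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical sketch of hiding the algorithm behind a small hasher interface.
final class HasherSketch {

    // Call sites depend on this interface, not on MessageDigest directly.
    interface SimpleHasher {
        SimpleHasher putBytes(byte[] bytes);

        byte[] hash();
    }

    static final class Md5Hasher implements SimpleHasher {
        private final MessageDigest digest;

        Md5Hasher() {
            try {
                digest = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }

        public SimpleHasher putBytes(byte[] bytes) {
            digest.update(bytes);
            return this;
        }

        public byte[] hash() {
            return digest.digest();
        }
    }

    static final class Crc32Hasher implements SimpleHasher {
        private final CRC32 crc = new CRC32();

        public SimpleHasher putBytes(byte[] bytes) {
            crc.update(bytes);
            return this;
        }

        public byte[] hash() {
            return ByteBuffer.allocate(Long.BYTES).putLong(crc.getValue()).array();
        }
    }

    // The one line to change when switching algorithms.
    static SimpleHasher newHasher() {
        return new Md5Hasher();
    }

    // Example call site: it never names the algorithm.
    static byte[] digestOf(byte[]... parts) {
        SimpleHasher hasher = newHasher();
        for (byte[] part : parts) {
            hasher.putBytes(part);
        }
        return hasher.hash();
    }

    public static void main(String[] args) {
        System.out.println(digestOf("a".getBytes(), "b".getBytes()).length);
    }
}
```

With call sites written against the interface, swapping Md5Hasher for another implementation in newHasher() changes the algorithm everywhere at once, which is the property the ticket is after.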





[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2017-03-02 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893145#comment-15893145
 ] 

Michael Kjellman commented on CASSANDRA-13292:
--

I have a patch for this as a separate commit on top of CASSANDRA-13291.

I'll hold off attaching one until we have some discussion about what hashing 
implementations we might want to go with -- and after CASSANDRA-13291 is +1'ed 
(which takes care of the bulk of changes required to make this change).

> Replace MessagingService usage of MD5 with something more modern
> 
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
>
> While profiling C* via multiple profilers, I've consistently seen a 
> significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace  Sample Count  Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int)  264  1.566
> sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)  200  1.187
> sun.security.provider.DigestBase.engineUpdate(byte[], int, int)  200  1.187
> java.security.MessageDigestSpi.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest.update(ByteBuffer)  200  1.187
> org.apache.cassandra.db.Column.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)  193  1.145
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.ReadCallback.get()  88  0.522
> org.apache.cassandra.service.AbstractReadExecutor.get()  88  0.522
> org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)  88  0.522
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)  88  0.522
> org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)  88  0.522
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)  88  0.522
> org.jboss.netty.handler.execution.Channel

[jira] [Commented] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893144#comment-15893144
 ] 

Jason Brown commented on CASSANDRA-13233:
-

Created branches for testing

||3.0||3.11||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/13233-3.0]|[branch|https://github.com/jasobrown/cassandra/tree/13233-3.11]|[branch|https://github.com/jasobrown/cassandra/tree/13233-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13233-trunk-testall/]|


> Improve testing on macOS by eliminating sigar logging
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff
>
>
> The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): 
> "Warn user when OS settings are poor / integrate sigar" are not Mac friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in java.library.path
>     at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) ~[sigar-1.6.4.jar:?]
>     at org.hyperic.sigar.Sigar.(Sigar.java:100) [sigar-1.6.4.jar:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) [main/:?]
>     at org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) [main/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) [classes/:?]
>     at org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) [classes/:?]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
>     at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit-4.6.jar:?]
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit-4.6.jar:?]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) [junit-4.6.jar:?]
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:220) [junit-4.6.jar:?]
>     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit-4.6.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033) [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not initialize SIGAR library org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create pla

[jira] [Assigned] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2017-03-02 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman reassigned CASSANDRA-13292:


Assignee: Michael Kjellman

> Replace MessagingService usage of MD5 with something more modern
> 
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
>
> While profiling C* via multiple profilers, I've consistently seen a 
> significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace  Sample Count  Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int)  264  1.566
> sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)  200  1.187
> sun.security.provider.DigestBase.engineUpdate(byte[], int, int)  200  1.187
> java.security.MessageDigestSpi.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest.update(ByteBuffer)  200  1.187
> org.apache.cassandra.db.Column.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)  193  1.145
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.ReadCallback.get()  88  0.522
> org.apache.cassandra.service.AbstractReadExecutor.get()  88  0.522
> org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)  88  0.522
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)  88  0.522
> org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)  88  0.522
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)  88  0.522
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun()  88  0.522
> org.jboss.netty.handler.execution.ChannelEventRunnable.run()  88  0.522
> jav

[jira] [Created] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2017-03-02 Thread Michael Kjellman (JIRA)
Michael Kjellman created CASSANDRA-13292:


 Summary: Replace MessagingService usage of MD5 with something more 
modern
 Key: CASSANDRA-13292
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Michael Kjellman


While profiling C* via multiple profilers, I've consistently seen a significant 
amount of time being spent calculating MD5 digests.

{code}
Stack Trace  Sample Count  Percentage(%)
sun.security.provider.MD5.implCompress(byte[], int)  264  1.566
sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)  200  1.187
sun.security.provider.DigestBase.engineUpdate(byte[], int, int)  200  1.187
java.security.MessageDigestSpi.engineUpdate(ByteBuffer)  200  1.187
java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)  200  1.187
java.security.MessageDigest.update(ByteBuffer)  200  1.187
org.apache.cassandra.db.Column.updateDigest(MessageDigest)  193  1.145
org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)  193  1.145
org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)  193  1.145
org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
org.apache.cassandra.service.ReadCallback.get()  88  0.522
org.apache.cassandra.service.AbstractReadExecutor.get()  88  0.522
org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)  88  0.522
org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)  88  0.522
org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)  88  0.522
org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)  88  0.522
org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)  88  0.522
org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)  88  0.522
org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)  88  0.522
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)  88  0.522
org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)  88  0.522
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)  88  0.522
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)  88  0.522
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)  88  0.522
org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun()  88  0.522
org.jboss.netty.handler.execution.ChannelEventRunnable.run()  88  0.522
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)  88  0.522
java.util.concurrent.ThreadPoolExecutor$Worker.run()  88  0.522
java.lang.Thread.run()  88  0.

[jira] [Updated] (CASSANDRA-13291) Replace usages of MessageDigest with Guava's Hasher

2017-03-02 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-13291:
-
Status: Patch Available  (was: Open)

> Replace usages of MessageDigest with Guava's Hasher
> ---
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, because we use MessageDigest everywhere, switching the 
> hashing function to something like adler128 or murmur3_128 (neither of 
> which ships with the JDK) wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest 
> directly in favor of Hasher from Guava. This means going forward we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).





[jira] [Updated] (CASSANDRA-13291) Replace usages of MessageDigest with Guava's Hasher

2017-03-02 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-13291:
-
Attachment: CASSANDRA-13291-trunk.diff

> Replace usages of MessageDigest with Guava's Hasher
> ---
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, because we use MessageDigest everywhere, switching the 
> hashing function to something like adler128 or murmur3_128 (neither of 
> which ships with the JDK) wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest 
> directly in favor of Hasher from Guava. This means going forward we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).





[jira] [Created] (CASSANDRA-13291) Replace usages of MessageDigest with Guava's Hasher

2017-03-02 Thread Michael Kjellman (JIRA)
Michael Kjellman created CASSANDRA-13291:


 Summary: Replace usages of MessageDigest with Guava's Hasher
 Key: CASSANDRA-13291
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Michael Kjellman
Assignee: Michael Kjellman


During my profiling of C* I frequently see lots of aggregate time across 
threads being spent inside the MD5 MessageDigest implementation. Given that 
there are tons of modern alternative hashing functions better than MD5 
available -- both in terms of providing better collision resistance and actual 
computational speed -- I wanted to switch out our usage of MD5 for alternatives 
(like adler128 or murmur3_128) and test for performance improvements.

Unfortunately, because we use MessageDigest everywhere, switching the 
hashing function to something like adler128 or murmur3_128 (neither of 
which ships with the JDK) wasn't straightforward.

The goal of this ticket is to propose switching out usages of MessageDigest 
directly in favor of Hasher from Guava. This means going forward we can change 
a single line of code to switch the hashing algorithm being used (assuming 
there is an implementation in Guava).
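
A minimal sketch of the idea behind this proposal, using only the JDK so that it runs without extra dependencies. It is not the attached patch and it does not use Guava's Hasher; the class name `Digests` and the constant `ALGORITHM` are hypothetical names introduced purely for illustration. The point is only that when every caller goes through one factory, the algorithm is chosen in a single place:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Cassandra: callers never name the algorithm themselves.
final class Digests {
    // The single line to change when switching algorithms
    // (limited here to names that the installed JDK providers support).
    static final String ALGORITHM = "MD5";

    private Digests() {}

    // Central factory: the only place that knows which algorithm is in use.
    static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance(ALGORITHM);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest algorithm not available: " + ALGORITHM, e);
        }
    }

    // Convenience wrapper returning the digest of the input as lowercase hex.
    static String hex(byte[] input) {
        byte[] hash = newDigest().digest(input);
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With a library such as Guava the same shape would presumably apply, with the factory returning a hasher for whichever hash function is configured; that variant is left out here because it depends on the Guava version in use.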





[jira] [Updated] (CASSANDRA-13233) Improve testing on macOS by eliminating sigar logging

2017-03-02 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13233:

Summary: Improve testing on macOS by eliminating sigar logging  (was: no 
libsigar-universal64-macosx.dylib in java.library.path)

> Improve testing on macOS by eliminating sigar logging
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff
>
>
> The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): 
> "Warn user when OS settings are poor / integrate sigar" are not Mac friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in java.library.path
>     at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) ~[sigar-1.6.4.jar:?]
>     at org.hyperic.sigar.Sigar.(Sigar.java:100) [sigar-1.6.4.jar:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) [main/:?]
>     at org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) [main/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) [classes/:?]
>     at org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) [classes/:?]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
>     at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit-4.6.jar:?]
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit-4.6.jar:?]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) [junit-4.6.jar:?]
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:220) [junit-4.6.jar:?]
>     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit-4.6.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033) [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not initialize SIGAR library org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create platform-aware (Windows, Darwin, Linux) implementations of CLibrary 
> (for instance, CLibrary today assumes all platforms support posix_fadvise, 
> but this doesn't exist in the Darwin kernel). If methods are defined with 
> the "native" JNI keyword in Java, loading the class causes our JNA check to 
> fail, which incorrectly marks all of CLibrary as "disabled" 
> (jnaAvailable = false) even though on a platform like Darwin all of the 
> native methods except posix_fadvise are supported.
> # Replace the sigar usage that gets the current pid with calls to the 
> CLibrary/native equivalent, and fall back to Sigar for platforms like 
> Windows that don't have that support with JDK8 (and without a CLibrary equival

[jira] [Updated] (CASSANDRA-13233) no libsigar-universal64-macosx.dylib in java.library.path

2017-03-02 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13233:

Reviewer: Jason Brown

> no libsigar-universal64-macosx.dylib in java.library.path
> -
>
> Key: CASSANDRA-13233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13233
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 28827709.diff
>
>
> The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): 
> "Warn user when OS settings are poor / integrate sigar" are not Mac friendly.
> {code}
> INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing SIGAR library
> DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no libsigar-universal64-macosx.dylib in java.library.path
> org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in java.library.path
>     at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) ~[sigar-1.6.4.jar:?]
>     at org.hyperic.sigar.Sigar.(Sigar.java:100) [sigar-1.6.4.jar:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) [main/:?]
>     at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) [main/:?]
>     at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) [main/:?]
>     at org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) [main/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) [classes/:?]
>     at org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) [classes/:?]
>     at org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) [classes/:?]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
>     at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit-4.6.jar:?]
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit-4.6.jar:?]
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) [junit-4.6.jar:?]
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) [junit-4.6.jar:?]
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:220) [junit-4.6.jar:?]
>     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit-4.6.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182) [ant-junit.jar:?]
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033) [ant-junit.jar:?]
> INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not initialize SIGAR library org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
> {code}
> There are 2 issues addressed by the attached patch:
> # Create platform-aware (Windows, Darwin, Linux) implementations of CLibrary 
> (for instance, CLibrary today assumes all platforms support posix_fadvise, 
> but this doesn't exist in the Darwin kernel). If methods are defined with 
> the "native" JNI keyword in Java, loading the class causes our JNA check to 
> fail, which incorrectly marks all of CLibrary as "disabled" 
> (jnaAvailable = false) even though on a platform like Darwin all of the 
> native methods except posix_fadvise are supported.
> # Replace the sigar usage that gets the current pid with calls to the 
> CLibrary/native equivalent, and fall back to Sigar for platforms like 
> Windows that don't have that support with JDK8 (and without a CLibrary 
> equivalent).
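
The first item in the list above can be illustrated with a small sketch. This is not the code from the attached patch; `NativePlatform`, `Kind`, `detect` and `supportsPosixFadvise` are hypothetical names, and treating posix_fadvise as available only on Linux is a simplifying assumption made for the example. It shows the general shape of deciding per platform which native calls may be used, instead of disabling everything when one call is missing:

```java
import java.util.Locale;

// Hypothetical names throughout; this is not the CLibrary code from the patch.
final class NativePlatform {
    enum Kind { LINUX, DARWIN, WINDOWS, OTHER }

    private NativePlatform() {}

    // Classify an os.name value such as "Linux", "Mac OS X" or "Windows 10".
    static Kind detect(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return Kind.LINUX;
        if (os.contains("mac") || os.contains("darwin")) return Kind.DARWIN;
        if (os.contains("windows")) return Kind.WINDOWS;
        return Kind.OTHER;
    }

    // Per the description above, posix_fadvise does not exist in the Darwin kernel.
    // Reporting it as supported only on Linux is an assumption of this sketch.
    static boolean supportsPosixFadvise(Kind kind) {
        return kind == Kind.LINUX;
    }

    // Platform of the running JVM, derived from the standard os.name property.
    static Kind current() {
        return detect(System.getProperty("os.name", ""));
    }
}
```

A caller would check `supportsPosixFadvise(NativePlatform.current())` before attempting the call, so that a missing function on one platform only disables that one feature.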





[jira] [Resolved] (CASSANDRA-12467) dtest failure in jmx_test.TestJMX.netstats_test

2017-03-02 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson resolved CASSANDRA-12467.
-
Resolution: Not A Problem

> dtest failure in jmx_test.TestJMX.netstats_test
> ---
>
> Key: CASSANDRA-12467
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12467
> Project: Cassandra
>  Issue Type: Test
>Reporter: Craig Kodman
>Assignee: DS Test Eng
>  Labels: dtest, windows
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/285/testReport/jmx_test/TestJMX/netstats_test
> {code}
> Error Message
> "ConnectException: 'Connection refused'." does not match "Subprocess 
> ['nodetool', '-h', 'localhost', '-p', '7100', ['netstats']] exited with 
> non-zero status; exit status: 1; 
> stdout: Starting NodeTool
> ; 
> stderr: nodetool: Failed to connect to 'localhost:7100' - ConnectException: 
> 'Connection refused: connect'.
> "
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: d:\temp\2\dtest-dbbq3u
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
> {code}
> {code}
> Stacktrace
>   File "C:\tools\python2\lib\unittest\case.py", line 329, in run
> testMethod()
>   File 
> "D:\jenkins\workspace\cassandra-2.2_dtest_win32\cassandra-dtest\jmx_test.py", 
> line 35, in netstats_test
> node1.nodetool('netstats')
>   File "C:\tools\python2\lib\unittest\case.py", line 127, in __exit__
> (expected_regexp.pattern, str(exc_value)))
> '"ConnectException: \'Connection refused\'." does not match "Subprocess 
> [\'nodetool\', \'-h\', \'localhost\', \'-p\', \'7100\', [\'netstats\']] 
> exited with non-zero status; exit status: 1; \nstdout: Starting NodeTool\n; 
> \nstderr: nodetool: Failed to connect to \'localhost:7100\' - 
> ConnectException: \'Connection refused: connect\'.\n"\n 
> >> begin captured logging << \ndtest: DEBUG: cluster ccm 
> directory: d:\\temp\\2\\dtest-dbbq3u\ndtest: DEBUG: Done setting 
> configuration options:\n{   \'initial_token\': None,\n\'num_tokens\': 
> \'32\',\n\'phi_convict_threshold\': 5,\n
> \'range_request_timeout_in_ms\': 1,\n\'read_request_timeout_in_ms\': 
> 1,\n\'request_timeout_in_ms\': 1,\n
> \'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\n- >> end 
> captured logging << -'
> Standard Error
> Started: node1 with pid: 6288
> Started: node3 with pid: 2280
> Started: node2 with pid: 6980
> {code}
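
The mismatch reported in the error message above can be reproduced in isolation. The sketch below uses Java regular expressions rather than the Python dtest code, and the class name `ConnectRefusedCheck` and the `TOLERANT` pattern are hypothetical, added only to illustrate why the expected pattern does not match the Windows output ("Connection refused: connect") and what a more lenient pattern could look like:

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the real test is written in Python.
final class ConnectRefusedCheck {
    // Pattern the test expected, as quoted in the error message above.
    static final Pattern EXPECTED =
            Pattern.compile("ConnectException: 'Connection refused'.");

    // Assumed, more tolerant pattern that also accepts the ": connect" suffix.
    static final Pattern TOLERANT =
            Pattern.compile("ConnectException: 'Connection refused(: connect)?'");

    private ConnectRefusedCheck() {}

    // Search semantics: the pattern may match anywhere in the message.
    static boolean matches(Pattern pattern, String stderr) {
        return pattern.matcher(stderr).find();
    }
}
```

The expected pattern requires a closing quote directly after "refused", which is why the message with the extra ": connect" text is rejected.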





[jira] [Resolved] (CASSANDRA-11250) (windows) dtest failure in upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_22_test

2017-03-02 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson resolved CASSANDRA-11250.
-
Resolution: Not A Problem

> (windows) dtest failure in 
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_22_test
> 
>
> Key: CASSANDRA-11250
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11250
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: DS Test Eng
>  Labels: dtest, windows
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/174/testReport/upgrade_internal_auth_test/TestAuthUpgrade/upgrade_to_22_test
> Failed on CassCI build cassandra-2.2_dtest_win32 #174
> looks like there could be multiple causes for this intermittent failure.





[jira] [Resolved] (CASSANDRA-11267) (windows) dtest failure in upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test

2017-03-02 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson resolved CASSANDRA-11267.
-
Resolution: Not A Problem

> (windows) dtest failure in 
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test
> 
>
> Key: CASSANDRA-11267
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11267
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: DS Test Eng
>  Labels: dtest, windows
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/167/testReport/upgrade_internal_auth_test/TestAuthUpgrade/upgrade_to_30_test
> Failed on CassCI build cassandra-3.0_dtest_win32 #167
> this test is flapping pretty frequently. not certain yet on failure cause, 
> might vary across builds.





[jira] [Updated] (CASSANDRA-13233) no libsigar-universal64-macosx.dylib in java.library.path

2017-03-02 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13233:

Description: 
The changes introduced in CASSANDRA-7838 (Resolved; Fixed; 2.2.0 beta 1): "Warn 
user when OS settings are poor / integrate sigar" are not Mac friendly.

{code}

INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing SIGAR 
library
DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no 
libsigar-universal64-macosx.dylib in java.library.path
org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in 
java.library.path
at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) 
~[sigar-1.6.4.jar:?]
at org.hyperic.sigar.Sigar.(Sigar.java:100) [sigar-1.6.4.jar:?]
at org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:47) 
[main/:?]
at 
org.apache.cassandra.utils.SigarLibrary.(SigarLibrary.java:28) [main/:?]
at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:363) [main/:?]
at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:342) 
[main/:?]
at 
org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:291) 
[main/:?]
at org.apache.cassandra.utils.UUIDGen.(UUIDGen.java:42) 
[main/:?]
at 
org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1278) 
[main/:?]
at 
org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:369) 
[classes/:?]
at 
org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:356) 
[classes/:?]
at 
org.apache.cassandra.SchemaLoader.standardCFMD(SchemaLoader.java:351) 
[classes/:?]
at 
org.apache.cassandra.batchlog.BatchTest.defineSchema(BatchTest.java:59) 
[classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_66]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_66]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_66]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 [junit-4.6.jar:?]
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 [junit-4.6.jar:?]
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 [junit-4.6.jar:?]
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) 
[junit-4.6.jar:?]
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) 
[junit-4.6.jar:?]
at org.junit.runners.ParentRunner.run(ParentRunner.java:220) 
[junit-4.6.jar:?]
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) 
[junit-4.6.jar:?]
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
 [ant-junit.jar:?]
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
 [ant-junit.jar:?]
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
 [ant-junit.jar:?]
INFO  [main] 2016-10-18T11:20:10,350 SigarLibrary.java:57 - Could not 
initialize SIGAR library 
org.hyperic.sigar.Sigar.getFileSystemListNative()[Lorg/hyperic/sigar/FileSystem;
{code}

There are 2 issues addressed by the attached patch:
# Create platform-aware (Windows, Darwin, Linux) implementations of CLibrary. 
For instance, CLibrary today assumes all platforms support posix_fadvise, but 
this doesn't exist in the Darwin kernel. If methods are defined with the 
"native" keyword in Java, loading the class on a platform that lacks one of 
them causes our JNA check to fail, incorrectly marking all of CLibrary as 
"disabled" (jnaAvailable = false), even though on a platform like Darwin all 
of the native methods except posix_fadvise are supported.
# Replace the sigar usage that gets the current pid with calls to the 
CLibrary/native equivalent, and fall back to Sigar for platforms like Windows 
that don't have that support with JDK8 (and have no CLibrary equivalent).
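
The first point can be sketched roughly as follows. This is only an illustration, not the attached patch: the {{NativePlatform}} enum and its methods are hypothetical names, and treating Linux as the only platform with posix_fadvise is an assumption made to keep the example short.

```java
import java.util.Locale;

// Hypothetical illustration; the names below are not taken from the patch.
enum NativePlatform {
    WINDOWS, DARWIN, LINUX, OTHER;

    // Maps an os.name value (e.g. "Mac OS X", "Linux", "Windows 10") to a platform.
    static NativePlatform detect(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("windows")) return WINDOWS;
        if (os.contains("mac") || os.contains("darwin")) return DARWIN;
        if (os.contains("linux")) return LINUX;
        return OTHER;
    }

    // Platform of the running JVM.
    static NativePlatform current() {
        return detect(System.getProperty("os.name", ""));
    }

    // Assumption for this sketch: only Linux is treated as having posix_fadvise.
    boolean supportsPosixFadvise() {
        return this == LINUX;
    }
}
```

With a per-platform answer like this, a missing posix_fadvise on Darwin would only skip that one call instead of disabling every native method.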

  was:
The changes introduced in https://issues.apache.org/jira/browse/CASSANDRA-7838 
(Resolved; Fixed; 2.2.0 beta 1): "Warn user when OS settings are poor / 
integrate sigar" are not Mac friendly.

{code}

INFO  [main] 2016-10-18T11:20:10,330 SigarLibrary.java:44 - Initializing SIGAR 
library
DEBUG [main] 2016-10-18T11:20:10,342 SigarLog.java:60 - no 
libsigar-universal64-macosx.dylib in java.library.path
org.hyperic.sigar.SigarException: no libsigar-universal64-macosx.dylib in 
java.library.path
at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172) 
~[sigar-1.6.4.jar:?]
at org.hyperic.sigar.Sigar.(Sigar.java:100) [sigar-1.6.4.jar:?]
at org.apache.cassandra.utils.SigarLib

[jira] [Commented] (CASSANDRA-13290) Optimizing very small repair streams

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892981#comment-15892981
 ] 

Benjamin Roth commented on CASSANDRA-13290:
---

CASSANDRA-8911 brings a very interesting approach into the game, but the 
solution is rather complex (as can be seen in the stalled ticket activity).

I guess both CASSANDRA-12888 and this ticket are lower-hanging fruit for a 
start, though I'm not saying it isn't worth working on both approaches.

> Optimizing very small repair streams
> 
>
> Key: CASSANDRA-13290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13290
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>
> I often encountered repair scenarios where a lot of tiny repair streams were 
> created. This results in hundreds, thousands or even tens of thousands of 
> super small SSTables (a few bytes to a few kbytes).
> This puts a lot of pressure on compaction and may even lead to a crash due to 
> too many open files - I also encountered this.
> What could help to avoid this:
> After CASSANDRA-12888 is resolved, a tiny stream (e.g. < 100kb) could be sent 
> through the write path to be buffered by memtables instead of creating an 
> SSTable each.
> Without CASSANDRA-12888 this would break incremental repairs.





[jira] [Commented] (CASSANDRA-13290) Optimizing very small repair streams

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892947#comment-15892947
 ] 

Jeff Jirsa commented on CASSANDRA-13290:


See also CASSANDRA-8911



> Optimizing very small repair streams
> 
>
> Key: CASSANDRA-13290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13290
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>
> I often encountered repair scenarios where a lot of tiny repair streams were 
> created. This results in hundreds, thousands or even tens of thousands of 
> super small SSTables (a few bytes to a few kbytes).
> This puts a lot of pressure on compaction and may even lead to a crash due to 
> too many open files - I also encountered this.
> What could help to avoid this:
> After CASSANDRA-12888 is resolved, a tiny stream (e.g. < 100kb) could be sent 
> through the write path to be buffered by memtables instead of creating an 
> SSTable each.
> Without CASSANDRA-12888 this would break incremental repairs.





[jira] [Created] (CASSANDRA-13290) Optimizing very small repair streams

2017-03-02 Thread Benjamin Roth (JIRA)
Benjamin Roth created CASSANDRA-13290:
-

 Summary: Optimizing very small repair streams
 Key: CASSANDRA-13290
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13290
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benjamin Roth


I often encountered repair scenarios where a lot of tiny repair streams were 
created. This results in hundreds, thousands or even tens of thousands of 
super small SSTables (a few bytes to a few kbytes).
This puts a lot of pressure on compaction and may even lead to a crash due to 
too many open files - I also encountered this.

What could help to avoid this:
After CASSANDRA-12888 is resolved, a tiny stream (e.g. < 100kb) could be sent 
through the write path to be buffered by memtables instead of creating an 
SSTable each.

Without CASSANDRA-12888 this would break incremental repairs.
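
The proposed decision could look roughly like the sketch below. It is a hypothetical illustration, not Cassandra code: {{RepairStreamRouting}}, {{Target}} and the threshold constant are made-up names, and the 100 KiB value is simply the "< 100kb" example from above.

```java
// Hypothetical illustration of the proposed size check; not Cassandra code.
final class RepairStreamRouting {
    enum Target { MEMTABLE_WRITE_PATH, NEW_SSTABLE }

    // 100 KiB, taken from the "< 100kb" example; a real setting would likely be tunable.
    static final long TINY_STREAM_THRESHOLD_BYTES = 100L * 1024;

    static Target route(long streamSizeBytes, boolean cassandra12888Resolved) {
        if (streamSizeBytes < 0)
            throw new IllegalArgumentException("negative stream size");
        // Without CASSANDRA-12888 the write path would break incremental repairs,
        // so keep the current behaviour and write an SSTable.
        if (!cassandra12888Resolved)
            return Target.NEW_SSTABLE;
        return streamSizeBytes < TINY_STREAM_THRESHOLD_BYTES
               ? Target.MEMTABLE_WRITE_PATH
               : Target.NEW_SSTABLE;
    }
}
```

For example, a 512-byte stream would go through the memtable once CASSANDRA-12888 is in place, while a 10 MB stream would still become its own SSTable.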






[jira] [Created] (CASSANDRA-13289) Make it possible to monitor an ideal consistency level separate from actual consistency level

2017-03-02 Thread Ariel Weisberg (JIRA)
Ariel Weisberg created CASSANDRA-13289:
--

 Summary: Make it possible to monitor an ideal consistency level 
separate from actual consistency level
 Key: CASSANDRA-13289
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13289
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


As an operator there are several issues related to multi-datacenter replication 
and consistency you may want to have more information on from your production 
database.

For instance, if your application writes at LOCAL_QUORUM, how often are those 
writes failing to achieve EACH_QUORUM at other data centers? If you failed your 
application over to one of those data centers, roughly how inconsistent might 
it be given the number of writes that didn't propagate since the last 
incremental repair?

You might also want to know roughly what the latency of writes would be if you 
switched to a different consistency level. For instance, you are writing at 
LOCAL_QUORUM and want to know what would happen if you switched to EACH_QUORUM.

The proposed change is to allow an ideal_consistency_level to be specified in 
cassandra.yaml as well as get/set via JMX. If no ideal consistency level is 
specified no additional tracking is done.

If an ideal consistency level is specified then the 
{{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler 
that tracks whether the ideal consistency level is met before a write times 
out. It also tracks the latency of achieving the ideal CL for successful 
writes.

These two metrics would be reported on a per Keyspace basis.
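
A minimal sketch of the bookkeeping such a delegate might do is below. It is not the proposed patch: {{IdealConsistencyTracker}} and its methods are hypothetical names, and the ideal CL is reduced to a single required ack count, which ignores the per-datacenter counting that EACH_QUORUM actually needs.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the delegate's bookkeeping; names are not from the patch.
final class IdealConsistencyTracker {
    private final int requiredAcks;   // acks needed to satisfy the ideal CL (simplified)
    private final long startNanos;    // when the write was sent
    private final AtomicInteger acks = new AtomicInteger();
    // -1 until the ideal CL is reached, then the latency in nanoseconds.
    private final AtomicLong idealLatencyNanos = new AtomicLong(-1);

    IdealConsistencyTracker(int requiredAcks, long startNanos) {
        this.requiredAcks = requiredAcks;
        this.startNanos = startNanos;
    }

    // Called for every replica response, including ones beyond the actual CL.
    void onResponse(long nowNanos) {
        if (acks.incrementAndGet() == requiredAcks)
            idealLatencyNanos.compareAndSet(-1, nowNanos - startNanos);
    }

    // Evaluated when the write timeout expires.
    boolean idealMetBeforeTimeout() {
        return idealLatencyNanos.get() >= 0;
    }

    // Latency of reaching the ideal CL, or -1 if it was not reached.
    long idealLatencyNanos() {
        return idealLatencyNanos.get();
    }
}
```

The two values read at timeout (met or not, and the latency when met) correspond to the two per-keyspace metrics described above.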





[jira] [Updated] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13282:
---
Status: Patch Available  (was: In Progress)

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 





[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892849#comment-15892849
 ] 

Jeff Jirsa commented on CASSANDRA-13282:


Attaching a drawing for whichever reviewer wants this ticket - drawing created 
while discussing this offline because it's somewhat nuanced apparently. 
Basically when we allocate in {{sync()}}, if we're at the end of a file, we 
return -1, and then the end marker for the segment gets set to the end of the 
file. Therefore within the while loop as we replay an individual sync section, 
we can get to a point where we throw trying to read an int from the unused tail 
of the section. 
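
To make the boundary condition concrete, here is a simplified sketch. It is not the replayer code: {{SyncSectionScan}} is a hypothetical class, and the section layout is reduced to a 4-byte size followed by that many bytes, leaving out the checksums a real commitlog entry carries.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the boundary condition; not the replayer code.
final class SyncSectionScan {
    // Counts entries in a section laid out as [int size][size bytes]...,
    // followed by zeroed or unused space up to the end marker.
    static int countMutations(ByteBuffer section) {
        int count = 0;
        while (section.hasRemaining()) {
            // Fewer than 4 bytes left: this can only be the unused tail,
            // so stop instead of throwing while reading the size.
            if (section.remaining() < Integer.BYTES)
                break;
            int serializedSize = section.getInt();
            if (serializedSize == 0)
                break; // zeroed space, nothing more to replay
            if (serializedSize < 0 || serializedSize > section.remaining())
                throw new IllegalStateException("corrupt section");
            section.position(section.position() + serializedSize);
            count++;
        }
        return count;
    }
}
```

Without the {{remaining()}} guard, a section whose last entry ends 1, 2 or 3 bytes before the end marker would pass the loop condition and then fail inside {{getInt()}}.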

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 





[jira] [Comment Edited] (CASSANDRA-9998) LEAK DETECTED with snapshot/sequential repairs

2017-03-02 Thread Vladimir Kuzmin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892845#comment-15892845
 ] 

Vladimir Kuzmin edited comment on CASSANDRA-9998 at 3/2/17 7:31 PM:


I still observe it at Cassandra 2.2.5:

ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 
CompactionManager.java:1086 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 Validator.java:246 - 
Failed creating a merkle tree for [repair #aea6e561-f180-11e6-9f5c-8fee5bd1c5f8 
on logdb/tomcat_sessions, (5876725599440613959,5877112887947005799]], 
/10.145.144.71 (see log for details)
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,904 CassandraDaemon.java:185 
- Exception in thread Thread[ValidationExecutor:29,1,main]
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,560 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@1387fbb6) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1451583394:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-357-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,561 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@33992414) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1729712987:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-356-big
 was not released before the reference was garbage collected

Is this a new issue, or was the patch not applied to version 2.2.5?


was (Author: kuzminva):
I still observer it at Cassandra 2.2.5:

ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 
CompactionManager.java:1086 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 Validator.java:246 - 
Failed creating a merkle tree for [repair #aea6e561-f180-11e6-9f5c-8fee5bd1c5f8 
on logdb/tomcat_sessions, (5876725599440613959,5877112887947005799]], 
/10.145.144.71 (see log for details)
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,904 CassandraDaemon.java:185 
- Exception in thread Thread[ValidationExecutor:29,1,main]
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,560 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@1387fbb6) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1451583394:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-357-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,561 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@33992414) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1729712987:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-356-big
 was not released before the reference was garbage collected

Is it a new or patch wasn't applied to 2.2.5 version?

> LEAK DETECTED with snapshot/sequential repairs
> --
>
> Key: CASSANDRA-9998
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9998
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.1.9, 2.2.1, 3.0 beta 1
>
>
> http://cassci.datastax.com/job/cassandra-2.1_dtest/lastCompletedBuild/testReport/repair_test/TestRepair/simple_sequential_repair_test/
> does not happen if I add -par to the test





[jira] [Commented] (CASSANDRA-9998) LEAK DETECTED with snapshot/sequential repairs

2017-03-02 Thread Vladimir Kuzmin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892845#comment-15892845
 ] 

Vladimir Kuzmin commented on CASSANDRA-9998:


I still observe it at Cassandra 2.2.5:

ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 
CompactionManager.java:1086 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,903 Validator.java:246 - 
Failed creating a merkle tree for [repair #aea6e561-f180-11e6-9f5c-8fee5bd1c5f8 
on logdb/tomcat_sessions, (5876725599440613959,5877112887947005799]], 
/10.145.144.71 (see log for details)
ERROR [ValidationExecutor:29] 2017-02-13 00:10:38,904 CassandraDaemon.java:185 
- Exception in thread Thread[ValidationExecutor:29,1,main]
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,560 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@1387fbb6) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1451583394:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-357-big
 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-02-13 00:10:49,561 Ref.java:187 - LEAK 
DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@33992414) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1729712987:/storage/core/loginsight/cidata/cassandra/data/logdb/tomcat_sessions-2ef89820cc0711e6bc7253e2bf7d3086/la-356-big
 was not released before the reference was garbage collected

Is this a new issue, or was the patch not applied to version 2.2.5?

> LEAK DETECTED with snapshot/sequential repairs
> --
>
> Key: CASSANDRA-9998
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9998
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.1.9, 2.2.1, 3.0 beta 1
>
>
> http://cassci.datastax.com/job/cassandra-2.1_dtest/lastCompletedBuild/testReport/repair_test/TestRepair/simple_sequential_repair_test/
> does not happen if I add -par to the test





[jira] [Updated] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13282:
---
Attachment: whiteboard.png

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 





[jira] [Updated] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13282:
---
Fix Version/s: 2.2.x

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 





[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892831#comment-15892831
 ] 

Jeff Jirsa commented on CASSANDRA-13282:


Will kick off CI in a bit, not going to do all 8 tests at once because I don't 
want to monopolize cassci, but links should be accurate once tests start. 

|| Branch || Unit Tests || DTests ||
| [2.2|https://github.com/jeffjirsa/cassandra/commits/cassandra-2.2-13282] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-2.2-13282-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-2.2-13282-dtest/] |
| [3.0|https://github.com/jeffjirsa/cassandra/commits/cassandra-3.0-13282] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-13282-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-13282-dtest/] |
| [3.11|https://github.com/jeffjirsa/cassandra/commits/cassandra-3.11-13282] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-13282-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-13282-dtest/] |
| [trunk|https://github.com/jeffjirsa/cassandra/commits/cassandra-13282] | [testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-13282-testall/] | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-13282-dtest/] |


> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 





[jira] [Updated] (CASSANDRA-13231) org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing

2017-03-02 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli updated CASSANDRA-13231:
--
Status: Patch Available  (was: Open)

> org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing
> ---
>
> Key: CASSANDRA-13231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 674.diff
>
>
> The testStandardDirs(org.apache.cassandra.db.DirectoriesTest) unit test 
> always fails. This appears to be due to a commit by Yuki for CASSANDRA-10587 
> which switched the SSTable descriptor to use the canonical path.
> From one of Yuki's comments in CASSANDRA-10587:
> "I ended up fixing Descriptor object to always have canonical path as its 
> directory.
> This way we don't need to think about given directory is relative or absolute.
> In fact, right now Descriptor (and corresponding SSTable) is not considered 
> equal between Descriptor's directory being relative and absolute. (Added 
> simple unit test to DescriptorTest)."
> The issue here is that canonical path will expand out differently than even 
> absolute path. In this case /var/folders -> /private/var/folders. The unit 
> test is looking for /var/folders/... but the Descriptor expands out to 
> /private/var/folders and the unit test fails.
> Descriptor#L88 seems to be the real root cause.
>[junit] Testcase: 
> testStandardDirs(org.apache.cassandra.db.DirectoriesTest):   FAILED
> [junit] 
> expected:
>  but 
> was:
> [junit] junit.framework.AssertionFailedError: 
> expected:
>  but 
> was:
> [junit]   at 
> org.apache.cassandra.db.DirectoriesTest.testStandardDirs(DirectoriesTest.java:159)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.db.DirectoriesTest FAILED
> I'm guessing given we went to canonicalPath() on purpose the "fix" here is to 
> call .getCanonicalFile() on both expected Files generated (snapshotDir and 
> backupsDir) for the junit assert.





[jira] [Assigned] (CASSANDRA-13231) org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing

2017-03-02 Thread sankalp kohli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sankalp kohli reassigned CASSANDRA-13231:
-

Assignee: Michael Kjellman

> org.apache.cassandra.db.DirectoriesTest(testStandardDirs) unit test failing
> ---
>
> Key: CASSANDRA-13231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: 674.diff
>
>
> The testStandardDirs(org.apache.cassandra.db.DirectoriesTest) unit test 
> always fails. This appears to be due to a commit by Yuki for CASSANDRA-10587 
> which switched the SSTable descriptor to use the canonical path.
> From one of Yuki's comments in CASSANDRA-10587:
> "I ended up fixing Descriptor object to always have canonical path as its 
> directory.
> This way we don't need to think about given directory is relative or absolute.
> In fact, right now Descriptor (and corresponding SSTable) is not considered 
> equal between Descriptor's directory being relative and absolute. (Added 
> simple unit test to DescriptorTest)."
> The issue here is that canonical path will expand out differently than even 
> absolute path. In this case /var/folders -> /private/var/folders. The unit 
> test is looking for /var/folders/... but the Descriptor expands out to 
> /private/var/folders and the unit test fails.
> Descriptor#L88 seems to be the real root cause.
>[junit] Testcase: 
> testStandardDirs(org.apache.cassandra.db.DirectoriesTest):   FAILED
> [junit] 
> expected:
>  but 
> was:
> [junit] junit.framework.AssertionFailedError: 
> expected:
>  but 
> was:
> [junit]   at 
> org.apache.cassandra.db.DirectoriesTest.testStandardDirs(DirectoriesTest.java:159)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.db.DirectoriesTest FAILED
> I'm guessing given we went to canonicalPath() on purpose the "fix" here is to 
> call .getCanonicalFile() on both expected Files generated (snapshotDir and 
> backupsDir) for the junit assert.





[jira] [Commented] (CASSANDRA-12653) In-flight shadow round requests

2017-03-02 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892772#comment-15892772
 ] 

Joel Knighton commented on CASSANDRA-12653:
---

Thanks! The latest changes look good. However, with the System.nanoTime() call 
moved to the comparison site, it seems that {{firstSynSendAt}} truly does 
reduce to a boolean, since the comparison will now always be true if 
{{firstSynSendAt}} has been set. I don't think the existing patch will cause 
any problems, but it may be more complicated than it needs to be.

> In-flight shadow round requests
> ---
>
> Key: CASSANDRA-12653
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking 
> some host IDs or tokens by doing a gossip "shadow round" once before joining 
> the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests, 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect is that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute depending on whether 
> {{Gossiper.resetEndpointStateMap}} had been called by execution time, which 
> affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at the 
> start of the task. You'll see error log messages such as the following when this 
> happens:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "won't 
> fix").
> /cc [~Stefania] [~thobbs]





[jira] [Commented] (CASSANDRA-12661) Make gc_log and gc_warn settable at runtime

2017-03-02 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892748#comment-15892748
 ] 

Jon Haddad commented on CASSANDRA-12661:


You're right - I think I had tried to apply it twice.  Long day.  Reviewing now.

> Make gc_log and gc_warn settable at runtime
> ---
>
> Key: CASSANDRA-12661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12661
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>
> Changes:
> * Move gc_log_threshold_in_ms and gc_warn_threshold_in_ms close together in 
> the config
> * rename variables to match properties
> * add unit tests to ensure hydration
> * add unit tests to ensure variables are set properly
> * minor perf (do not construct string from buffer if not logging)
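
As a rough illustration of what "settable at runtime" and the minor perf item could look like, here is a small sketch. It is not the attached patch: the class GcThresholdsSketch, its methods, and the choice that a warn threshold of 0 disables warnings are all assumptions made for the example. The fields are volatile so a value changed from a management thread is seen by the thread that inspects GC pauses, and the log string is only built when something will actually be logged.

```java
final class GcThresholdsSketch
{
    // volatile: a value set from a management thread must be visible to the inspecting thread
    private volatile long gcLogThresholdInMs;
    private volatile long gcWarnThresholdInMs;

    GcThresholdsSketch(long gcLogThresholdInMs, long gcWarnThresholdInMs)
    {
        setGcLogThresholdInMs(gcLogThresholdInMs);
        setGcWarnThresholdInMs(gcWarnThresholdInMs);
    }

    void setGcLogThresholdInMs(long value)
    {
        if (value <= 0)
            throw new IllegalArgumentException("log threshold must be positive");
        gcLogThresholdInMs = value;
    }

    void setGcWarnThresholdInMs(long value)
    {
        // Assumption of this sketch: 0 disables warnings, negative values are rejected.
        if (value < 0)
            throw new IllegalArgumentException("warn threshold must not be negative");
        gcWarnThresholdInMs = value;
    }

    long getGcLogThresholdInMs()
    {
        return gcLogThresholdInMs;
    }

    long getGcWarnThresholdInMs()
    {
        return gcWarnThresholdInMs;
    }

    /** Returns the line to log, or null when the pause is below both thresholds. */
    String describePause(long pauseMs, StringBuilder details)
    {
        long warn = gcWarnThresholdInMs;
        boolean shouldWarn = warn > 0 && pauseMs >= warn;
        boolean shouldLog = pauseMs >= gcLogThresholdInMs;
        if (!shouldWarn && !shouldLog)
            return null; // nothing will be logged, so the string is never built

        return (shouldWarn ? "WARN" : "INFO") + " GC pause of " + pauseMs + "ms: " + details;
    }

    public static void main(String[] args)
    {
        GcThresholdsSketch thresholds = new GcThresholdsSketch(200, 1000);
        System.out.println(thresholds.describePause(50, new StringBuilder("ParNew")));
        thresholds.setGcLogThresholdInMs(20); // changed at runtime
        System.out.println(thresholds.describePause(50, new StringBuilder("ParNew")));
    }
}
```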





[jira] [Updated] (CASSANDRA-12661) Make gc_log and gc_warn settable at runtime

2017-03-02 Thread Jon Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-12661:
---
Reviewer: Jon Haddad

> Make gc_log and gc_warn settable at runtime
> ---
>
> Key: CASSANDRA-12661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12661
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>
> Changes:
> * Move gc_log_threshold_in_ms and gc_warn_threshold_in_ms close together in 
> the config
> * rename variables to match properties
> * add unit tests to ensure hydration
> * add unit tests to ensure variables are set properly
> * minor perf (do not construct string from buffer if not logging)





[jira] [Commented] (CASSANDRA-12661) Make gc_log and gc_warn settable at runtime

2017-03-02 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892735#comment-15892735
 ] 

Edward Capriolo commented on CASSANDRA-12661:
-

Make it happen capn :) aiming to get a small chunk of meritocracy every blue 
moon or so! :)

> Make gc_log and gc_warn settable at runtime
> ---
>
> Key: CASSANDRA-12661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12661
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>
> Changes:
> * Move gc_log_threshold_in_ms and gc_warn_threshold_in_ms close together in 
> the config
> * rename variables to match properties
> * add unit tests to ensure hydration
> * add unit tests to ensure variables are set properly
> * minor perf (do not construct string from buffer if not logging)





[jira] [Commented] (CASSANDRA-13001) pluggable slow query logging / handling

2017-03-02 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892695#comment-15892695
 ] 

Jon Haddad commented on CASSANDRA-13001:


The easiest code to look at is probably the seed provider:

{code}
# constructor that takes a Map of parameters will do.
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
{code}

It's a very, very simple interface, where a default implementation is provided by 
the codebase and users can drop in their own if they want.
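
To make the pattern concrete for the slow query case, here is a minimal sketch of the same idea: an interface, a bundled implementation, and a loader that instantiates whatever class name the config gives it through a constructor taking a Map of parameters. Everything named here (SlowQuerySink, InMemorySlowQuerySink, PluginLoaderSketch, the threshold_ms parameter) is hypothetical and only for illustration; it is not the SeedProvider code or the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point; the name and method are illustrative only. */
interface SlowQuerySink
{
    void onSlowQuery(String query, long elapsedMs);
}

/** A default implementation that could ship with the codebase in this sketch. */
final class InMemorySlowQuerySink implements SlowQuerySink
{
    final List<String> entries = new ArrayList<>();
    private final long thresholdMs;

    // The loader below looks for exactly this constructor shape: a single Map of parameters.
    public InMemorySlowQuerySink(Map<String, String> parameters)
    {
        thresholdMs = Long.parseLong(parameters.getOrDefault("threshold_ms", "500"));
    }

    @Override
    public void onSlowQuery(String query, long elapsedMs)
    {
        if (elapsedMs >= thresholdMs)
            entries.add(elapsedMs + "ms " + query);
    }
}

final class PluginLoaderSketch
{
    /** Instantiates className via its (Map) constructor and checks it implements the given type. */
    static <T> T load(String className, Class<T> type, Map<String, String> parameters)
    {
        try
        {
            Class<?> cls = Class.forName(className);
            return type.cast(cls.getDeclaredConstructor(Map.class).newInstance(parameters));
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args)
    {
        Map<String, String> parameters = new HashMap<>();
        parameters.put("threshold_ms", "100");
        SlowQuerySink sink = load(InMemorySlowQuerySink.class.getName(), SlowQuerySink.class, parameters);
        sink.onSlowQuery("SELECT * FROM ks.tbl", 250);
        System.out.println(((InMemorySlowQuerySink) sink).entries);
    }
}
```

A user-supplied sink that writes to a table or to statsd would only need to implement the interface and expose the same constructor shape.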

> pluggable slow query logging / handling
> ---
>
> Key: CASSANDRA-13001
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13001
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>Assignee: Murukesh Mohanan
> Fix For: 4.0
>
> Attachments: 
> 0001-Add-multiple-logging-methods-for-slow-queries-CASSAN.patch
>
>
> Currently CASSANDRA-12403 logs slow queries as DEBUG to a file.  It would be 
> better to have this as an interface which we can log to alternative 
> locations, such as to a table on the cluster or to a remote location (statsd, 
> graphite, etc).  





[jira] [Updated] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13265:
---
Summary: Expiration in OutboundTcpConnection can block the reader Thread  
(was: Epxiration in OutboundTcpConnection can block the reader Thread)

> Expiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 threads try to do iterator.next(), and 
> fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DCs
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  
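
One possible way to limit the contention described above is to let at most one thread at a time walk the backlog for expiration, while the others skip the scan and proceed straight to enqueueing. The sketch below illustrates that idea only; it is an assumption for discussion, not the fix being developed for this ticket, and BacklogExpirySketch, Queued and their members are hypothetical names. It relies on the documented behaviour that LinkedBlockingQueue iterators are weakly consistent and support remove().

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class BacklogExpirySketch
{
    /** Hypothetical stand-in for a queued outbound message. */
    static final class Queued
    {
        final String payload;
        final long expiresAtNanos;

        Queued(String payload, long expiresAtNanos)
        {
            this.payload = payload;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final LinkedBlockingQueue<Queued> backlog = new LinkedBlockingQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    void enqueue(Queued message, long nowNanos)
    {
        expireMessages(nowNanos); // cheap no-op when another thread is already scanning
        backlog.add(message);
    }

    /** Returns the number of expired messages removed, or -1 if another thread holds the gate. */
    int expireMessages(long nowNanos)
    {
        if (!expiring.compareAndSet(false, true))
            return -1; // someone else is scanning; do not pile onto the queue locks

        try
        {
            int removed = 0;
            for (Iterator<Queued> it = backlog.iterator(); it.hasNext(); )
            {
                // subtraction-based comparison stays correct if nanoTime values wrap around
                if (it.next().expiresAtNanos - nowNanos <= 0)
                {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        }
        finally
        {
            expiring.set(false);
        }
    }

    int size()
    {
        return backlog.size();
    }

    public static void main(String[] args)
    {
        BacklogExpirySketch backlog = new BacklogExpirySketch();
        backlog.enqueue(new Queued("a", 10), 0);
        backlog.enqueue(new Queued("b", 30), 0);
        System.out.println("removed=" + backlog.expireMessages(25) + " remaining=" + backlog.size());
    }
}
```

A real change would probably also rate-limit how often the scan runs at all, since even a single scanning thread still takes the queue locks once per element.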





[1/6] cassandra git commit: Test fixes for CASSANDRA-13038

2017-03-02 Thread jjirsa
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 496cfa8f5 -> adbe2cc4d
  refs/heads/cassandra-3.11 943fb02ff -> 6f9610d47
  refs/heads/trunk 14c3edcce -> a7c9fa0f1


Test fixes for CASSANDRA-13038

Patch by Jeff Jirsa; Reviewed by Joel Knighton for CASSANDRA-13038


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/adbe2cc4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/adbe2cc4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/adbe2cc4

Branch: refs/heads/cassandra-3.0
Commit: adbe2cc4df0134955a2c83ae4ebd0086ea5e9164
Parents: 496cfa8
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:42:38 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:42:38 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 4 +++-
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 15 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/src/java/org/apache/cassandra/utils/StreamingHistogram.java
--
diff --git a/src/java/org/apache/cassandra/utils/StreamingHistogram.java 
b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
index fffa73e..f1752a9 100644
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@ -271,7 +271,9 @@ public class StreamingHistogram
 return false;
 
 StreamingHistogram that = (StreamingHistogram) o;
-return maxBinSize == that.maxBinSize && bin.equals(that.bin);
+return maxBinSize == that.maxBinSize
+   && spool.equals(that.spool)
+   && bin.equals(that.bin);
 }
 
 @Override

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java 
b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
index 2a30ae1..1530741 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
@@ -60,11 +60,12 @@ public class CompactionsTest
 {
 Map compactionOptions = new HashMap<>();
 compactionOptions.put("tombstone_compaction_interval", "1");
-SchemaLoader.prepareServer();
 
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.denseCFMD(KEYSPACE1, 
CF_DENSE1)
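
The hunk above moves SchemaLoader.prepareServer() to after the System.setProperty call. A plausible reason, and this is an assumption rather than something stated in the commit, is that the property is captured once while the server is being prepared, so setting it afterwards has no effect. The sketch below shows that general ordering hazard with a made-up property name, default value, and component class.

```java
final class PropertyOrderSketch
{
    // Hypothetical property name and default, chosen for this sketch only.
    static final String PROPERTY = "sketch.histogram.roundseconds";
    static final int DEFAULT_ROUND_SECONDS = 60;

    /** Stands in for a component that captures the property once, when it is set up. */
    static final class Component
    {
        final int roundSeconds = Integer.getInteger(PROPERTY, DEFAULT_ROUND_SECONDS);
    }

    /** Property is set first, so the component sees the override. */
    static int setPropertyThenInitialize()
    {
        System.clearProperty(PROPERTY);
        System.setProperty(PROPERTY, "1");
        Component component = new Component();
        System.clearProperty(PROPERTY);
        return component.roundSeconds;
    }

    /** Component is created first, so it keeps the default despite the later override. */
    static int initializeThenSetProperty()
    {
        System.clearProperty(PROPERTY);
        Component component = new Component(); // default is captured here
        System.setProperty(PROPERTY, "1");     // too late for this component
        System.clearProperty(PROPERTY);
        return component.roundSeconds;
    }

    public static void main(String[] args)
    {
        System.out.println("set, then init: " + setPropertyThenInitialize());
        System.out.println("init, then set: " + initializeThenSetProperty());
    }
}
```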

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
index 8920d46..aa886b4 100644
--- 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
+++ 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
@@ -53,11 +53,11 @@ public class DateTieredCompactionStrategyTest extends 
SchemaLoader
 @BeforeClass
 public static void defineSchema() throws ConfigurationException
 {
-SchemaLoader.prepareServer();
-
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/

[jira] [Resolved] (CASSANDRA-13038) 33% of compaction time spent in StreamingHistogram.update()

2017-03-02 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa resolved CASSANDRA-13038.

Resolution: Fixed

CI looks good (there's one failure, but you and I chatted on IRC, and it looks 
unrelated).

Committed to 3.0 as {{adbe2cc4df0134955a2c83ae4ebd0086ea5e9164}} and merged up 
through 3.11 and trunk.

Thanks again, and apologies for the test breakage.

> 33% of compaction time spent in StreamingHistogram.update()
> ---
>
> Key: CASSANDRA-13038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Corentin Chary
>Assignee: Jeff Jirsa
> Fix For: 3.0.12, 3.11.0
>
> Attachments: compaction-speedup.patch, 
> compaction-streaminghistrogram.png, profiler-snapshot.nps
>
>
> With the following table, that contains a *lot* of cells: 
> {code}
> CREATE TABLE biggraphite.datapoints_11520p_60s (
> metric uuid,
> time_start_ms bigint,
> offset smallint,
> count int,
> value double,
> PRIMARY KEY ((metric, time_start_ms), offset)
> ) WITH CLUSTERING ORDER BY (offset DESC)
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 
> 'max_threshold': '32', 'min_threshold': '6'};
> Keyspace : biggraphite
> Read Count: 1822
> Read Latency: 1.8870054884742042 ms.
> Write Count: 2212271647
> Write Latency: 0.027705127678653473 ms.
> Pending Flushes: 0
> Table: datapoints_11520p_60s
> SSTable count: 47
> Space used (live): 300417555945
> Space used (total): 303147395017
> Space used by snapshots (total): 0
> Off heap memory used (total): 207453042
> SSTable Compression Ratio: 0.4955200053039823
> Number of keys (estimate): 16343723
> Memtable cell count: 220576
> Memtable data size: 17115128
> Memtable off heap memory used: 0
> Memtable switch count: 2872
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 1103167888
> Local write latency: 0.025 ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 105118296
> Bloom filter off heap memory used: 106547192
> Index summary off heap memory used: 27730962
> Compression metadata off heap memory used: 73174888
> Compacted partition minimum bytes: 61
> Compacted partition maximum bytes: 51012
> Compacted partition mean bytes: 7899
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 0
> {code}
> It looks like a good chunk of the compaction time is lost in 
> StreamingHistogram.update() (which is used to store the estimated tombstone 
> drop times).
> This could be caused by a huge number of different deletion times, which would 
> make the bin huge, but this histogram should be capped to 100 keys. It's 
> more likely caused by the huge number of cells.
> A simple solution could be to only take into account part of the cells; the 
> fact that this table uses TWCS also gives us an additional hint that sampling 
> deletion times would be fine.
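
For readers unfamiliar with the data structure, the sketch below shows a capped histogram that buffers updates in a small "spool" map and rounds points up to a coarser granularity before they reach the sorted bins, so the expensive merge of closest bins runs far less often. It is a simplified illustration under stated assumptions, not Cassandra's StreamingHistogram: the class name, the constructor parameters, and the integer weighted-mean merge are choices made for this example, and it assumes non-negative points and small counts (no overflow handling).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class SpooledHistogramSketch
{
    private final int maxBinSize;
    private final int maxSpoolSize;
    private final int roundSeconds;
    private final TreeMap<Long, Long> bin = new TreeMap<>();
    private final Map<Long, Long> spool = new HashMap<>();

    SpooledHistogramSketch(int maxBinSize, int maxSpoolSize, int roundSeconds)
    {
        if (maxBinSize < 1 || maxSpoolSize < 0 || roundSeconds < 1)
            throw new IllegalArgumentException("invalid histogram parameters");
        this.maxBinSize = maxBinSize;
        this.maxSpoolSize = maxSpoolSize;
        this.roundSeconds = roundSeconds;
    }

    /** Cheap path: one hash map update per point; the sorted bins are touched only on flush. */
    void update(long point)
    {
        spool.merge(roundUp(point), 1L, Long::sum);
        if (spool.size() > maxSpoolSize)
            flush();
    }

    /** Rounds a non-negative point up to the next multiple of roundSeconds. */
    private long roundUp(long point)
    {
        long remainder = point % roundSeconds;
        return remainder == 0 ? point : point + (roundSeconds - remainder);
    }

    /** Moves spooled counts into the sorted bins, keeping at most maxBinSize bins. */
    void flush()
    {
        for (Map.Entry<Long, Long> entry : spool.entrySet())
        {
            bin.merge(entry.getKey(), entry.getValue(), Long::sum);
            if (bin.size() > maxBinSize)
                mergeClosestBins();
        }
        spool.clear();
    }

    /** Replaces the two adjacent bins with the smallest gap by their weighted mean. */
    private void mergeClosestBins()
    {
        Long previous = null, left = null, right = null;
        long smallestGap = Long.MAX_VALUE;
        for (Long key : bin.keySet())
        {
            if (previous != null && key - previous < smallestGap)
            {
                smallestGap = key - previous;
                left = previous;
                right = key;
            }
            previous = key;
        }
        long leftCount = bin.remove(left);
        long rightCount = bin.remove(right);
        long mergedPoint = (left * leftCount + right * rightCount) / (leftCount + rightCount);
        bin.merge(mergedPoint, leftCount + rightCount, Long::sum);
    }

    /** Returns a copy of the bins after flushing any spooled updates. */
    Map<Long, Long> bins()
    {
        flush();
        return new TreeMap<>(bin);
    }

    long total()
    {
        flush();
        long sum = 0;
        for (long count : bin.values())
            sum += count;
        return sum;
    }

    public static void main(String[] args)
    {
        SpooledHistogramSketch histogram = new SpooledHistogramSketch(3, 100, 1);
        for (long point : new long[]{ 1, 1, 2, 10, 20 })
            histogram.update(point);
        System.out.println(histogram.bins());
    }
}
```

With many cells sharing only a few distinct (rounded) deletion times, most updates hit the hash map and never reach the sorted bins, which is the effect the ticket is after.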





[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f9610d4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f9610d4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f9610d4

Branch: refs/heads/cassandra-3.11
Commit: 6f9610d4796a1592bf68e1755134a873a5b847d2
Parents: 943fb02 adbe2cc
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:43:26 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:43:49 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 3 +--
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/src/java/org/apache/cassandra/utils/StreamingHistogram.java
--
diff --cc src/java/org/apache/cassandra/utils/StreamingHistogram.java
index 6fde931,f1752a9..9114c7d
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@@ -296,10 -271,9 +296,9 @@@ public class StreamingHistogra
  return false;
  
  StreamingHistogram that = (StreamingHistogram) o;
 -return maxBinSize == that.maxBinSize
 -   && spool.equals(that.spool)
 -   && bin.equals(that.bin);
 +return maxBinSize == that.maxBinSize &&
-maxSpoolSize == that.maxSpoolSize &&
 +   spool.equals(that.spool) &&
 +   bin.equals(that.bin);
  }
  
  @Override

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java
--



[3/6] cassandra git commit: Test fixes for CASSANDRA-13038

2017-03-02 Thread jjirsa
Test fixes for CASSANDRA-13038

Patch by Jeff Jirsa; Reviewed by Joel Knighton for CASSANDRA-13038


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/adbe2cc4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/adbe2cc4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/adbe2cc4

Branch: refs/heads/trunk
Commit: adbe2cc4df0134955a2c83ae4ebd0086ea5e9164
Parents: 496cfa8
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:42:38 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:42:38 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 4 +++-
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 15 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/src/java/org/apache/cassandra/utils/StreamingHistogram.java
--
diff --git a/src/java/org/apache/cassandra/utils/StreamingHistogram.java 
b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
index fffa73e..f1752a9 100644
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@ -271,7 +271,9 @@ public class StreamingHistogram
 return false;
 
 StreamingHistogram that = (StreamingHistogram) o;
-return maxBinSize == that.maxBinSize && bin.equals(that.bin);
+return maxBinSize == that.maxBinSize
+   && spool.equals(that.spool)
+   && bin.equals(that.bin);
 }
 
 @Override

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java 
b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
index 2a30ae1..1530741 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
@@ -60,11 +60,12 @@ public class CompactionsTest
 {
 Map compactionOptions = new HashMap<>();
 compactionOptions.put("tombstone_compaction_interval", "1");
-SchemaLoader.prepareServer();
 
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.denseCFMD(KEYSPACE1, 
CF_DENSE1)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
index 8920d46..aa886b4 100644
--- 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
+++ 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
@@ -53,11 +53,11 @@ public class DateTieredCompactionStrategyTest extends 
SchemaLoader
 @BeforeClass
 public static void defineSchema() throws ConfigurationException
 {
-SchemaLoader.prepareServer();
-
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
index fc88987..2cda2e8 100644
--- 
a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
+++ 
b/test/unit/org/apache

[2/6] cassandra git commit: Test fixes for CASSANDRA-13038

2017-03-02 Thread jjirsa
Test fixes for CASSANDRA-13038

Patch by Jeff Jirsa; Reviewed by Joel Knighton for CASSANDRA-13038


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/adbe2cc4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/adbe2cc4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/adbe2cc4

Branch: refs/heads/cassandra-3.11
Commit: adbe2cc4df0134955a2c83ae4ebd0086ea5e9164
Parents: 496cfa8
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:42:38 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:42:38 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 4 +++-
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 15 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/src/java/org/apache/cassandra/utils/StreamingHistogram.java
--
diff --git a/src/java/org/apache/cassandra/utils/StreamingHistogram.java 
b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
index fffa73e..f1752a9 100644
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@ -271,7 +271,9 @@ public class StreamingHistogram
 return false;
 
 StreamingHistogram that = (StreamingHistogram) o;
-return maxBinSize == that.maxBinSize && bin.equals(that.bin);
+return maxBinSize == that.maxBinSize
+   && spool.equals(that.spool)
+   && bin.equals(that.bin);
 }
 
 @Override

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java 
b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
index 2a30ae1..1530741 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
@@ -60,11 +60,12 @@ public class CompactionsTest
 {
 Map compactionOptions = new HashMap<>();
 compactionOptions.put("tombstone_compaction_interval", "1");
-SchemaLoader.prepareServer();
 
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.denseCFMD(KEYSPACE1, 
CF_DENSE1)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
index 8920d46..aa886b4 100644
--- 
a/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
+++ 
b/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
@@ -53,11 +53,11 @@ public class DateTieredCompactionStrategyTest extends 
SchemaLoader
 @BeforeClass
 public static void defineSchema() throws ConfigurationException
 {
-SchemaLoader.prepareServer();
-
 // Disable tombstone histogram rounding for tests
 System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
 
+SchemaLoader.prepareServer();
+
 SchemaLoader.createKeyspace(KEYSPACE1,
 KeyspaceParams.simple(1),
 SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/adbe2cc4/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--
diff --git 
a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
 
b/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
index fc88987..2cda2e8 100644
--- 
a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
+++ 
b/test/unit/o

[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f9610d4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f9610d4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f9610d4

Branch: refs/heads/trunk
Commit: 6f9610d4796a1592bf68e1755134a873a5b847d2
Parents: 943fb02 adbe2cc
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:43:26 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:43:49 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 3 +--
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/src/java/org/apache/cassandra/utils/StreamingHistogram.java
--
diff --cc src/java/org/apache/cassandra/utils/StreamingHistogram.java
index 6fde931,f1752a9..9114c7d
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@@ -296,10 -271,9 +296,9 @@@ public class StreamingHistogra
  return false;
  
  StreamingHistogram that = (StreamingHistogram) o;
 -return maxBinSize == that.maxBinSize
 -   && spool.equals(that.spool)
 -   && bin.equals(that.bin);
 +return maxBinSize == that.maxBinSize &&
-maxSpoolSize == that.maxSpoolSize &&
 +   spool.equals(that.spool) &&
 +   bin.equals(that.bin);
  }
  
  @Override

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f9610d4/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java
--



[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2017-03-02 Thread jjirsa
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a7c9fa0f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a7c9fa0f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a7c9fa0f

Branch: refs/heads/trunk
Commit: a7c9fa0f17518482c3ca691e542ed2c0f41af84a
Parents: 14c3edc 6f9610d
Author: Jeff Jirsa 
Authored: Thu Mar 2 09:43:57 2017 -0800
Committer: Jeff Jirsa 
Committed: Thu Mar 2 09:44:23 2017 -0800

--
 src/java/org/apache/cassandra/utils/StreamingHistogram.java  | 3 +--
 .../unit/org/apache/cassandra/db/compaction/CompactionsTest.java | 3 ++-
 .../db/compaction/DateTieredCompactionStrategyTest.java  | 4 ++--
 .../cassandra/db/compaction/LeveledCompactionStrategyTest.java   | 4 ++--
 .../db/compaction/SizeTieredCompactionStrategyTest.java  | 4 ++--
 test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java  | 4 ++--
 .../db/compaction/TimeWindowCompactionStrategyTest.java  | 4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a7c9fa0f/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a7c9fa0f/test/unit/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a7c9fa0f/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a7c9fa0f/test/unit/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyTest.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a7c9fa0f/test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
--
diff --cc test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
index 615b206,e340ee3..0a970e1
--- a/test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/TTLExpiryTest.java
@@@ -60,21 -59,21 +60,21 @@@ public class TTLExpiryTes
  // Disable tombstone histogram rounding for tests
  System.setProperty("cassandra.streaminghistogram.roundseconds", "1");
  
+ SchemaLoader.prepareServer();
+ 
  SchemaLoader.createKeyspace(KEYSPACE1,
  KeyspaceParams.simple(1),
 -CFMetaData.Builder.create(KEYSPACE1, CF_STANDARD1)
 -  .addPartitionKey("pKey", AsciiType.instance)
 -  .addRegularColumn("col1", AsciiType.instance)
 -  .addRegularColumn("col", AsciiType.instance)
 -  .addRegularColumn("col311", AsciiType.instance)
 -  .addRegularColumn("col2", AsciiType.instance)
 -  .addRegularColumn("col3", AsciiType.instance)
 -  .addRegularColumn("col7", AsciiType.instance)
 -  .addRegularColumn("col8", MapType.getInstance(AsciiType.instance, AsciiType.instance, true))
 -  .addRegularColumn("shadow", AsciiType.instance)
 -  .build().gcGraceSeconds(0));
 +TableMetadata.builder(KEYSPACE1, CF_STANDARD1)
 + .addPartitionKeyColumn("pKey", AsciiType.instance)
 + .addRegularColumn("col1", AsciiType.instance)
 + .addRegularColumn("col", AsciiType.instance)
 + .addRegularColumn("col311", AsciiType.instance)
 + .addRegularColumn("col2", AsciiType.instance)
 + .addRegularColumn("col3", AsciiType.instance)
 + .addRegularColumn("col7", AsciiType.instance)
 + .addRegularColumn("col8", MapType.getInstance(AsciiType.instance, AsciiType.instance, true))
 +   

[jira] [Commented] (CASSANDRA-13271) Reduce lock contention on instance factories of ListType and SetType

2017-03-02 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892669#comment-15892669
 ] 

Jon Haddad commented on CASSANDRA-13271:


Are there any performance tests that show the effect of this patch?  

> Reduce lock contention on instance factories of ListType and SetType
> 
>
> Key: CASSANDRA-13271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13271
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: performance
> Fix For: 4.x
>
> Attachments: 0001-CASSANDRA-13271-computeIfAbsent.patch, 
> 0001-CASSANDRA-13271-singleton-factory-concurrency-opimiz.patch
>
>
> While running some performance tests, I noticed that getInstance() in 
> org.apache.cassandra.db.marshal.ListType and SetType can suffer from lock 
> contention, because the singleton factory getInstance() is synchronized. Here 
> is a proposal to reduce that contention by using a ConcurrentMap and the 
> putIfAbsent method rather than a synchronized method.
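
The proposal above can be sketched as follows. This is a minimal illustration, not the attached patch: `InternedListType` and its string key are hypothetical stand-ins for ListType and its element type, chosen only to show how a ConcurrentMap with putIfAbsent replaces a synchronized factory method.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a parameterized type such as ListType; not Cassandra's class.
final class InternedListType
{
    // One cached instance per (element type, multi-cell flag) combination.
    private static final ConcurrentMap<String, InternedListType> INSTANCES = new ConcurrentHashMap<>();

    final String elementType;
    final boolean isMultiCell;

    private InternedListType(String elementType, boolean isMultiCell)
    {
        this.elementType = elementType;
        this.isMultiCell = isMultiCell;
    }

    // Not synchronized: the common case is a single non-blocking get().
    static InternedListType getInstance(String elementType, boolean isMultiCell)
    {
        String key = elementType + ':' + isMultiCell;
        InternedListType existing = INSTANCES.get(key);
        if (existing != null)
            return existing;

        // On a miss, racing threads may each build a candidate, but putIfAbsent
        // publishes only the first one, and every caller returns that instance.
        InternedListType candidate = new InternedListType(elementType, isMultiCell);
        InternedListType previous = INSTANCES.putIfAbsent(key, candidate);
        return previous != null ? previous : candidate;
    }
}
```

The trade-off in this sketch is that a losing thread may construct an instance that is immediately discarded, which is harmless as long as construction has no side effects.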



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-13281) testall failure in org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization

2017-03-02 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13281.
---
Resolution: Duplicate

Confirmed: this failure is due to a small update the test needs after 
[CASSANDRA-13038]. That issue has been reopened and the fix is being made there.

> testall failure in 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization
> 
>
> Key: CASSANDRA-13281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13281
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>Assignee: Joel Knighton
>  Labels: test-failure, testall
> Attachments: 
> TEST-org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.11_testall/96/testReport/org.apache.cassandra.io.sstable.metadata/MetadataSerializerTest/testSerialization
> {code}
> Error Message
> expected: 
> but was:
> {code}{code}
> Stacktrace
> junit.framework.AssertionFailedError: 
> expected: 
> but was:
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization(MetadataSerializerTest.java:72)
> {code}





[jira] [Assigned] (CASSANDRA-13281) testall failure in org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization

2017-03-02 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-13281:
-

Assignee: Joel Knighton



[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892595#comment-15892595
 ] 

Ariel Weisberg commented on CASSANDRA-13265:


Sorry, one other nit. Instead of retrieving System.nanoTime() twice when 
enqueuing a message, can you retrieve it once in {{enqueue}} and then pass it as 
a parameter to {{QueuedMessage}}? 

I think it's actually OK if the timestamp is slightly old, because expiration 
runs for a while. In that scenario we want to time out messages sooner, not later.
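
A minimal sketch of that suggestion, under stated assumptions: `EnqueueSketch` and `QueuedMessageSketch` are hypothetical simplifications, not the real OutboundTcpConnection classes, and the injected `LongSupplier` clock exists only so the single clock read can be observed. The sketch is single-threaded for brevity.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Hypothetical simplified message: the timestamp is supplied by the caller.
final class QueuedMessageSketch
{
    final String payload;
    final long enqueuedAtNanos;

    QueuedMessageSketch(String payload, long enqueuedAtNanos)
    {
        this.payload = payload;
        this.enqueuedAtNanos = enqueuedAtNanos;
    }
}

// Hypothetical simplified connection backlog; not thread-safe.
final class EnqueueSketch
{
    private final Queue<QueuedMessageSketch> backlog = new ArrayDeque<>();
    private final LongSupplier clock;
    private long lastEnqueueNanos;

    EnqueueSketch(LongSupplier clock)
    {
        this.clock = clock;
    }

    QueuedMessageSketch enqueue(String payload)
    {
        // Read the clock exactly once and reuse the value everywhere it is needed.
        long nowNanos = clock.getAsLong();
        lastEnqueueNanos = nowNanos;
        QueuedMessageSketch message = new QueuedMessageSketch(payload, nowNanos);
        backlog.add(message);
        return message;
    }

    long lastEnqueueNanos()
    {
        return lastEnqueueNanos;
    }
}
```

In production code the clock would simply be `System::nanoTime`; the point is only that the message and the bookkeeping share one reading.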

> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DCs
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Commented] (CASSANDRA-11471) Add SASL mechanism negotiation to the native protocol

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892559#comment-15892559
 ] 

Ariel Weisberg commented on CASSANDRA-11471:


* Please add a unit test for {{CommonNameCertificateAuthenticator}}
* [Only if encryption is 
optional?|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-ab99d4b775ea620e439db41d18353541R123]
 Basically because the authenticator can only work if the certificates are 
there? It seems like this can NPE?
* It seems like this adds the capability to select a mechanism, but the 
authenticators don't actually select among multiple mechanisms; only one is 
supported?
* {{NegotiatingSaslNegotiator.setupOnCompletedNegotiation()}} appears to have 
no implementations?
* [Debug is on all the time so this may be a bit 
much|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-ef1e335e8d51911f09bcc735b0632c5cR218]
* [Same debug 
issue|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-5c1697b2ca600e9e034b27ac03b0129dR43]
* [Same debug 
issue|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-ef1e335e8d51911f09bcc735b0632c5cR205]
* [Extra 
line|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-f4a806982d4fb565a8ceb5476cfb5978R84]
* [Extra 
line|https://github.com/apache/cassandra/compare/trunk...benbromhead:11471#diff-f4a806982d4fb565a8ceb5476cfb5978R71]

For the driver and protocol change you should create a subtask here 
https://datastax-oss.atlassian.net/browse/JAVA-1361

> Add SASL mechanism negotiation to the native protocol
> -
>
> Key: CASSANDRA-11471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11471
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Sam Tunnicliffe
>Assignee: Ben Bromhead
>  Labels: client-impacting
> Attachments: CASSANDRA-11471
>
>
> Introducing an additional message exchange into the authentication sequence 
> would allow us to support multiple authentication schemes and [negotiation of 
> SASL mechanisms|https://tools.ietf.org/html/rfc4422#section-3.2]. 
> The current {{AUTHENTICATE}} message sent from Client to Server includes the 
> java classname of the configured {{IAuthenticator}}. This could be superseded 
> by a new message which lists the SASL mechanisms supported by the server. The 
> client would then respond with a new message which indicates its choice of 
> mechanism.  This would allow the server to support multiple mechanisms, for 
> example enabling both {{PLAIN}} for username/password authentication and 
> {{EXTERNAL}} for a mechanism for extracting credentials from SSL 
> certificates\* (see the example in 
> [RFC-4422|https://tools.ietf.org/html/rfc4422#appendix-A]). Furthermore, the 
> server could tailor the list of supported mechanisms on a per-connection 
> basis, e.g. only offering certificate based auth to encrypted clients. 
> The client's response should include the selected mechanism and any initial 
> response data. This is mechanism-specific; the {{PLAIN}} mechanism consists 
> of a single round in which the client sends encoded credentials as the 
> initial response data and the server response indicates either success or 
> failure with no further challenges required.
> From a protocol perspective, after the mechanism negotiation the exchange 
> would continue as in protocol v4, with one or more rounds of 
> {{AUTH_CHALLENGE}} and {{AUTH_RESPONSE}} messages, terminated by an 
> {{AUTH_SUCCESS}} sent from Server to Client upon successful authentication or 
> an {{ERROR}} on auth failure. 
> XMPP performs mechanism negotiation in this way, 
> [RFC-3920|http://tools.ietf.org/html/rfc3920#section-6] includes a good 
> overview.
> \* Note: this would require some a priori agreement between client and server 
> over the implementation of the {{EXTERNAL}} mechanism.
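
The negotiation step described above can be sketched as follows. This is an illustrative model only: `SaslNegotiationSketch` and its methods are hypothetical, no wire format is implied, and the rule "offer EXTERNAL only on encrypted connections" is taken from the example in the description rather than from any implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of mechanism negotiation; not part of the native protocol code.
final class SaslNegotiationSketch
{
    // Server side: tailor the advertised mechanisms to the connection.
    static List<String> offeredMechanisms(boolean connectionEncrypted)
    {
        List<String> offered = new ArrayList<>();
        offered.add("PLAIN");
        if (connectionEncrypted)
            offered.add("EXTERNAL"); // certificate-based auth only makes sense over TLS
        return offered;
    }

    // Client side: pick the first mechanism, in the client's preference order,
    // that the server advertised. Empty means no common mechanism exists.
    static Optional<String> chooseMechanism(List<String> clientPreference, List<String> offered)
    {
        for (String mechanism : clientPreference)
            if (offered.contains(mechanism))
                return Optional.of(mechanism);
        return Optional.empty();
    }
}
```

After this step the exchange would continue with the AUTH_CHALLENGE and AUTH_RESPONSE rounds as the description states; an empty result would map to an ERROR.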





[jira] [Commented] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread Vladimir Bukhtoyarov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892556#comment-15892556
 ] 

Vladimir Bukhtoyarov commented on CASSANDRA-13223:
--

{quote}
But if we just ignore the overflow-counter, that means we would possibly lose 
a lot of information.
{quote}
I do not ignore a large latency record: I write the maximum value of the 
resolution, so at least the anomaly can be observed on monitoring screens. But 
you raised a good point. I think I must add an error message to the log 
(including a stack trace), because a histogram that has overflowed cannot be 
monitored by the end user. The log message should include the stack trace 
because a histogram has no ID: merely logging that some histogram overflowed is 
useless, since without the stack trace it would be impossible to determine 
which histogram overflowed. Do you agree?
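
The approach described in this comment might look roughly like the sketch below. It is not the patch under review: `ClampingHistogramSketch` is a hypothetical, non-decaying, single-threaded histogram, and logging only on the first overflow is an assumption made here to avoid flooding the log.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical simplified histogram; not DecayingEstimatedHistogramReservoir.
final class ClampingHistogramSketch
{
    private static final Logger LOG = Logger.getLogger(ClampingHistogramSketch.class.getName());

    private final long[] bucketUpperBounds; // ascending, inclusive upper bounds
    private final long[] counts;
    private boolean overflowLogged;

    ClampingHistogramSketch(long... bucketUpperBounds)
    {
        this.bucketUpperBounds = bucketUpperBounds.clone();
        this.counts = new long[bucketUpperBounds.length];
    }

    void update(long value)
    {
        int last = bucketUpperBounds.length - 1;
        if (value > bucketUpperBounds[last])
        {
            // Clamp instead of failing later in getSnapshot(); the Throwable carries
            // the stack trace that identifies which histogram overflowed.
            if (!overflowLogged)
            {
                overflowLogged = true;
                LOG.log(Level.SEVERE, "Histogram overflowed, clamping value " + value,
                        new Throwable("overflow recorded here"));
            }
            counts[last]++;
            return;
        }
        for (int i = 0; i <= last; i++)
        {
            if (value <= bucketUpperBounds[i])
            {
                counts[i]++;
                return;
            }
        }
    }

    long count(int bucket)
    {
        return counts[bucket];
    }

    // Upper bound of the highest non-empty bucket, or 0 if nothing was recorded.
    long max()
    {
        for (int i = counts.length - 1; i >= 0; i--)
            if (counts[i] > 0)
                return bucketUpperBounds[i];
        return 0;
    }
}
```

With this shape an overflowing value shows up as the maximum of the resolution on monitoring screens, which is the behavior the comment argues for.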

{quote}
I don't think we want to drop the testDecayingMean test case and the 
corresponding lines in the EstimatedHistogramReservoirSnapshot ctor. See 
CASSANDRA-12876 for details
{quote}
Ok, I will restore this test tomorrow.




> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception when a value above the 
> maximum is recorded to the reservoir. This is very undesirable behavior, because 
> functionality like logging or monitoring should never fail with an exception. 
> The current behavior of DecayingEstimatedHistogramReservoir violates the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java]:
>  as you can see, the javadocs for Reservoir say nothing about an implementation 
> throwing an exception in the getSnapshot method. As a result all Dropwizard Metrics 
> reporters are broken, because nobody expects that a metric will throw an exception 
> on get. For example, our monitoring pipeline is broken with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when histogram overflowed (through reference chain: java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table.ColUpdateTimeDeltaHistogram.all"])
>     at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
>     at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
>     at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
>     at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>     at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
>     at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
>     at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
>     at com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
>     at com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
>     at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>     at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
>     at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
>     at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
>     at com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
>     at com.ringcentral.slf4j.CleanMDCFilter.doFilter(CleanMDCFilter.java:18)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>     at org.eclipse.je

[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-03-02 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892486#comment-15892486
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


I've reduced my branches to the bare minimum needed for filtering 
already-repaired sstables and re-ran the tests. See above for links.

> Reappeared Data when Mixing Incremental and Full Repairs
> 
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
>Reporter: Amanda Debrot
>Assignee: Stefan Podkowinski
>  Labels: Cassandra
> Attachments: log-Reappeared-Data.txt, 
> Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and 
> SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 
> but it most likely also affects all Cassandra versions after 2.2, if they 
> have anticompaction with full repair.
> When mixing incremental and full repairs, there are a few scenarios where the 
> Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
> repaired.  Then if it is past gc_grace, and the tombstone and data have been 
> compacted out on other replicas, the next incremental repair will push the 
> Data to other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>   Node 1 has data and tombstone in separate SSTables.
>   Node 2 has data and no tombstone.
>   Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day so now we have 
> tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged 
> to 1 SSTable on Nodes 1 and 3.
>   Node 1 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr). 
> Now there are 2 scenarios where the Data SSTable will get marked as 
> "Unrepaired" while Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
> Since the Data and Tombstone SSTable have been marked as "Repaired" 
> and anticompacted, they have had minor compactions with other SSTables 
> containing keys from other ranges.  During full repair, if the last node to 
> run it doesn't own this particular key in its partitioner range, the Data 
> and Tombstone SSTable will get anticompacted and marked as "Unrepaired".  Now 
> in the next incremental repair, if the Data SSTable is involved in a minor 
> compaction during the repair but the Tombstone SSTable is not, the resulting 
> compacted SSTable will be marked "Unrepaired" and Tombstone SSTable is marked 
> "Repaired".
> Scenario 2:
> Only the Data SSTable had minor compaction with other SSTables 
> containing keys from other ranges after being marked as "Repaired".  The 
> Tombstone SSTable was never involved in a minor compaction so therefore all 
> keys in that SSTable belong to 1 particular partitioner range. During full 
> repair, if the last node to run it doesn't own this particular key in its 
> partitioner range, the Data SSTable will get anticompacted and marked as 
> "Unrepaired".   The Tombstone SSTable stays marked as Repaired.
> Then it’s past gc_grace.  Since Nodes #1 and #3 only have 1 SSTable for that 
> key, the tombstone will get compacted out.
>   Node 1 has nothing.
>   Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
> SSTable) in separate SSTables.
>   Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable 
> to build the merkle tree since the tombstone SSTable is flagged as repaired 
> and data SSTable is marked as unrepaired.  And the data will get repaired 
> against the other two nodes.
>   Node 1 has data.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 has data.
> If a read request hits Node 1 and 3, it will return data.  If it hits 1 and 
> 2, or 2 and 3, however, it would return no data.
> Tested this with single range tokens for simplicity.





[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892378#comment-15892378
 ] 

Ariel Weisberg edited comment on CASSANDRA-13265 at 3/2/17 4:10 PM:


A nit that has been pointed out to me for System.nanoTime(). It can wrap so you 
should use {{now - lastExpirationTime > interval}}.
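
A short sketch of why the subtraction form matters, assuming hypothetical helper names (`ExpirationTimerSketch` is not part of the patch). System.nanoTime() values are only meaningful as differences, and the difference stays correct under two's-complement wrap-around, while comparing against a precomputed sum does not.

```java
// Hypothetical helper comparing the two ways of testing an elapsed interval.
final class ExpirationTimerSketch
{
    // Wrap-safe: the difference of two nanoTime readings is correct even if the
    // counter overflowed between them (for spans shorter than about 292 years).
    static boolean intervalElapsed(long now, long lastExpirationTime, long interval)
    {
        return now - lastExpirationTime > interval;
    }

    // Fragile: lastExpirationTime + interval, or now itself, may have wrapped,
    // so this ordering comparison can give the wrong answer near the boundary.
    static boolean naiveIntervalElapsed(long now, long lastExpirationTime, long interval)
    {
        return now > lastExpirationTime + interval;
    }
}
```

For example, with a last expiration time of `Long.MAX_VALUE - 5` and a reading taken 10 ns later (which wraps to a negative value), the subtraction form reports that a 3 ns interval has elapsed while the naive form does not.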


was (Author: aweisberg):
A nit that has been pointed out to be for System.nanoTime(). It can wrap so you 
should use {{now - lastExpirationTime > interval}}.



[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892471#comment-15892471
 ] 

Christian Esken commented on CASSANDRA-13265:
-

The change to System.nanoTime() is done. I kept the logging, but stripped it down 
and guarded it with an {{isTraceEnabled()}} check.



[jira] [Commented] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread Per Otterström (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892381#comment-15892381
 ] 

Per Otterström commented on CASSANDRA-13223:


I agree that it is not desirable to throw an exception in the metrics reporting. 
Still, this was caused by that specific counter and should be fixed in 3.10.

Your proposal would fix the exception. But if we just ignore the 
overflow-counter, that means we would possibly lose a lot of information. Rather 
than skipping it, which in practice means that we adjust the value down, would it 
make sense to adjust the calculated value up? A lot?

I had a look at your patch. I don't think we want to drop the testDecayingMean 
test case and the corresponding lines in the 
EstimatedHistogramReservoirSnapshot ctor. See CASSANDRA-12876 for details.

> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception when a value above the 
> maximum is recorded to the reservoir. This is very undesirable behavior, because 
> functionality like logging or monitoring should never fail with an exception. 
> The current behavior of DecayingEstimatedHistogramReservoir violates the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java]:
>  as you can see, the javadocs for Reservoir say nothing about an implementation 
> throwing an exception in the getSnapshot method. As a result all Dropwizard Metrics 
> reporters are broken, because nobody expects that a metric will throw an exception 
> on get. For example, our monitoring pipeline is broken with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when histogram overflowed (through reference chain: java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table.ColUpdateTimeDeltaHistogram.all"])
>     at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
>     at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
>     at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
>     at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
>     at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>     at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
>     at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
>     at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
>     at com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
>     at com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
>     at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>     at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
>     at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
>     at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
>     at com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
>     at com.ringcentral.slf4j.CleanMDCFilter.doFilter(CleanMDCFilter.java:18)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     at org.eclipse.

[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892378#comment-15892378
 ] 

Ariel Weisberg commented on CASSANDRA-13265:


A nit that has been pointed out to me about System.nanoTime(): its value can wrap, 
so you should compare using {{now - lastExpirationTime > interval}}.
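
A minimal sketch of why the subtraction form matters, assuming signed 64-bit nanosecond counters as returned by System.nanoTime(). The class and method names are hypothetical and are not taken from OutboundTcpConnection; only the comparison itself comes from the comment above.

```java
class NanoTimeInterval {
    // Wrap-safe: the difference of two nanoTime readings stays correct
    // even when the counter overflowed between them.
    static boolean intervalElapsed(long now, long lastExpirationTime, long interval) {
        return now - lastExpirationTime > interval;
    }

    // Fragile form, shown only for contrast: it compares absolute values,
    // which breaks once one side has wrapped and the other has not.
    static boolean intervalElapsedNaive(long now, long lastExpirationTime, long interval) {
        return now > lastExpirationTime + interval;
    }

    public static void main(String[] args) {
        long last = Long.MAX_VALUE - 100; // reading taken shortly before the wrap
        long now = last + 200;            // 200 ns later, wrapped to a negative value
        System.out.println(intervalElapsed(now, last, 50));      // true
        System.out.println(intervalElapsedNaive(now, last, 50)); // false
    }
}
```

With the naive form, expiration would silently stop firing after a wrap, because the wrapped {{now}} is smaller than every earlier reading.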

> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of this? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next() and 
> fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892360#comment-15892360
 ] 

Jason Brown commented on CASSANDRA-13265:
-

bq. But the Verbs have different timeouts, the defaults ranging from 2 to 60 
seconds

Oh, dang! Nice find. Yeah, I guess we should traverse the {{backlog}} in that 
case.

bq. In real life it won't make a terrific difference even with the worst clocks

Clocks go wrong all the time at scale, so timestamping must be done as 
correctly as possible. {{System.nanoTime()}} does indeed call 
{{clock_gettime(CLOCK_MONOTONIC, ...)}}.



[jira] [Updated] (CASSANDRA-13288) testall failure in org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit

2017-03-02 Thread Sean McCarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean McCarthy updated CASSANDRA-13288:
--
Description: 
example failure:

http://cassci.datastax.com/job/cassandra-2.2_testall/653/testReport/org.apache.cassandra.io.sstable/SSTableRewriterTest/testSSTableSplit

{code}
Standard Output

ERROR 23:01:28 LEAK DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
 was not released before the reference was garbage collected
ERROR 23:01:28 LEAK DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
 was not released before the reference was garbage collected
ERROR 23:01:28 Allocate trace 
org.apache.cassandra.utils.concurrent.Ref$State@554614db:
Thread[main,5,main]
at java.lang.Thread.getStackTrace(Thread.java:1589)
at org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:228)
at org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:158)
at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:80)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.<init>(SSTableReader.java:216)
at 
org.apache.cassandra.io.sstable.format.big.BigTableReader.<init>(BigTableReader.java:60)
at 
org.apache.cassandra.io.sstable.format.big.BigFormat$ReaderFactory.open(BigFormat.java:116)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:587)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:565)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:346)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.access$800(BigTableWriter.java:56)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:385)
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169)
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:179)
at 
org.apache.cassandra.io.sstable.format.SSTableWriter.finish(SSTableWriter.java:205)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFiles(SSTableRewriterTest.java:969)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFile(SSTableRewriterTest.java:948)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit(SSTableRewriterTest.java:618)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)

ERROR 23:01:28 Allocate tr

[jira] [Created] (CASSANDRA-13288) testall failure in org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit

2017-03-02 Thread Sean McCarthy (JIRA)
Sean McCarthy created CASSANDRA-13288:
-

 Summary: testall failure in 
org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit
 Key: CASSANDRA-13288
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13288
 Project: Cassandra
  Issue Type: Bug
Reporter: Sean McCarthy


example failure:

http://cassci.datastax.com/job/cassandra-2.2_testall/653/testReport/org.apache.cassandra.io.sstable/SSTableRewriterTest/testSSTableSplit

{code}
Standard Output

ERROR 23:01:28 LEAK DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
 was not released before the reference was garbage collected
ERROR 23:01:28 LEAK DETECTED: a reference 
(org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
 was not released before the reference was garbage collected
ERROR 23:01:28 Allocate trace 
org.apache.cassandra.utils.concurrent.Ref$State@554614db:
Thread[main,5,main]
at java.lang.Thread.getStackTrace(Thread.java:1589)
at org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:228)
at org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:158)
at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:80)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.<init>(SSTableReader.java:216)
at 
org.apache.cassandra.io.sstable.format.big.BigTableReader.<init>(BigTableReader.java:60)
at 
org.apache.cassandra.io.sstable.format.big.BigFormat$ReaderFactory.open(BigFormat.java:116)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:587)
at 
org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:565)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:346)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.access$800(BigTableWriter.java:56)
at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:385)
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169)
at 
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:179)
at 
org.apache.cassandra.io.sstable.format.SSTableWriter.finish(SSTableWriter.java:205)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFiles(SSTableRewriterTest.java:969)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFile(SSTableRewriterTest.java:948)
at 
org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit(SSTableRewriterTest.java:618)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTes

[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892350#comment-15892350
 ] 

Ariel Weisberg commented on CASSANDRA-13265:


bq. Your argument "reasonably in ascending timestamp order" makes sense, if all 
entries would have the same expiration time. But the Verbs have different 
timeouts, the defaults ranging from 2 to 60 seconds. Thus iterating the whole 
Queue should be done, as in the worst case we will remove nothing even though 
most entries are timed out.
Oh, you are right. That's a pretty serious bug in and of itself. That sucks!

If you want to leave the trace code in, it's not the end of the world; just use 
the trace functionality in the logger. You can check whether trace is enabled 
once inside the expire method. Do reduce it to a single statement that prints 
the timing after expiration finishes.
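
A rough sketch of that suggestion, not the actual patch. It uses java.util.logging only to stay self-contained (Level.FINEST stands in for the logger's trace level), and the Message class, the expire signature and the log text are all hypothetical.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.logging.Level;
import java.util.logging.Logger;

class ExpireTraceSketch {
    private static final Logger logger = Logger.getLogger(ExpireTraceSketch.class.getName());

    // Hypothetical stand-in for a queued message: only an expiry deadline.
    static final class Message {
        final long expiresAtNanos;
        Message(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
    }

    // Removes every expired message and returns how many were removed.
    static int expire(Queue<Message> backlog, long nowNanos) {
        // Check the trace level once, not per message.
        final boolean trace = logger.isLoggable(Level.FINEST);
        final long start = trace ? System.nanoTime() : 0L;

        int removed = 0;
        for (Iterator<Message> it = backlog.iterator(); it.hasNext();) {
            Message m = it.next();
            if (nowNanos - m.expiresAtNanos >= 0) { // wrap-safe comparison
                it.remove();
                removed++;
            }
        }

        // A single statement that reports the timing after expiration finishes.
        if (trace)
            logger.finest("Expiration removed " + removed + " messages in "
                          + (System.nanoTime() - start) + " ns");
        return removed;
    }
}
```

Reading the level once keeps the hot loop free of logging checks, and the timing cost is only paid when tracing is actually enabled.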



[jira] [Updated] (CASSANDRA-13288) testall failure in org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit

2017-03-02 Thread Sean McCarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean McCarthy updated CASSANDRA-13288:
--
Component/s: Testing

> testall failure in 
> org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit
> ---
>
> Key: CASSANDRA-13288
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13288
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>  Labels: test-failure, testall
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_testall/653/testReport/org.apache.cassandra.io.sstable/SSTableRewriterTest/testSSTableSplit
> {code}
> Standard Output
> ERROR 23:01:28 LEAK DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
>  was not released before the reference was garbage collected
> ERROR 23:01:28 LEAK DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@554614db) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1812160921:/home/automaton/cassandra/build/test/cassandra/data:187/SSTableRewriterTest/Standard1-fce608a0fed211e69ae3d1f5bcb99423/lb-6-big
>  was not released before the reference was garbage collected
> ERROR 23:01:28 Allocate trace 
> org.apache.cassandra.utils.concurrent.Ref$State@554614db:
> Thread[main,5,main]
>   at java.lang.Thread.getStackTrace(Thread.java:1589)
>   at org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:228)
>   at org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:158)
>   at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:80)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.<init>(SSTableReader.java:216)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.<init>(BigTableReader.java:60)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigFormat$ReaderFactory.open(BigFormat.java:116)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:587)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.internalOpen(SSTableReader.java:565)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:346)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.access$800(BigTableWriter.java:56)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:385)
>   at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169)
>   at 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:179)
>   at 
> org.apache.cassandra.io.sstable.format.SSTableWriter.finish(SSTableWriter.java:205)
>   at 
> org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFiles(SSTableRewriterTest.java:969)
>   at 
> org.apache.cassandra.io.sstable.SSTableRewriterTest.writeFile(SSTableRewriterTest.java:948)
>   at 
> org.apache.cassandra.io.sstable.SSTableRewriterTest.testSSTableSplit(SSTableRewriterTest.java:618)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
>   at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at 
> 

[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892332#comment-15892332
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 2:36 PM:
-

bq. use System.nanoTime() instead of System.currentTimeMillis().
Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't 
make a terrific difference even with the worst clocks, but "_let's do things 
right_". :-) I never looked up the native code for nanoTime(), but I bet on 
Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}.

bq. I don't think we want to traverse the entire backlog. [...]
Your argument "reasonably in ascending timestamp order" makes sense if all 
entries had the same expiration time. But the Verbs have different 
timeouts, the defaults ranging from 2 to 60 seconds. Thus the whole Queue 
should be iterated, as otherwise, in the worst case, we will remove nothing 
even though most entries are timed out.
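
To illustrate the worst case described above, here is a small sketch with hypothetical names. The Entry class and both expire methods are made up for illustration and are not the OutboundTcpConnection code. A long-lived entry at the head of the queue hides the expired entries behind it from a scan that stops at the first live entry.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class BacklogExpirySketch {
    // Hypothetical queue entry; times are in seconds for readability.
    static final class Entry {
        final long enqueuedAt;
        final long timeout; // differs per Verb, e.g. 2 or 60 seconds
        Entry(long enqueuedAt, long timeout) {
            this.enqueuedAt = enqueuedAt;
            this.timeout = timeout;
        }
        boolean isTimedOut(long now) { return now - enqueuedAt > timeout; }
    }

    // Stops at the first live entry: only valid if all timeouts are equal.
    static int expireUntilFirstLive(Deque<Entry> backlog, long now) {
        int removed = 0;
        for (Iterator<Entry> it = backlog.iterator(); it.hasNext();) {
            if (!it.next().isTimedOut(now)) break;
            it.remove();
            removed++;
        }
        return removed;
    }

    // Traverses the whole backlog, so mixed timeouts are handled correctly.
    static int expireAll(Deque<Entry> backlog, long now) {
        int removed = 0;
        for (Iterator<Entry> it = backlog.iterator(); it.hasNext();) {
            if (it.next().isTimedOut(now)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Head entry has a 60 s timeout; the two entries behind it have 2 s.
    static Deque<Entry> sample() {
        Deque<Entry> q = new ArrayDeque<>();
        q.add(new Entry(0, 60));
        q.add(new Entry(1, 2));
        q.add(new Entry(2, 2));
        return q;
    }
}
```

At time 10 the early-exit scan removes nothing from the sample queue, while the full traversal removes both 2-second entries.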



was (Author: cesken):
bq. use System.nanoTime() instead of System.currentTimeMillis().
Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't 
make a terrific difference even with the worst clocks, but "_lets do things 
right"_. :-) . I never looked up the native code for nanoTIme(), but I bet on 
Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}.

bq. I don't think we want to traverse the entire backlog. [...]
Your argument  "reasonably in ascending timestamp order" makes sense, if all 
entries would have the same expiration time. But the Verbs have different 
timeouts. Thus iterating the whole Queue should be done, as in the worst case 
we will remove nothing even though most entries are timed out.




[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892334#comment-15892334
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


I can do it eventually. My spare time is spent reviewing #11471 right now.

> Lower default chunk_length_in_kb from 64kb to 4kb
> -
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>
> A chunk size that is too low may result in some wasted disk space. A chunk 
> size that is too high may lead to massive overreads and may have a critical 
> impact on overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and 
> average reads of 200MB/s. After lowering the chunk size (of course aligned 
> with read-ahead), the average read IO went below 20MB/s, typically 10-15MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into the difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J
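
As a back-of-the-envelope illustration of the overread argument, the sketch below computes how many bytes are read per byte requested when a whole chunk has to be fetched for a single small row. The 512-byte row size is an assumption chosen purely for illustration. The model is deliberately simplified: it ignores the compression ratio, read-ahead and page cache hits, and assumes the row fits inside one chunk.

```java
class ChunkOverreadSketch {
    // Bytes fetched per byte requested, assuming one full chunk is read
    // to serve one row and nothing is cached (simplified model).
    static double readAmplification(int chunkLengthKb, int rowSizeBytes) {
        return (chunkLengthKb * 1024.0) / rowSizeBytes;
    }

    public static void main(String[] args) {
        int rowSizeBytes = 512; // hypothetical small row
        System.out.println(readAmplification(64, rowSizeBytes)); // 128.0
        System.out.println(readAmplification(4, rowSizeBytes));  // 8.0
    }
}
```

Under these assumptions, going from 64kb to 4kb chunks cuts the uncached read volume per small row by a factor of 16, which is in the same ballpark as the drop from 200MB/s to 10-15MB/s reported above.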





[jira] [Comment Edited] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread Vladimir Bukhtoyarov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892326#comment-15892326
 ] 

Vladimir Bukhtoyarov edited comment on CASSANDRA-13223 at 3/2/17 2:31 PM:
--

[~eperott] this is very rarely reproduced in 3.9, and I am unable to determine 
which concrete histogram is the cause of the error. CASSANDRA-7 is not a 
generic fix for the overflow problem; it is just a patch for a specific case. 
My solution is solid and solves the problem completely: instead of fixing 
histogram writers in multiple places, I fixed the histogram itself, so [this sort of 
problem|https://issues.apache.org/jira/browse/CASSANDRA-8028?jql=text%20~%20%22Unable%20to%20compute%20when%20histogram%20overflowed%22]
 will never happen in the future.
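
The general idea of fixing the histogram itself can be sketched as follows. This is an illustration only and not the attached patch: the class, its bucket layout and its method names are hypothetical. Values above the largest bucket are counted in an overflow slot, and readers get the largest trackable value back and never an exception.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class SaturatingHistogramSketch {
    private final long[] upperBounds;     // ascending bucket upper bounds
    private final AtomicLongArray counts; // last slot collects overflowed values

    SaturatingHistogramSketch(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new AtomicLongArray(upperBounds.length + 1);
    }

    // Records a value; anything above the last bound lands in the overflow slot.
    void update(long value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) i++;
        counts.incrementAndGet(i);
    }

    boolean isOverflowed() {
        return counts.get(upperBounds.length) > 0;
    }

    // Reports overflowed samples as the largest trackable value, so callers
    // such as metrics reporters never see an exception on read.
    long max() {
        if (isOverflowed()) return upperBounds[upperBounds.length - 1];
        for (int i = upperBounds.length - 1; i >= 0; i--)
            if (counts.get(i) > 0) return upperBounds[i];
        return 0;
    }
}
```

The trade-off is precision: after an overflow the reported maximum is only a lower bound, but monitoring keeps working and isOverflowed() still lets operators notice the condition.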


was (Author: vladimir.bukhtoyarov):
[~eperott] very rarely reproduced 3.9. I am unable to catch which concrete 
histogram is cause of error. The CASSANDRA-7 is not generic fix for 
overflow problem, it is just patch for specific case. My solution is solid and 
solves problem at all, because instead of fixing histogram-writers in multiple 
places, I fixed the histogram itself and [this sort of 
problems|https://issues.apache.org/jira/browse/CASSANDRA-8028?jql=text%20~%20%22Unable%20to%20compute%20when%20histogram%20overflowed%22]
  will never happen in future.

> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception once a value above the 
> maximum has been recorded to the reservoir. This is very undesirable behavior, 
> because functionality like logging or monitoring should never fail with an 
> exception. The current behavior of DecayingEstimatedHistogramReservoir violates 
> the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java]:
> the javadocs for Reservoir say nothing about an implementation throwing an 
> exception in the getSnapshot method. As a result, all Dropwizard/Metrics 
> reporters are broken, because nobody expects a metric to throw an exception 
> on get. For example, our monitoring pipeline breaks with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when 
> histogram overflowed (through reference chain: 
> java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table
> .ColUpdateTimeDeltaHistogram.all"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
> at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
> at 
> com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
> at 
> com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
> at 
> com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
> at 
> com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
> at 
> com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
> at 
> com.ringcentral.slf4j.CleanMDCFilter.doFilter(Cl

[jira] [Commented] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread Vladimir Bukhtoyarov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892326#comment-15892326
 ] 

Vladimir Bukhtoyarov commented on CASSANDRA-13223:
--

[~eperott] It reproduces very rarely on 3.9, and I have not been able to 
determine which histogram causes the error. CASSANDRA-7 is not a generic fix 
for the overflow problem; it is a patch for one specific case. My solution is 
more robust and solves the problem entirely: instead of fixing histogram 
writers in multiple places, I fixed the histogram itself, so [this sort of 
problem|https://issues.apache.org/jira/browse/CASSANDRA-8028?jql=text%20~%20%22Unable%20to%20compute%20when%20histogram%20overflowed%22]
  will never happen again.
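
To illustrate what "fixing the histogram itself" can mean, here is a minimal sketch of a bucketed histogram that saturates instead of overflowing. It is not the patch discussed in this ticket and not Cassandra's DecayingEstimatedHistogramReservoir; the class name {{SaturatingHistogram}} and its methods are hypothetical and only show the idea that reads never throw.

```java
// Hypothetical illustration only: a fixed-bucket histogram that clamps
// out-of-range values into the last bucket instead of recording an overflow.
final class SaturatingHistogram
{
    private final long[] bucketUpperBounds; // ascending upper bounds, at least one
    private final long[] counts;

    SaturatingHistogram(long... bucketUpperBounds)
    {
        if (bucketUpperBounds.length == 0)
            throw new IllegalArgumentException("at least one bucket is required");
        this.bucketUpperBounds = bucketUpperBounds.clone();
        this.counts = new long[bucketUpperBounds.length];
    }

    void update(long value)
    {
        int i = 0;
        // Stop at the last bucket even if the value exceeds its upper bound.
        while (i < bucketUpperBounds.length - 1 && value > bucketUpperBounds[i])
            i++;
        counts[i]++;
    }

    // Never throws: returns the upper bound of the highest non-empty bucket, or 0 if empty.
    long max()
    {
        for (int i = counts.length - 1; i >= 0; i--)
            if (counts[i] > 0)
                return bucketUpperBounds[i];
        return 0;
    }
}
```

The trade-off in this sketch is precision: a very large value is reported as the largest bucket bound, but a metrics reporter reading the histogram is never interrupted by an exception.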

> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception when a value above the 
> maximum is recorded to the reservoir. This is very undesirable behavior, because 
> functionality like logging or monitoring should never fail with an exception. 
> The current behavior of DecayingEstimatedHistogramReservoir violates the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java],
>  as the javadocs for Reservoir say nothing about an implementation being allowed 
> to throw an exception in the getSnapshot method. As a result, all Dropwizard Metrics 
> reporters are broken, because nobody expects that a metric will throw an exception 
> on get. For example, our monitoring pipeline is broken with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when 
> histogram overflowed (through reference chain: 
> java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table
> .ColUpdateTimeDeltaHistogram.all"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
> at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
> at 
> com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
> at 
> com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
> at 
> com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
> at 
> com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
> at 
> com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
> at 
> com.ringcentral.slf4j.CleanMDCFilter.doFilter(CleanMDCFilter.java:18)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 
> org.ecli

[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892332#comment-15892332
 ] 

Christian Esken commented on CASSANDRA-13265:
-

bq. use System.nanoTime() instead of System.currentTimeMillis().
Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't 
make a big difference even with the worst clocks, but "_let's do things 
right_". :-) I have never looked at the native code for {{nanoTime()}}, but I 
bet that on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}.

bq. I don't think we want to traverse the entire backlog. [...]
Your argument "reasonably in ascending timestamp order" would make sense 
if all entries had the same expiration time. But the Verbs have 
different timeouts. Thus the whole Queue should be iterated; otherwise, in the 
worst case, we will remove nothing even though most entries have timed out.
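
The worst case described above can be sketched as follows. This is a minimal illustration, not the OutboundTcpConnection code: {{FullBacklogScan}}, {{Entry}} and {{expireAll}} are hypothetical names, and each entry simply carries its own timeout to stand in for per-Verb timeouts.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration: entries with different timeouts in one queue.
final class FullBacklogScan
{
    static final class Entry
    {
        final long enqueuedAtNanos;
        final long timeoutNanos; // stands in for a per-Verb timeout

        Entry(long enqueuedAtNanos, long timeoutNanos)
        {
            this.enqueuedAtNanos = enqueuedAtNanos;
            this.timeoutNanos = timeoutNanos;
        }

        boolean isExpired(long nowNanos)
        {
            return nowNanos - enqueuedAtNanos >= timeoutNanos;
        }
    }

    // Visits every entry, so an unexpired head does not hide expired entries behind it.
    static int expireAll(Queue<Entry> backlog, long nowNanos)
    {
        int before = backlog.size();
        backlog.removeIf(e -> e.isExpired(nowNanos));
        return before - backlog.size();
    }
}
```

With a long-timeout entry at the head followed by short-timeout entries, a scan that stops at the first non-expired entry would remove nothing, while the full scan removes every expired entry at the cost of touching the whole queue.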


> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892332#comment-15892332
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 2:34 PM:
-

bq. use System.nanoTime() instead of System.currentTimeMillis().
Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't 
make a big difference even with the worst clocks, but "_let's do things 
right_". :-) I have never looked at the native code for {{nanoTime()}}, but I 
bet that on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}.

bq. I don't think we want to traverse the entire backlog. [...]
Your argument "reasonably in ascending timestamp order" would make sense if all 
entries had the same expiration time. But the Verbs have different 
timeouts. Thus the whole Queue should be iterated; otherwise, in the worst case, 
we will remove nothing even though most entries have timed out.



was (Author: cesken):
bq. use System.nanoTime() instead of System.currentTimeMillis().
Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't 
make a big difference even with the worst clocks, but "_let's do things 
right_". :-) I have never looked at the native code for {{nanoTime()}}, but I 
bet that on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}.

bq. I don't think we want to traverse the entire backlog. [...]
Your argument  "reasonably in ascendingascending timestamp order" makes sense, 
if all entries would have the same expiration time. But the Verbs have 
different timeouts. Thus iterating the whole Queue should be done, as in the 
worst case we will remove nothing even though most entries are timed out.


> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892331#comment-15892331
 ] 

Ariel Weisberg commented on CASSANDRA-13265:


I don't want to traverse the whole backlog. I just thought we were doing so, 
and that we should avoid doing it too frequently. Then I realized we don't have 
to traverse the entire backlog, and sure enough, that is what the code was 
already doing. I prefer the version from yesterday with Jason's suggestions.

> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Commented] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread Vladimir Bukhtoyarov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892328#comment-15892328
 ] 

Vladimir Bukhtoyarov commented on CASSANDRA-13223:
--

[~zznate] Thanks for the instructions; I am going to prepare a patch in a few days.

> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception when a value above the 
> maximum is recorded to the reservoir. This is very undesirable behavior, because 
> functionality like logging or monitoring should never fail with an exception. 
> The current behavior of DecayingEstimatedHistogramReservoir violates the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java],
>  as the javadocs for Reservoir say nothing about an implementation being allowed 
> to throw an exception in the getSnapshot method. As a result, all Dropwizard Metrics 
> reporters are broken, because nobody expects that a metric will throw an exception 
> on get. For example, our monitoring pipeline is broken with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when 
> histogram overflowed (through reference chain: 
> java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table
> .ColUpdateTimeDeltaHistogram.all"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
> at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
> at 
> com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
> at 
> com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
> at 
> com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
> at 
> com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
> at 
> com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
> at 
> com.ringcentral.slf4j.CleanMDCFilter.doFilter(CleanMDCFilter.java:18)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:524)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
> at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
> at 
> org.eclipse.jetty.io.AbstractConnection$Re

[jira] [Commented] (CASSANDRA-13271) Reduce lock contention on instance factories of ListType and SetType

2017-03-02 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892297#comment-15892297
 ] 

Robert Stupp commented on CASSANDRA-13271:
--

Can you please restore the imports as they were before and remove the new, 
superfluous ones?
The default Eclipse "organize imports" is not how we do it in the 
C* code base (see the developer/how-to-contribute docs on cassandra.apache.org).
All you need in your cassandra fork on GitHub is a branch for this patch, not 
a new fork.

> Reduce lock contention on instance factories of ListType and SetType
> 
>
> Key: CASSANDRA-13271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13271
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: vincent royer
>Priority: Minor
>  Labels: performance
> Fix For: 4.x
>
> Attachments: 0001-CASSANDRA-13271-computeIfAbsent.patch, 
> 0001-CASSANDRA-13271-singleton-factory-concurrency-opimiz.patch
>
>
> While running some performance tests, I noticed that getInstance() in 
> org.apache.cassandra.db.marshal.ListType and SetType can suffer from lock 
> contention, because the singleton factory getInstance() is synchronized. Here 
> is a proposal to reduce lock contention by using a ConcurrentMap and the 
> putIfAbsent method rather than a synchronized method.
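
The proposal above can be sketched roughly as follows. This is a simplified illustration and not the attached patch: {{CachedType}} is a hypothetical stand-in for ListType/SetType, and a plain String key replaces the real element type argument.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a parameterized type such as ListType or SetType.
final class CachedType
{
    // One cached instance per element type; lookups do not take a global lock.
    private static final ConcurrentMap<String, CachedType> INSTANCES = new ConcurrentHashMap<>();

    final String elementType;

    private CachedType(String elementType)
    {
        this.elementType = elementType;
    }

    static CachedType getInstance(String elementType)
    {
        // Fast path: the common case is a hit, with no synchronization on the class.
        CachedType existing = INSTANCES.get(elementType);
        if (existing != null)
            return existing;

        // Slow path: several threads may construct an instance, but only one wins.
        CachedType created = new CachedType(elementType);
        CachedType raced = INSTANCES.putIfAbsent(elementType, created);
        return raced != null ? raced : created;
    }
}
```

Compared with a synchronized factory method, threads asking for an already cached instance no longer queue behind one monitor; the cost is that a losing thread may occasionally build an instance that is then discarded.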





[jira] [Commented] (CASSANDRA-13223) Unable to compute when histogram overflowed

2017-03-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892228#comment-15892228
 ] 

Per Otterström commented on CASSANDRA-13223:


I wonder if this could be a duplicate of CASSANDRA-7. 
[~vladimir.bukhtoyarov], in what version did you observe this?

> Unable to compute when histogram overflowed
> ---
>
> Key: CASSANDRA-13223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13223
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vladimir Bukhtoyarov
>Priority: Minor
>
> DecayingEstimatedHistogramReservoir throws an exception when a value above the 
> maximum is recorded to the reservoir. This is very undesirable behavior, because 
> functionality like logging or monitoring should never fail with an exception. 
> The current behavior of DecayingEstimatedHistogramReservoir violates the contract for 
> [Reservoir|https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/Reservoir.java],
>  as the javadocs for Reservoir say nothing about an implementation being allowed 
> to throw an exception in the getSnapshot method. As a result, all Dropwizard Metrics 
> reporters are broken, because nobody expects that a metric will throw an exception 
> on get. For example, our monitoring pipeline is broken with this exception:
> {noformat}
> com.fasterxml.jackson.databind.JsonMappingException: Unable to compute when 
> histogram overflowed (through reference chain: 
> java.util.UnmodifiableSortedMap["org.apache.cassandra.metrics.Table
> .ColUpdateTimeDeltaHistogram.all"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:339)
> at 
> com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:299)
> at 
> com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:342)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serializeFields(MapSerializer.java:620)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:519)
> at 
> com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:31)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:2436)
> at 
> com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:355)
> at 
> com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1442)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:188)
> at 
> com.codahale.metrics.json.MetricsModule$MetricRegistrySerializer.serialize(MetricsModule.java:171)
> at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
> at 
> com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1428)
> at 
> com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1129)
> at 
> com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:967)
> at 
> com.codahale.metrics.servlets.MetricsServlet.doGet(MetricsServlet.java:176)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
> at 
> com.ringcentral.slf4j.CleanMDCFilter.doFilter(CleanMDCFilter.java:18)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:524)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
> at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
> at 
> org.ecl

[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892221#comment-15892221
 ] 

Jason Brown commented on CASSANDRA-13265:
-

tbh, I don't think we want to traverse the entire {{backlog}}. If we go with 
the assumption that items are added to the backlog 'reasonably' in ascending 
timestamp order, then even a few out-of-order entries won't make much of a 
difference to the overall load. Thus, bailing out on the first non-expired 
message is still a sensible thing.
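
The bail-out strategy described here can be sketched as follows. It is a minimal illustration under the stated ordering assumption, not the OutboundTcpConnection implementation: {{BacklogExpiry}}, {{Entry}} and {{expireFromHead}} are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Hypothetical illustration of head-only expiration.
final class BacklogExpiry
{
    // Hypothetical queued message: only the expiry deadline matters here.
    static final class Entry
    {
        final long expiresAtNanos;

        Entry(long expiresAtNanos)
        {
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    // Removes expired entries from the head and stops at the first live one.
    static int expireFromHead(Queue<Entry> backlog, long nowNanos)
    {
        int removed = 0;
        Iterator<Entry> it = backlog.iterator();
        while (it.hasNext())
        {
            Entry e = it.next();
            // Subtraction keeps the comparison valid even if nanoTime values wrap.
            if (e.expiresAtNanos - nowNanos > 0)
                break;
            it.remove();
            removed++;
        }
        return removed;
    }
}
```

The work per call is bounded by the number of expired entries at the head plus one. An expired entry sitting behind a live one stays in the queue until the entries in front of it have been removed.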

> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  





[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892217#comment-15892217
 ] 

Jason Brown commented on CASSANDRA-13265:
-

Some comments on the patch

- use {{System.nanoTime()}} instead of {{System.currentTimeMillis()}}. nanoTime 
is monotonic, currentTimeMillis is not.
- for testing, I'd give {{expireMessages}} default visibility, and write a test 
that enqueues a bunch of messages that expire in n seconds, immediately calls 
{{expireMessages}} and ensures no messages were purged, waits the 
{{BACKLOG_EXPIRATION_INTERVAL_MILLIS}}, then invokes {{expireMessages}} and makes 
sure everything was purged. You don't need to run OTC as a thread or with the 
consumer part of the class executing.
- then you can remove the {{BACKLOG_EXPIRATION_DEBUG}} :)
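
The test idea above can be sketched with a small stand-in class. This is not OutboundTcpConnection: {{ExpiringBacklog}} is hypothetical, and the injected {{LongSupplier}} clock is an assumption added here so that a test can advance time instead of sleeping (in production code it would be {{System::nanoTime}}).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a backlog with a package-private expireMessages().
final class ExpiringBacklog
{
    private final Queue<Long> deadlinesNanos = new ArrayDeque<>();
    private final LongSupplier clock; // monotonic nanosecond source, e.g. System::nanoTime

    ExpiringBacklog(LongSupplier clock)
    {
        this.clock = clock;
    }

    void enqueue(long timeout, TimeUnit unit)
    {
        deadlinesNanos.add(clock.getAsLong() + unit.toNanos(timeout));
    }

    // Default (package-private) visibility so a test in the same package can call it.
    int expireMessages()
    {
        long now = clock.getAsLong();
        int before = deadlinesNanos.size();
        // Subtraction keeps the comparison valid even if nanoTime values wrap.
        deadlinesNanos.removeIf(deadline -> deadline - now <= 0);
        return before - deadlinesNanos.size();
    }

    int size()
    {
        return deadlinesNanos.size();
    }
}
```

A test then enqueues messages with an n-second timeout, calls {{expireMessages}} and expects nothing to be purged, advances the clock past the timeout, calls it again and expects an empty backlog. No connection thread needs to run.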

> Epxiration in OutboundTcpConnection can block the reader Thread
> ---
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>Reporter: Christian Esken
>Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate to the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A Threaddump in this situation showed  324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> amount of queued messages, it starts thrashing itself to death. Each of the 
> Thread fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operation it can progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and 
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be 
> starved which makes the situation even worse.
> -
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DC's
>  - high write throughput (10 INSERT statements per second and more during 
> peak times).
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891905#comment-15891905
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 12:36 PM:
--

Ariel wrote:
{quote}
Expiration is based on time. There is no point in attempting expiration again 
immediately because almost nothing will have expired. It allows one bad 
connection to consume resources it shouldn't in the form of hijacking a thread 
to iterate a list.

I don't see the downside of switching from a boolean to a long and CASing that 
instead. If we aren't confident in it we can set a small interval so that it 
still checks for expiration often although I think that just generates useless 
work. We can't make timeouts pass faster.
{quote}

[~aweisberg], I understand that you want to CAS on "lastExpirationTime", right? 
I am also in favor of doing this. It fits better and still keeps the change simple. 
In that case the Thread should iterate the whole Queue, and not bail out on the 
first hit. I will change it in the PR.
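
A rough sketch of the idea discussed above (switching from a boolean to a long and CASing it), not the code in the PR. {{ExpirationGate}} and {{tryStartExpiration}} are hypothetical names, and the interval is supplied by the caller. The point is that at most one thread per interval wins the compare-and-set and goes on to iterate the whole queue; every other thread returns immediately instead of queuing up for the lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not part of Cassandra.
class ExpirationGate {
    private final long intervalNanos;
    // Time of the last expiration run, as a System.nanoTime()-style reading.
    private final AtomicLong lastExpirationTime;

    ExpirationGate(long intervalMillis, long nowNanos) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        this.lastExpirationTime = new AtomicLong(nowNanos);
    }

    // Returns true for at most one caller per interval.
    boolean tryStartExpiration(long nowNanos) {
        long last = lastExpirationTime.get();
        if (nowNanos - last < intervalNanos)
            return false; // too soon: almost nothing can have expired yet
        // Only the thread whose CAS succeeds performs the expiration run.
        return lastExpirationTime.compareAndSet(last, nowNanos);
    }
}
```

A caller would check {{gate.tryStartExpiration(System.nanoTime())}} and only then walk the backlog, inspecting every element rather than bailing out on the first hit.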


was (Author: cesken):
Ariel wrote:
{quote}
Expiration is based on time. There is no point in attempting expiration again 
immediately because almost nothing will have expired. It allows one bad 
connection to consume resources it shouldn't in the form of hijacking a thread 
to iterate a list.

I don't see the downside of switching from a boolean to a long and CASing that 
instead. If we aren't confident in it we can set a small interval so that it 
still checks for expiration often although I think that just generates useless 
work. We can't make timeouts pass faster.
{quote}

[~aweisberg]: I understand that you want to CAS on "lastExpirationTime", right? 
I am also in favor of doing this. It fits better and still keeps the change simple. 
In that case the Thread should iterate the whole Queue, and not bail out on the 
first hit. I will change it in the PR.



[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891905#comment-15891905
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 12:35 PM:
--

Ariel wrote:
{quote}
Expiration is based on time. There is no point in attempting expiration again 
immediately because almost nothing will have expired. It allows one bad 
connection to consume resources it shouldn't in the form of hijacking a thread 
to iterate a list.

I don't see the downside of switching from a boolean to a long and CASing that 
instead. If we aren't confident in it we can set a small interval so that it 
still checks for expiration often although I think that just generates useless 
work. We can't make timeouts pass faster.
{quote}

[~aweisberg]: I understand that you want to CAS on "lastExpirationTime", right? 
I am also in favor of doing this. It fits better and still keeps the change simple. 
In that case the Thread should iterate the whole Queue, and not bail out on the 
first hit. I will change it in the PR.


was (Author: cesken):
Ariel wrote:
{quote}
Expiration is based on time. There is no point in attempting expiration again 
immediately because almost nothing will have expired. It allows one bad 
connection to consume resources it shouldn't in the form of hijacking a thread 
to iterate a list.

I don't see the downside of switching from a boolean to a long and CASing that 
instead. If we aren't confident in it we can set a small interval so that it 
still checks for expiration often although I think that just generates useless 
work. We can't make timeouts pass faster.
{quote}

I understand that you want to CAS on "lastExpirationTime", right? I am also in 
favor of doing this. It fits better and still keeps the change simple. In that case 
the Thread should iterate the whole Queue, and not bail out on the first hit. I 
will change it in the PR.



[jira] [Resolved] (CASSANDRA-12985) Update MV repair documentation

2017-03-02 Thread Benjamin Roth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Roth resolved CASSANDRA-12985.
---
Resolution: Resolved

MV repairs won't be changed as proposed

> Update MV repair documentation
> --
>
> Key: CASSANDRA-12985
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12985
> Project: Cassandra
>  Issue Type: Task
>Reporter: Benjamin Roth
> Fix For: 3.0.x, 3.11.x
>
>
> Due to CASSANDRA-12888 the way MVs are being repaired changes.
> Before:
> MV has been repaired by repairing the base table. Repairing the MV separately 
> has been discouraged. Also repairing a whole KS containing a MV has been 
> discouraged.
> After:
> MVs are treated like any other table in repairs. They also MUST be repaired 
> as any other table. Base table does NOT repair MV any more.
> Repairing a whole keyspace is encouraged.





[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892143#comment-15892143
 ] 

Benjamin Roth commented on CASSANDRA-13241:
---

So... who's gonna do it?

> Lower default chunk_length_in_kb from 64kb to 4kb
> -
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>
> A chunk size that is too low may result in some wasted disk space. A chunk 
> size that is too high may lead to massive overreads and may have a critical 
> impact on overall system performance.
> In my case, the default chunk size led to peak read IOs of up to 1GB/s and 
> avg reads of 200MB/s. After lowering the chunk size (of course aligned with 
> read ahead), the avg read IO went below 20MB/s, typically 10-15MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio gets lower.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insights what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J
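
To put rough numbers on the overread argument in the description, here is a back-of-the-envelope sketch. It assumes that each small-row read misses the page cache and pulls exactly one whole compressed chunk from disk, and that the row is 200 bytes; both are illustrative assumptions, not measurements from this ticket. {{ChunkOverreadSketch}} is a hypothetical name.

```java
// Hypothetical arithmetic helper; not part of Cassandra.
class ChunkOverreadSketch {
    // Bytes read from disk per byte actually requested, assuming one whole
    // chunk is read for every row lookup (no page cache hit).
    static double readAmplification(int chunkLengthKb, int rowSizeBytes) {
        return (chunkLengthKb * 1024.0) / rowSizeBytes;
    }

    public static void main(String[] args) {
        int rowSizeBytes = 200; // assumed size of a small, skinny row
        System.out.println("64kb chunks: " + readAmplification(64, rowSizeBytes) + "x");
        System.out.println(" 4kb chunks: " + readAmplification(4, rowSizeBytes) + "x");
    }
}
```

Under these assumptions the amplification scales linearly with the chunk length, so a 4kb chunk reads one sixteenth of what a 64kb chunk does for the same small row.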





[jira] [Comment Edited] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892040#comment-15892040
 ] 

Benjamin Roth edited comment on CASSANDRA-13065 at 3/2/17 12:06 PM:


[~pauloricardomg] Please also look at follow-up commits on review. I added 2 
more commits (for 13064+13065) with tiny fixes. Depending on your feedback, I 
can rearrange them if required.

https://github.com/Jaumo/cassandra/tree/CASSANDRA-13064


was (Author: brstgt):
[~pauloricardomg] Please also look at follow-up commits on review. I added 2 
more commits (13064+13065) with tiny fixes. Depending on your feedback, I can 
rearrange them if required.

> Consistent range movements to not require MV updates to go through write 
> paths 
> ---
>
> Key: CASSANDRA-13065
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>Priority: Critical
> Fix For: 4.0
>
>
> Booting or decommissioning nodes with MVs is unbearably slow as all streams go 
> through the regular write paths. This causes read-before-writes for every 
> mutation and during bootstrap it causes them to be sent to batchlog.
> This makes it virtually impossible to boot a new node in an acceptable amount 
> of time.
> Using the regular streaming behaviour for consistent range movements works 
> much better in this case and does not break the MV local consistency contract.
> Already tested on own cluster.
> Bootstrap case is super easy to handle, decommission case requires 
> CASSANDRA-13064





[jira] [Comment Edited] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892114#comment-15892114
 ] 

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 11:47 AM:
--

I updated the PR with the following changes:
 - Variable names / modifiers (static)
 - Expiration is based on time
 - Expiration inspects the whole Queue (no bailing out)

This is really hard to reproduce and to test. Because of that I did not yet 
remove the BACKLOG_EXPIRATION_DEBUG. If you have a hint about test 
possibilities, let me know.


was (Author: cesken):
I updated the PR with the following changes:
 - Variable names / modifiers (static)
 - Expiration is based on time
 - Expiration inspects the whole Queue (no bailing out)




[jira] [Commented] (CASSANDRA-13265) Epxiration in OutboundTcpConnection can block the reader Thread

2017-03-02 Thread Christian Esken (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892114#comment-15892114
 ] 

Christian Esken commented on CASSANDRA-13265:
-

I updated the PR with the following changes:
 - Variable names / modifiers (static)
 - Expiration is based on time
 - Expiration inspects the whole Queue (no bailing out)




[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths

2017-03-02 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892040#comment-15892040
 ] 

Benjamin Roth commented on CASSANDRA-13065:
---

[~pauloricardomg] Please also look at follow-up commits on review. I added 2 
more commits (13064+13065) with tiny fixes. Depending on your feedback, I can 
rearrange them if required.


