[jira] [Commented] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest

2018-02-27 Thread Dinesh Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379818#comment-16379818
 ] 

Dinesh Joshi commented on CASSANDRA-14280:
--

I still see ViewBuildTaskTest, TombstoneTest and ViewTest failing. Is this 
expected?

> Fix timeout test - org.apache.cassandra.cql3.ViewTest
> -
>
> Key: CASSANDRA-14280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14280
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0
>
>
> The test times out very often; it seems too big. Try to split it into 
> multiple tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2018-02-27 Thread jzhuang
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f7d140e2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f7d140e2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f7d140e2

Branch: refs/heads/trunk
Commit: f7d140e2a934e343dedc7d4057784551d4adac48
Parents: b86801e c494696
Author: Jay Zhuang 
Authored: Tue Feb 27 21:47:24 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 21:47:24 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f7d140e2/build.xml
--


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[2/3] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-02-27 Thread jzhuang
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c4946960
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c4946960
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c4946960

Branch: refs/heads/trunk
Commit: c4946960a20e12f0f574b5608c886467466ee3b9
Parents: abd9be1 79cead0
Author: Jay Zhuang 
Authored: Tue Feb 27 21:37:59 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 21:40:25 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c4946960/build.xml
--


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[1/3] cassandra git commit: Add new developer to build.xml file

2018-02-27 Thread jzhuang
Repository: cassandra
Updated Branches:
  refs/heads/trunk b86801e95 -> f7d140e2a


Add new developer to build.xml file


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09

Branch: refs/heads/trunk
Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2
Parents: d73f45b
Author: Jay Zhuang 
Authored: Tue Feb 27 18:07:14 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 19:12:02 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml
--
diff --git a/build.xml b/build.xml
index 6f98242..7bab97c 100644
--- a/build.xml
+++ b/build.xml
@@ -443,6 +443,7 @@
 
 
 
+
 
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[3/3] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-02-27 Thread jzhuang
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c4946960
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c4946960
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c4946960

Branch: refs/heads/cassandra-3.11
Commit: c4946960a20e12f0f574b5608c886467466ee3b9
Parents: abd9be1 79cead0
Author: Jay Zhuang 
Authored: Tue Feb 27 21:37:59 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 21:40:25 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c4946960/build.xml
--


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[2/3] cassandra git commit: Add new developer to build.xml file

2018-02-27 Thread jzhuang
Add new developer to build.xml file


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09

Branch: refs/heads/cassandra-3.11
Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2
Parents: d73f45b
Author: Jay Zhuang 
Authored: Tue Feb 27 18:07:14 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 19:12:02 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml
--
diff --git a/build.xml b/build.xml
index 6f98242..7bab97c 100644
--- a/build.xml
+++ b/build.xml
@@ -443,6 +443,7 @@
 
 
 
+
 
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[1/3] cassandra git commit: Add new developer to build.xml file

2018-02-27 Thread jzhuang
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 d73f45bad -> 79cead093
  refs/heads/cassandra-3.11 abd9be1e4 -> c4946960a


Add new developer to build.xml file


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09

Branch: refs/heads/cassandra-3.0
Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2
Parents: d73f45b
Author: Jay Zhuang 
Authored: Tue Feb 27 18:07:14 2018 -0800
Committer: Jay Zhuang 
Committed: Tue Feb 27 19:12:02 2018 -0800

--
 build.xml | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml
--
diff --git a/build.xml b/build.xml
index 6f98242..7bab97c 100644
--- a/build.xml
+++ b/build.xml
@@ -443,6 +443,7 @@
 
 
 
+
 
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest

2018-02-27 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated CASSANDRA-14280:
--
Status: Ready to Commit  (was: Patch Available)

> Fix timeout test - org.apache.cassandra.cql3.ViewTest
> -
>
> Key: CASSANDRA-14280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14280
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0
>
>
> The test times out very often; it seems too big. Try to split it into 
> multiple tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest

2018-02-27 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-14280:
--
Status: Patch Available  (was: Open)

Fixed the timeout of ViewTest by changing updateView("TRUNCATE %s") to 
execute("TRUNCATE %s"); see the sketch below.

Also split ViewTest into smaller unit tests.

|[trunk|https://github.com/DikangGu/cassandra/commit/ae1b9695de7d3f20e52e93a6cdae4a25cdc2f19b]|[unit test|https://circleci.com/gh/DikangGu/cassandra/22]|
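A minimal sketch of the kind of change described, assuming ViewTest's 
CQLTester-style helpers (updateView and execute are the existing test 
helpers; the method below is illustrative, not the committed patch). 
updateView() pushes the statement through the materialized-view write path 
and waits for view mutations to drain, which for TRUNCATE is unnecessary and 
can blow the test timeout:

{code}
// Hypothetical helper illustrating the change; not the committed patch.
private void truncateBaseTable() throws Throwable
{
    // Before: routed through the view-update helper, which waits on the
    // materialized-view write path and can time out.
    // updateView("TRUNCATE %s");

    // After: executed directly against the base table.
    execute("TRUNCATE %s");
}
{code}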

> Fix timeout test - org.apache.cassandra.cql3.ViewTest
> -
>
> Key: CASSANDRA-14280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14280
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0
>
>
> The test times out very often; it seems too big. Try to split it into 
> multiple tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379739#comment-16379739
 ] 

Kurt Greaves commented on CASSANDRA-14210:
--

[~krummas] I've set this as RTC, but feel free to get another reviewer if you 
want.

[~oshulgin] that would be unrelated to this patch. This only affects tools 
where you can specify the number of jobs (cleanup, upgradesstables, scrub). 
That does sound like a bug, though; if you can get more info, it might be 
worth another JIRA.
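
To illustrate the behaviour the ticket asks for, here is a minimal 
slot-based sketch in plain Java (not Cassandra's internal API; the task list 
and pool are hypothetical). A fixed pool of N workers starts the next 
rewrite the moment any slot frees, instead of waiting for a whole batch of N 
to finish:

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SlotScheduling
{
    public static void rewriteAll(List<Runnable> rewriteTasks, int jobs) throws InterruptedException
    {
        // A fixed pool keeps all N slots busy: as soon as one rewrite
        // finishes, the next queued task runs. No slot sits idle waiting
        // for the rest of a "batch" to complete.
        ExecutorService pool = Executors.newFixedThreadPool(jobs);
        rewriteTasks.forEach(pool::submit);
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    }
}
{code}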

> Optimize SSTables upgrade task scheduling
> -
>
> Key: CASSANDRA-14210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Oleksandr Shulgin
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed both for version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later when 
> some of the 4 tasks are finished, no new tasks are scheduled immediately.  It 
> takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB data 
> each, we've seen that the whole process was taking more than 7 days, instead 
> of estimated 1.5-2 days (provided there would be close to full N slots 
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14210:
-
Status: Ready to Commit  (was: Patch Available)

> Optimize SSTables upgrade task scheduling
> -
>
> Key: CASSANDRA-14210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Oleksandr Shulgin
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed both for version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later when 
> some of the 4 tasks are finished, no new tasks are scheduled immediately.  It 
> takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB data 
> each, we've seen that the whole process was taking more than 7 days, instead 
> of estimated 1.5-2 days (provided there would be close to full N slots 
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14210:
-
Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
   Status: Patch Available  (was: Awaiting Feedback)

> Optimize SSTables upgrade task scheduling
> -
>
> Key: CASSANDRA-14210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Oleksandr Shulgin
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed both for version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later when 
> some of the 4 tasks are finished, no new tasks are scheduled immediately.  It 
> takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB data 
> each, we've seen that the whole process was taking more than 7 days, instead 
> of estimated 1.5-2 days (provided there would be close to full N slots 
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest

2018-02-27 Thread Dikang Gu (JIRA)
Dikang Gu created CASSANDRA-14280:
-

 Summary: Fix timeout test - org.apache.cassandra.cql3.ViewTest
 Key: CASSANDRA-14280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14280
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Dikang Gu
Assignee: Dikang Gu
 Fix For: 4.0


The test times out very often; it seems too big. Try to split it into multiple 
tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379662#comment-16379662
 ] 

mck commented on CASSANDRA-14247:
-

[~mkjellman], agree with all your points. I did do one profile run against an 
infinite loop over the cities test csv file, which is 1.3M, so it'll be easy to 
do it again to validate your improvement in (2).

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-14247:

Status: In Progress  (was: Patch Available)

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379642#comment-16379642
 ] 

Michael Kjellman commented on CASSANDRA-14247:
--

1) I think it would be better if we used a "," or " " as the default 
delimiter.

2) I think it would be better if we did the work inside the iterator itself 
vs. calling split() on the entire contents of the string in reset(). If we 
can do it iteratively we can potentially reuse buffers and just go character 
by character until we hit the delimiter, instead of needing to process the 
whole thing, no? Or did you benchmark this and find that even with 
potentially large strings there wasn't a win?

3) When you hit a MarshalException you're logging the whole value. If the 
value is a 30MB text blob, the logger would get slammed, so I'm not sure 
logging the entire thing by default is ideal. Thoughts?
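
As a rough illustration of point (2), a character-by-character tokenizer can 
yield each token as a view over the backing sequence without splitting the 
whole value up front (class and field names here are illustrative, not the 
patch's actual code):

{code}
import java.util.Iterator;
import java.util.NoSuchElementException;

public class DelimiterIterator implements Iterator<CharSequence>
{
    private final CharSequence input;
    private final char delimiter;
    private int pos = 0;

    public DelimiterIterator(CharSequence input, char delimiter)
    {
        this.input = input;
        this.delimiter = delimiter;
    }

    public boolean hasNext()
    {
        return pos <= input.length();
    }

    public CharSequence next()
    {
        if (!hasNext())
            throw new NoSuchElementException();
        int start = pos;
        // Scan forward to the next delimiter (or end of input).
        while (pos < input.length() && input.charAt(pos) != delimiter)
            pos++;
        // The token is a view over the backing sequence; no full split().
        CharSequence token = input.subSequence(start, pos);
        pos++; // step past the delimiter, or past the end to terminate
        return token;
    }
}
{code}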

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-14247:

Status: Patch Available  (was: In Progress)

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379509#comment-16379509
 ] 

mck commented on CASSANDRA-14247:
-

{quote}what's the reasoning for using "░" as the delimiter?{quote}
[~mkjellman]. Nothing. Just the use case that came over from zipkin (we used 
a character there that was really unlikely to appear otherwise). It could well 
make more sense to use the comma character.

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> of CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14275) Cassandra Driver should send identification information to Server

2018-02-27 Thread Dinesh Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-14275:
-
Fix Version/s: (was: 3.11.x)
   (was: 3.0.x)
   (was: 2.2.x)
   (was: 2.1.x)

> Cassandra Driver should send identification information to Server
> -
>
> Key: CASSANDRA-14275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14275
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 4.x
>
>
> Currently there doesn't seem to be any way to readily identify the driver 
> that clients are using to connect to Cassandra. Add the capability of 
> identifying the driver through metadata information, much like how HTTP 
> clients identify themselves through the User-Agent HTTP header. This is useful 
> for debugging in large deployments where clients tend to use different 
> drivers, wrappers and language bindings to connect to Cassandra. This can 
> help surface issues as well as help detect clients which are using older or 
> unsupported drivers.
> The identification information should be a string that unambiguously 
> identifies the driver. It should include information such as the name of the 
> driver, its version, CQL version, platform (Linux, macOS, Windows, etc.) and 
> architecture (x86, x86_64).
> We should surface this information in the `nodetool clientstats` command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14275) Cassandra Driver should send identification information to Server

2018-02-27 Thread Dinesh Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379474#comment-16379474
 ] 

Dinesh Joshi commented on CASSANDRA-14275:
--

Here's a PoC:

# 
https://github.com/apache/cassandra/compare/trunk...dineshjoshi:add-client-string
# 
https://github.com/datastax/java-driver/compare/3.x...dineshjoshi:enhance-java-driver-to-send-metadata

This allows me to get stats like this:

{noformat}
bin/nodetool clientstats --all
Address          SSL   Version User      Keyspace Requests Driver
/127.0.0.1:59305 false 4       anonymous          12       datastaxjavadriver-cql3.0.0-v3.0
/127.0.0.1:59306 false 4       anonymous          2        datastaxjavadriver-cql3.0.0-v3.0

Total connected clients: 2

User      Connections
anonymous 2
{noformat}
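
The mechanism behind the PoC, roughly sketched: the native-protocol STARTUP 
message already carries a string map of options, so a driver can identify 
itself by adding entries there, much like a User-Agent header. The key names 
below are illustrative assumptions, not a final protocol spec:

{code}
import java.util.HashMap;
import java.util.Map;

public class StartupOptions
{
    public static Map<String, String> build()
    {
        Map<String, String> options = new HashMap<>();
        // CQL_VERSION is the one option STARTUP already requires.
        options.put("CQL_VERSION", "3.0.0");
        // Hypothetical identification entries added by the driver:
        options.put("DRIVER_NAME", "datastax-java-driver");
        options.put("DRIVER_VERSION", "3.4.0");
        return options;
    }
}
{code}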

> Cassandra Driver should send identification information to Server
> -
>
> Key: CASSANDRA-14275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14275
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Currently there doesn't seem to be any way to readily identify the driver 
> that clients are using to connect to Cassandra. Add the capability of 
> identifying the driver through metadata information, much like how HTTP 
> clients identify themselves through the User-Agent HTTP header. This is useful 
> for debugging in large deployments where clients tend to use different 
> drivers, wrappers and language bindings to connect to Cassandra. This can 
> help surface issues as well as help detect clients which are using older or 
> unsupported drivers.
> The identification information should be a string that unambiguously 
> identifies the driver. It should include information such as the name of the 
> driver, its version, CQL version, platform (Linux, macOS, Windows, etc.) and 
> architecture (x86, x86_64).
> We should surface this information in the `nodetool clientstats` command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12848) Nodetool proxyhistograms/cfhistograms still report latency as flat result

2018-02-27 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379363#comment-16379363
 ] 

Chris Lohfink commented on CASSANDRA-12848:
---

I think this can be reported as not a problem, just an artifact of using 
histograms to store metrics vs. a sampling reservoir.

> Nodetool proxyhistograms/cfhistograms still report latency as flat result
> -
>
> Key: CASSANDRA-12848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12848
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Nutchanon Leelapornudom
>Priority: Major
>  Labels: metrics
> Attachments: clientrequest-latency.png, image001.png
>
>
> Even with the patch from CASSANDRA-11752, nodetool 
> proxyhistograms/cfhistograms (2.2)/tablehistograms (3.0, 3.x) still report 
> read/write latency as a flat result. That causes the Cassandra metric 
> org.apache.cassandra.metrics.ClientRequest.Read/Write.Latency.xxpercentile 
> to report an incorrect pattern.
> I have attached the result, which I tested on Cassandra 3.0.9. It indicates 
> read latency as a flat line whereas the read count moves normally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12848) Nodetool proxyhistograms/cfhistograms still report latency as flat result

2018-02-27 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379360#comment-16379360
 ] 

Chris Lohfink commented on CASSANDRA-12848:
---

The buckets for the EH go in 20% jumps, so everything in a bucket from 700-840, 
for example, will be reported as "840". This means that it rounds up to the 
nearest 20% bucket boundary, and all the variance within that 20% is lost. 2.1 
was lossy (it randomly threw away latency recordings regardless of how 
important they were), but 2.2+ has a 20% error threshold (it will in the worst 
case report a latency as 20% worse, not better).
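
A simplified sketch of that bucket spacing, mirroring the growth rule of 
EstimatedHistogram's offsets but not the actual implementation: each boundary 
is roughly 1.2x the previous one, so any value landing between two boundaries 
is reported as the upper one (e.g. anything in 700-840 reads back as 840):

{code}
public class BucketSpacing
{
    // Generate `count` bucket boundaries growing by ~20% per step.
    public static long[] boundaries(int count)
    {
        long[] offsets = new long[count];
        long last = 1;
        for (int i = 0; i < count; i++)
        {
            // Grow ~20%, always advancing by at least 1 for small values.
            last = Math.max(last + 1, Math.round(last * 1.2));
            offsets[i] = last;
        }
        return offsets;
    }
}
{code}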

> Nodetool proxyhistograms/cfhistograms still report latency as flat result
> -
>
> Key: CASSANDRA-12848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12848
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Nutchanon Leelapornudom
>Priority: Major
>  Labels: metrics
> Attachments: clientrequest-latency.png, image001.png
>
>
> Even with the patch from CASSANDRA-11752, nodetool 
> proxyhistograms/cfhistograms (2.2)/tablehistograms (3.0, 3.x) still report 
> read/write latency as a flat result. That causes the Cassandra metric 
> org.apache.cassandra.metrics.ClientRequest.Read/Write.Latency.xxpercentile 
> to report an incorrect pattern.
> I have attached the result, which I tested on Cassandra 3.0.9. It indicates 
> read latency as a flat line whereas the read count moves normally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13762) Ensure views created during (or just before) range movements are properly built

2018-02-27 Thread Duarte Nunes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379277#comment-16379277
 ] 

Duarte Nunes commented on CASSANDRA-13762:
--

I think this patch assumes the base node receiving the streamed data will be 
paired with the same view replica as the source base node. Is this true? If so, 
how is that guaranteed?

 

If not, then we might still want to send out view updates. We would need to 
check whether all replicas for the range in question have finished building 
their views, not just the source node.

> Ensure views created during (or just before) range movements are properly 
> built
> ---
>
> Key: CASSANDRA-13762
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13762
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: materializedviews
> Attachments: trunk-13762-dtest.png, trunk-13762-testall.png
>
>
> CASSANDRA-13065 assumes the source node has its views built to skip running 
> base mutations through the write path during range movements.
> It is possible that the source node has not finished building the view, or 
> that a new view is created during a range movement, in which case the view 
> may be wrongly marked as built on the destination node.
> The former problem was introduced by #13065, but even before that a view 
> created during a range movement may not be correctly built on the destination 
> node because the view builder will be triggered before it has finished 
> streaming the source data, wrongly marking the view as built on that node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

2018-02-27 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-14279:
--
Description: 
In my experience if data is not well organized into time windowed sstables, 
cassandra has enormous difficulty in actually deleting data if the data has a 
"medium term" lifetime and is commingled with data that isn't marked for death, 
as would happen with compactions or intermingled write patterns. Or for 
example, you might have an active working set and be archiving "unused" data to 
other tables or clusters. Or you may be purging data. Or you may be 
migrating/sharding/restructuring data. Whatever the case, you want that disk 
space back, and you might not be able to truncate.

In STCS and LCS, row tombstones are intermingled with column data and column 
tombstones. But a row tombstone represents a significant event in data 
lifecycle: large amounts of "droppable" data during compaction and a shortcut 
from reading data from other sstables. It could also enable writes to be 
discarded in rare data patterns if the row tombstone is ahead in time. 

I am wondering whether, if row tombstones were isolated in their own sstables, 
separately compacted and merged, it might enable compaction to work more 
efficiently: 

reads can prioritize bloom filter lookups that indicate a row tombstone, 
getting the timestamp of the deletion first, then can use that in the data 
sstables to filter data or shortcircuit the data if the row data had an overall 
"most recent data timestamp". 

compaction could be forced to reference all the row tombstone sstables, such 
that every time two or more "data" sstables are compacted, they must reference 
the row tombstones to purge data. 

In LCS, this would be particularly useful in getting data out of the upper 
levels without having to wait for data to trickle up the tree. The row 
tombstones, being read-only inputs into the data sstable compactions, can be 
referenced in each of the LCS levels' parallel compactors. 

Based on discussions in the dev list, this would appear to require some sort of 
customization to the memtable->sstable flushing process, and perhaps a 
different set of bloom filters. 

Since the row tombstone sstables are all ,, they 
should be comparatively smaller and take less time to compact. They could be 
aggressively compacted on a different schedule than "data" sstables. 

In addition, it may be easier to repair/synchronize row tombstones across the 
cluster if they have already been separated into their own sstables.

Column/range tombstones may also benefit from a similar separation, but my 
guess is those are so much more numerous, large, and fine-grained that they 
might as well coexist with the data.

  was:
In my experience if data is not well organized into time windowed sstables, 
cassandra has enormous difficulty in actually deleting data if the data has a 
"medium term" lifetime and is commingled with data that isn't marked for death, 
as would happen with compactions or intermingled write patterns. Or for 
example, you might have an active working set and be archiving "unused" data to 
other tables or clusters. Or you may be purging data. Or you may be 
migrating/sharding/restructuring data. Whatever the case, you want that disk 
space back, and you might not be able to truncate.

In STCS and LCS, row tombstones are intermingled with column data and column 
tombstones. But a row tombstone represents a big event: large amounts of 
"droppable" data from an sstable, or even a shortcut from reading data from 
other sstables.

I am wondering whether, if row tombstones were isolated in their own sstables, 
separately compacted and merged, it might enable compaction to work more 
efficiently: 

reads can prioritize bloom filter lookups that indicate a row tombstone, 
getting the timestamp of the deletion first, then can use that in the data 
sstables to filter data or shortcircuit the data if the row data had an overall 
"most recent data timestamp". 

compaction could be forced to reference all the row tombstone sstables, such 
that every time two or more "data" sstables are compacted, they must reference 
the row tombstones to purge data. 

In LCS, this would be particularly useful in getting data out of the upper 
levels without having to wait for data to trickle up the tree. The row 
tombstones, being read-only inputs into the data sstable compactions, can be 
referenced in each of the LCS levels' parallel compactors. 

Based on discussions in the dev list, this would appear to require some sort of 
customization to the memtable->sstable flushing process, and perhaps a 
different set of bloom filters. 

Since the row tombstone sstables are all ,, they 
should be comparatively smaller and take less time to compact. They could be 
aggressively compacted on a different schedule than "data" sstables. 

In addition, it may be easier to repair/synchronize row tombstones across the 
cluster if they have already been separated into their own sstables.
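
A rough sketch of the read shortcut proposed above, assuming a separate set 
of tombstone-only sstables; all the interfaces and method names here are 
hypothetical, not Cassandra's read path. Check the tombstone sstables' bloom 
filters first, take the newest row-deletion timestamp, and skip any data 
sstable whose newest data is older:

{code}
import java.util.List;
import java.util.OptionalLong;

public class TombstoneFirstRead
{
    interface TombstoneSSTable
    {
        boolean mightContain(byte[] key);        // bloom filter check
        long deletionTimestamp(byte[] key);      // newest row deletion
    }

    interface DataSSTable
    {
        long maxDataTimestamp();                 // newest cell in the sstable
        void read(byte[] key);
    }

    static void read(byte[] key, List<TombstoneSSTable> tombstones, List<DataSSTable> data)
    {
        // 1. Find the newest row deletion covering this key, if any.
        OptionalLong deletedAt = tombstones.stream()
                                           .filter(t -> t.mightContain(key))
                                           .mapToLong(t -> t.deletionTimestamp(key))
                                           .max();

        // 2. Short-circuit data sstables entirely shadowed by the deletion.
        for (DataSSTable sstable : data)
        {
            if (deletedAt.isPresent() && sstable.maxDataTimestamp() <= deletedAt.getAsLong())
                continue; // every cell here is older than the row tombstone
            sstable.read(key);
        }
    }
}
{code}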

[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

2018-02-27 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-14279:
--
Description: 
In my experience if data is not well organized into time windowed sstables, 
cassandra has enormous difficulty in actually deleting data if the data has a 
"medium term" lifetime and is commingled with data that isn't marked for death, 
as would happen with compactions or intermingled write patterns. Or for 
example, you might have an active working set and be archiving "unused" data to 
other tables or clusters. Or you may be purging data. Or you may be 
migrating/sharding/restructuring data. Whatever the case, you want that disk 
space back, and you might not be able to truncate.

In STCS and LCS, row tombstones are intermingled with column data and column 
tombstones. But a row tombstone represents a big event: large amounts of 
"droppable" data from an sstable, or even a shortcut from reading data from 
other sstables.

I am wondering whether, if row tombstones were isolated in their own sstables, 
separately compacted and merged, it might enable compaction to work more 
efficiently: 

reads can prioritize bloom filter lookups that indicate a row tombstone, 
getting the timestamp of the deletion first, then can use that in the data 
sstables to filter data or shortcircuit the data if the row data had an overall 
"most recent data timestamp". 

compaction could be forced to reference all the row tombstone sstables, such 
that every time two or more "data" sstables are compacted, they must reference 
the row tombstones to purge data. 

In LCS, this would be particularly useful in getting data out of the upper 
levels without having to wait for data to trickle up the tree. The row 
tombstones, being read-only inputs into the data sstable compactions, can be 
referenced in each of the LCS levels' parallel compactors. 

Based on discussions in the dev list, this would appear to require some sort of 
customization to the memtable->sstable flushing process, and perhaps a 
different set of bloom filters. 

Since the row tombstone sstables are all ,, they 
should be comparatively smaller and take less time to compact. They could be 
aggressively compacted on a different schedule than "data" sstables. 

In addition, it may be easier to repair/synchronize row tombstones across the 
cluster if they have already been separated into their own sstables.

Column/range tombstones may also benefit from a similar separation, but my 
guess is those are so much more numerous, large, and fine-grained that they 
might as well coexist with the data.

  was:
In my experience if data is not well organized into time windowed sstables, 
cassandra has enormous difficulty in actually deleting data if the data has a 
"medium term" lifetime. Or for example, you might have an active working set 
and be archiving "unused" data to other tables or clusters. Or you may be 
purging data. Or you may be migrating/sharding data. Whatever the case, you 
want that disk space back. 

In STCS and LCS, row tombstones are intermingled with column data and column 
tombstones. But a row tombstone represents a big event: large amounts of 
"droppable" data from an sstable, or even a shortcut from reading data from 
other sstables.

I am wondering whether, if row tombstones were isolated in their own sstables, 
separately compacted and merged, it might enable compaction to work more 
efficiently: 

reads can prioritize bloom filter lookups that indicate a row tombstone, 
getting the timestamp of the deletion first, then can use that in the data 
sstables to filter data or shortcircuit the data if the row data had an overall 
"most recent data timestamp". 

compaction could be forced to reference all the row tombstone sstables, such 
that every time two or more "data" sstables are compacted, they must reference 
the row tombstones to purge data. 

In LCS, this would be particularly useful in getting data out of the upper 
levels without having to wait for data to trickle up the tree. The row 
tombstones, being read-only inputs into the data sstable compactions, can be 
referenced in each of the LCS levels' parallel compactors. 

Based on discussions in the dev list, this would appear to require some sort of 
customization to the memtable->sstable flushing process, and perhaps a 
different set of bloom filters. 

Since the row tombstone sstables are all ,, they 
should be comparatively smaller and take less time to compact. They could be 
aggressively compacted on a different schedule than "data" sstables. 

In addition, it may be easier to repair/synchronize row tombstones across the 
cluster if they have already been separated into their own sstables.

Column/range tombstones may also benefit from a similar separation, but my 
guess is those are so much more numerous, large, and fine-grained that they 
might as well coexist with the data.

[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

2018-02-27 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-14279:
--
Component/s: (was: Lifecycle)
 Repair

> Row Tombstones in separate sstables / separate compaction path
> --
>
> Key: CASSANDRA-14279
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14279
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Local Write-Read Paths, Repair
>Reporter: Constance Eustace
>Priority: Major
>
> In my experience if data is not well organized into time windowed sstables, 
> cassandra has enormous difficulty in actually deleting data if the data has a 
> "medium term" lifetime. Or for example, you might have an active working set 
> and be archiving "unused" data to other tables or clusters. Or you may be 
> purging data. Or you may be migrating/sharding data. Whatever the case, you 
> want that disk space back. 
> In STCS and LCS, row tombstones are intermingled with column data and column 
> tombstones. But a row tombstone represents a big event: large amounts of 
> "droppable" data from an sstable, or even a shortcut from reading data from 
> other sstables.
> I am wondering whether, if row tombstones were isolated in their own sstables, 
> separately compacted and merged, it might enable compaction to work more 
> efficiently: 
> reads can prioritize bloom filter lookups that indicate a row tombstone, 
> getting the timestamp of the deletion first, then can use that in the data 
> sstables to filter data or shortcircuit the data if the row data had an 
> overall "most recent data timestamp". 
> compaction could be forced to reference all the row tombstone sstables, such 
> that every time two or more "data" sstables are compacted, they must 
> reference the row tombstones to purge data. 
> In LCS, this would be particularly useful in getting data out of the upper 
> levels without having to wait for data to trickle up the tree. The row 
> tombstones, being read-only inputs into the data sstable compactions, can be 
> referenced in each of the LCS levels' parallel compactors. 
> Based on discussions in the dev list, this would appear to require some sort 
> of customization to the memtable->sstable flushing process, and perhaps a 
> different set of bloom filters. 
> Since the row tombstone sstables are all ,, they 
> should be comparatively smaller and take less time to compact. They could be 
> aggressively compacted on a different schedule than "data" sstables. 
> In addition, it may be easier to repair/synchronize row tombstones across the 
> cluster if they have already been separated into their own sstables.
> Column/range tombstones may also benefit from a similar separation, but my 
> guess is those are so much more numerous, large, and fine-grained that they 
> might as well coexist with the data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

2018-02-27 Thread Constance Eustace (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Constance Eustace updated CASSANDRA-14279:
--
Component/s: Local Write-Read Paths
 Lifecycle
 Compaction

> Row Tombstones in separate sstables / separate compaction path
> --
>
> Key: CASSANDRA-14279
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14279
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Lifecycle, Local Write-Read Paths
>Reporter: Constance Eustace
>Priority: Major
>
> In my experience if data is not well organized into time windowed sstables, 
> cassandra has enormous difficulty in actually deleting data if the data has a 
> "medium term" lifetime. Or for example, you might have an active working set 
> and be archiving "unused" data to other tables or clusters. Or you may be 
> purging data. Or you may be migrating/sharding data. Whatever the case, you 
> want that disk space back. 
> In STCS and LCS, row tombstones are intermingled with column data and column 
> tombstones. But a row tombstone represents a big event: large amounts of 
> "droppable" data from an sstable, or even a shortcut from reading data from 
> other sstables.
> I am wondering whether, if row tombstones were isolated in their own sstables, 
> separately compacted and merged, it might enable compaction to work more 
> efficiently: 
> reads can prioritize bloom filter lookups that indicate a row tombstone, 
> getting the timestamp of the deletion first, then can use that in the data 
> sstables to filter data or shortcircuit the data if the row data had an 
> overall "most recent data timestamp". 
> compaction could be forced to reference all the row tombstone sstables, such 
> that every time two or more "data" sstables are compacted, they must 
> reference the row tombstones to purge data. 
> In LCS, this would be particularly useful in getting data out of the upper 
> levels without having to wait for data to trickle up the tree. The row 
> tombstones, being read-only inputs into the data sstable compactions, can be 
> referenced in each of the LCS levels' parallel compactors. 
> Based on discussions in the dev list, this would appear to require some sort 
> of customization to the memtable->sstable flushing process, and perhaps a 
> different set of bloom filters. 
> Since the row tombstone sstables are all ,, they 
> should be comparatively smaller and take less time to compact. They could be 
> aggressively compacted on a different schedule than "data" sstables. 
> In addition, it may be easier to repair/synchronize row tombstones across the 
> cluster if they have already been separated into their own sstables.
> Column/range tombstones may also benefit from a similar separation, but my 
> guess is those are so much more numerous, large, and fine-grained that they 
> might as well coexist with the data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path

2018-02-27 Thread Constance Eustace (JIRA)
Constance Eustace created CASSANDRA-14279:
-

 Summary: Row Tombstones in separate sstables / separate compaction 
path
 Key: CASSANDRA-14279
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14279
 Project: Cassandra
  Issue Type: Improvement
Reporter: Constance Eustace


In my experience if data is not well organized into time windowed sstables, 
cassandra has enormous difficulty in actually deleting data if the data has a 
"medium term" lifetime. Or for example, you might have an active working set 
and be archiving "unused" data to other tables or clusters. Or you may be 
purging data. Or you may be migrating/sharding data. Whatever the case, you 
want that disk space back. 

In STCS and LCS, row tombstones are intermingled with column data and column 
tombstones. But a row tombstone represents a big event: large amounts of 
"droppable" data from an sstable, or even a shortcut from reading data from 
other sstables.

I am wondering whether, if row tombstones were isolated in their own sstables, 
separately compacted and merged, it might enable compaction to work more 
efficiently: 

reads can prioritize bloom filter lookups that indicate a row tombstone, 
getting the timestamp of the deletion first, then can use that in the data 
sstables to filter data or shortcircuit the data if the row data had an overall 
"most recent data timestamp". 

compaction could be forced to reference all the row tombstone sstables, such 
that every time two or more "data" sstables are compacted, they must reference 
the row tombstones to purge data. 

In LCS, this would be particularly useful in getting data out of the upper 
levels without having to wait for data to trickle up the tree. The row 
tombstones, being read-only inputs into the data sstable compactions, can be 
referenced in each of the LCS levels' parallel compactors. 

Based on discussions in the dev list, this would appear to require some sort of 
customization to the memtable->sstable flushing process, and perhaps a 
different set of bloom filters. 

Since the row tombstone sstables are all ,, they 
should be comparatively smaller and take less time to compact. They could be 
aggressively compacted on a different schedule than "data" sstables. 

In addition, it may be easier to repair/synchronize row tombstones across the 
cluster if they have already been separated into their own sstables.

Column/range tombstones may also benefit from a similar separation, but my 
guess is those are so much more numerous, large, and fine-grained that they 
might as well coexist with the data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14278) Testing

2018-02-27 Thread Sumant Sahney (JIRA)
Sumant Sahney created CASSANDRA-14278:
-

 Summary: Testing
 Key: CASSANDRA-14278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14278
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Sumant Sahney


Test to see if all the logs are written correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14277) Update Log Files in Patches / Modularly

2018-02-27 Thread Sumant Sahney (JIRA)
Sumant Sahney created CASSANDRA-14277:
-

 Summary: Update Log Files in Patches / Modularly 
 Key: CASSANDRA-14277
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14277
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Sumant Sahney


Make changes in the Logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-27 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379161#comment-16379161
 ] 

Michael Kjellman commented on CASSANDRA-14247:
--

[~michaelsembwever]: what's the reasoning for using "░" as the delimiter?

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizerAnalyser
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> for CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan
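
As background for the question above: conceptually the proposed analyzer just 
splits the stored value on the configured delimiter. A hedged sketch of that 
idea (not the actual DelimiterAnalyzer code):

{code}
// Conceptual sketch only: emit one token per non-empty segment of the value,
// split on a single configured delimiter character.
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitter
{
    private final char delimiter;

    public DelimiterSplitter(char delimiter)
    {
        this.delimiter = delimiter;
    }

    public List<String> tokenize(String value)
    {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= value.length(); i++)
        {
            if (i == value.length() || value.charAt(i) == delimiter)
            {
                if (i > start)
                    tokens.add(value.substring(start, i)); // skip empty segments
                start = i + 1;
            }
        }
        return tokens;
    }
}
{code}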



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14276) Walk through the code

2018-02-27 Thread Sumant Sahney (JIRA)
Sumant Sahney created CASSANDRA-14276:
-

 Summary:  Walk through the code
 Key: CASSANDRA-14276
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14276
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Sumant Sahney


1. Walk through the code and understand each module's logging size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity

2018-02-27 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379130#comment-16379130
 ] 

Jeremiah Jordan commented on CASSANDRA-12151:
-

One of the goals should be recording queries with minimal impact on workloads, 
which was also a goal of CASSANDRA-13983, so I would think some re-use might be 
a better idea than coming up with a new way of doing that.

> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Anuj Wadehra
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> We would like a way to enable Cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> It should also be able to log connection attempts and changes to the 
> users/roles.
> I was thinking of making a new keyspace and inserting an entry for every 
> activity that occurs.
> Then it would be possible to query for a specific activity or a query 
> targeting a specific keyspace and column family.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch

2018-02-27 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379131#comment-16379131
 ] 

Dikang Gu commented on CASSANDRA-14252:
---

[~szhou], Yes, it's the warm-up phase. We have to know the distance/latency 
differences between the replicas, otherwise we will have no way to fall back 
to remote replicas. One idea to limit unnecessary requests to a remote replica 
is to only fall back when the local node is really bad. Something like this:

{code}
if (subsnitchScore > 0.5 && subsnitchScore > (sortedScoreIterator.next() * (1.0 + dynamicBadnessThreshold)))
{
    sortByProximityWithScore(address, addresses);
    return;
}
{code}

Of course, the 0.5 parameter can be made tunable.
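
For reference, the core change proposed by this ticket (zero as the default 
score) could look something like this hedged sketch; the {{scores}} map and 
the method name are illustrative, not the actual patch:

{code}
// Illustrative only: a replica with no latency samples yet gets score 0.0
// instead of being skipped, so the badness comparison above can still
// trigger a fallback to remote replicas.
private double scoreOf(InetAddress replica)
{
    Double score = scores.get(replica);  // per-replica latency scores (assumed)
    return score == null ? 0.0 : score;
}
{code}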


> Use zero as default score in DynamicEndpointSnitch
> --
>
> Key: CASSANDRA-14252
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
> Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that I found in our deployment, one slow but 
> alive data node can slow down the whole cluster and even cause timeouts of 
> our requests. 
> We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the 
> DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node 
> latency is too high.
> I added some debug log, and figured out that in a lot of cases, the score 
> from remote data node was not populated, so the fallback to 
> sortByProximityWithScore never happened. That's why a single slow data node, 
> can cause huge problems to the whole cluster.
> In this jira, I'd like to use zero as the default score, so that we will get 
> a chance to try a remote data node if the local one is slow. 
> I tested it in our test cluster; it improved the client latency significantly 
> in the single slow data node case. 
> I flag this as a Bug because it has caused problems for our use cases 
> multiple times.
> === logs ===
> 2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]
> 2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [0.0]
> 2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]
> 2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: 
> sortByProximityWithBadness: after sorting by proximity, addresses order 
> change to [ip1, ip2], with scores [1.0]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-8341) Expose time spent in each thread pool

2018-02-27 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-8341:
-
Status: Patch Available  (was: Open)

> Expose time spent in each thread pool
> -
>
> Key: CASSANDRA-8341
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
>  Labels: metrics
> Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue.  This can provide 
> context on how much time is spent percentage-wise in each stage.  
> Additionally it can be used with Little's law in the future if we ever want 
> to try to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

2018-02-27 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379093#comment-16379093
 ] 

Chris Lohfink commented on CASSANDRA-8341:
--

Patch changed output of tpstats to look like:

{code}
Pool Name                    Active Pending Completed Blocked AllTimeBlocked CPU[ms/sec] Allocations[mb/s]
AntiEntropyStage                  0       0         0       0              0           0                 0
CacheCleanupExecutor              0       0         0       0              0           0                 0
CompactionExecutor                0       0      1013       0              0           0                 0
CounterMutationStage              0       0         0       0              0           0                 0
GossipStage                       0       0         0       0              0           0                 0
HintsDispatcher                   0       0         0       0              0           0                 0
InternalResponseStage             0       0         0       0              0           0                 0
MemtableFlushWriter               0       0         1       0              0           0                 0
MemtablePostFlush                 0       0         2       0              0           0                 0
MemtableReclaimMemory             0       0         1       0              0           0                 0
MigrationStage                    0       0         0       0              0           0                 0
MiscStage                         0       0         0       0              0           0                 0
MutationStage                     0       0   1367500       0              0          27                 2
Native-Transport-Requests        11       0  36191532       0              0        1839               566
PendingRangeCalculator            0       0         2       0              0           0                 0
PerDiskMemtableFlushWriter_0      0       0         1       0              0           0                 0
ReadRepairStage                   0       0         0       0              0           0                 0
ReadStage                         0       0  22662142       0              0         349                58
Repair-Task                       0       0         0       0              0           0                 0
RequestResponseStage              0       0         0       0              0           0                 0
Sampler                           0       0         0       0              0           0                 0
SecondaryIndexManagement          0       0         0       0              0           0                 0
ValidationExecutor                0       0         0       0              0           0                 0
ViewBuildExecutor                 0       0         0       0              0           0                 0
ViewMutationStage                 0       0         0       0              0           0                 0
{code}
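
For context, per-pool CPU and allocation figures like these can be sampled per 
task via the JDK's thread MXBean; a hedged sketch, not necessarily how the 
attached patch does it ({{poolCpuNanos}}/{{poolAllocBytes}} are illustrative 
names):

{code}
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: measure CPU time and allocated bytes around each task and add
// the deltas to per-pool counters that tpstats can later report as rates.
public final class TaskAccounting
{
    private static final com.sun.management.ThreadMXBean THREAD_MX =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    public static void runAccounted(Runnable task, AtomicLong poolCpuNanos, AtomicLong poolAllocBytes)
    {
        long tid = Thread.currentThread().getId();
        long cpu0 = THREAD_MX.getThreadCpuTime(tid);
        long alloc0 = THREAD_MX.getThreadAllocatedBytes(tid);
        try
        {
            task.run();
        }
        finally
        {
            poolCpuNanos.addAndGet(THREAD_MX.getThreadCpuTime(tid) - cpu0);
            poolAllocBytes.addAndGet(THREAD_MX.getThreadAllocatedBytes(tid) - alloc0);
        }
    }
}
{code}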

> Expose time spent in each thread pool
> -
>
> Key: CASSANDRA-8341
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
>  Labels: metrics
> Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue.  This can provide 
> context on how much time is spent percentage-wise in each stage.  
> Additionally it can be used with Little's law in the future if we ever want 
> to try to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379090#comment-16379090
 ] 

ASF GitHub Bot commented on CASSANDRA-8341:
---

GitHub user clohfink opened a pull request:

https://github.com/apache/cassandra/pull/200

Add tpstats cpu and alloc rate tracking for CASSANDRA-8341



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clohfink/cassandra 8341

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #200


commit 1d47a63f5c334998cb8e948f5114c1e3cbc32103
Author: Chris Lohfink 
Date:   2018-02-27T18:04:39Z

Add tpstats cpu and alloc rate tracking for CASSANDRA-8341




> Expose time spent in each thread pool
> -
>
> Key: CASSANDRA-8341
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
>  Labels: metrics
> Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue.  This can provide 
> context on how much time is spent percentage-wise in each stage.  
> Additionally it can be used with Little's law in the future if we ever want 
> to try to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-8341) Expose time spent in each thread pool

2018-02-27 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink reassigned CASSANDRA-8341:


Assignee: Chris Lohfink

> Expose time spent in each thread pool
> -
>
> Key: CASSANDRA-8341
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
>  Labels: metrics
> Attachments: 8341.patch, 8341v2.txt
>
>
> Can increment a counter with time spent in each queue.  This can provide 
> context on how much time is spent percentage-wise in each stage.  
> Additionally it can be used with Little's law in the future if we ever want 
> to try to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity

2018-02-27 Thread Anuj Wadehra (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379076#comment-16379076
 ] 

Anuj Wadehra commented on CASSANDRA-12151:
--

Thanks for the review comments!

[~vinaykumarcse] I will go through your patch and share my comments.

[~jjordan] I had a look at CASSANDRA-13983. The use cases are quite different. 
Yes, we can have a chronicle-queue variant of the audit logger like 
[~vinaykumarcse] said, but I think we should start with a simple logger 
implementation unless we have really good reasons to go with chronicle-queue 
for audit logging.

> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Anuj Wadehra
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> We would like a way to enable Cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> It should also be able to log connection attempts and changes to the 
> users/roles.
> I was thinking of making a new keyspace and inserting an entry for every 
> activity that occurs.
> Then it would be possible to query for a specific activity or a query 
> targeting a specific keyspace and column family.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



cassandra-builds git commit: can't use --depth with apache git repo

2018-02-27 Thread marcuse
Repository: cassandra-builds
Updated Branches:
  refs/heads/master 2c1842cef -> f6079f9eb


can't use --depth with apache git repo


Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/f6079f9e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/f6079f9e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/f6079f9e

Branch: refs/heads/master
Commit: f6079f9eba752b76fc34371f344332afa46e6026
Parents: 2c1842c
Author: Marcus Eriksson 
Authored: Tue Feb 27 09:14:32 2018 -0800
Committer: Marcus Eriksson 
Committed: Tue Feb 27 09:14:32 2018 -0800

--
 docker/jenkins/jenkinscommand.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/f6079f9e/docker/jenkins/jenkinscommand.sh
--
diff --git a/docker/jenkins/jenkinscommand.sh b/docker/jenkins/jenkinscommand.sh
index 5e4741d..8a29dd7 100644
--- a/docker/jenkins/jenkinscommand.sh
+++ b/docker/jenkins/jenkinscommand.sh
@@ -8,8 +8,8 @@ BRANCH=$2
 DTEST_REPO=$3
 DTEST_BRANCH=$4
 EOF
-echo "jenkinscommand.sh: running: git clone --depth=1 --branch $BUILDSBRANCH 
$BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7"
-ID=$(docker run --env-file env.list -dt $DOCKER_IMAGE dumb-init bash -ilc "git 
clone --depth=1 --branch $BUILDSBRANCH $BUILDSREPO; sh 
./cassandra-builds/docker/jenkins/dtest.sh $7")
+echo "jenkinscommand.sh: running: git clone --branch $BUILDSBRANCH 
$BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7"
+ID=$(docker run --env-file env.list -dt $DOCKER_IMAGE dumb-init bash -ilc "git 
clone --branch $BUILDSBRANCH $BUILDSREPO; sh 
./cassandra-builds/docker/jenkins/dtest.sh $7")
 # use docker attach instead of docker wait to get output
 docker attach --no-stdin $ID
 echo "$ID done, copying files"


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14264) Quick Tour Document for dev's that want to get oriented on the code efficiently.

2018-02-27 Thread Constance Eustace (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378947#comment-16378947
 ] 

Constance Eustace commented on CASSANDRA-14264:
---

I am currently trying to understand the internals of SSTables. It would be 
really nice to have a walkthrough of viewing, filtering, parsing, merging, and 
manipulating small-scale sstables via Cassandra code. If I can figure it out 
myself, I'll try to provide a writeup.

A detailed explanation of what the system tables represent would be nice. 

As for the code, there are a couple of critical paths to Cassandra data:

1) incoming mutations going to commit log and memtable
2) flushing mutations going from memtable to sstable
3) sstables being compacted and organized
4) coordinator node actions on queries/writes and ensuring consistency levels 
are adhered to
5) queries processing against memtable and bloom filters/sstable lookup

We could write code path explanations for those; that might be very helpful. 

 

> Quick Tour Document for dev's that want to get oriented on the code 
> efficiently.
> 
>
> Key: CASSANDRA-14264
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14264
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Kenneth Brotman
>Priority: Major
>
> Create a Quick Tour Document for dev's that want to get oriented on the code 
> efficiently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14002) Don't use SHA256 when building merkle trees

2018-02-27 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378941#comment-16378941
 ] 

Michael Kjellman commented on CASSANDRA-14002:
--

+1 to rebase.

> Don't use SHA256 when building merkle trees
> ---
>
> Key: CASSANDRA-14002
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14002
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> We should avoid using SHA-2 when building merkle trees as we don't need a 
> cryptographic hash function for this.
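
For illustration, the change amounts to swapping a cryptographic digest for a 
fast non-cryptographic hash; a sketch using Guava's Murmur3 (which Cassandra 
already ships), not the committed implementation:

{code}
// Sketch only: merkle-tree validation needs a fast, well-distributed hash,
// not resistance against an adversary crafting collisions.
import com.google.common.hash.Hashing;

long leafHash = Hashing.murmur3_128().hashBytes(rowBytes).asLong();
// versus the much slower cryptographic path:
// MessageDigest.getInstance("SHA-256").digest(rowBytes);
{code}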



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing

2018-02-27 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378919#comment-16378919
 ] 

Tania S Engel commented on CASSANDRA-13480:
---

[~mbyrd]: I have reason to believe I just hit this in 3.11.1; at the very 
least, I ran into a repair which has never completed on an 11-node cluster. Is 
there a way to get this fix into 3.11?

> nodetool repair can hang forever if we lose the notification for the repair 
> completing/failing
> --
>
> Key: CASSANDRA-13480
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13480
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Minor
>  Labels: repair
> Fix For: 4.0
>
>
> When a JMX lost notification occurs, sometimes the lost notification in 
> question is the one which lets RepairRunner know that the repair is 
> finished (ProgressEventType.COMPLETE, or even ERROR for that matter).
> This results in the nodetool process running the repair hanging forever. 
> I have a test which reproduces the issue here:
> https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test
> To fix this, If on receiving a notification that notifications have been lost 
> (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via 
> Jmx to receive all the relevant notifications we're interested in, we can 
> replay those we missed and avoid this scenario.
> It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself 
> might be lost and so for good measure I have made RepairRunner poll 
> periodically to see if there were any notifications that had been sent but we 
> didn't receive (scoped just to the particular tag for the given repair).
> Users who don't use nodetool but go via jmx directly, can still use this new 
> endpoint and implement similar behaviour in their clients as desired.
> I'm also expiring the notifications which have been kept on the server side.
> Please let me know if you've any questions or can think of a different 
> approach, I also tried setting:
>  JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000"
> but this didn't fix the test. I suppose it might help under certain scenarios 
> but in this test we don't even send that many notifications so I'm not 
> surprised it doesn't fix it.
> It seems like getting lost notifications is always a potential problem with 
> jmx as far as I can tell.
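
For anyone implementing a JMX client along these lines, a hedged sketch of the 
recovery idea; {{getMissedNotifications(tag)}} and {{handleProgress(...)}} are 
hypothetical stand-ins for the replay endpoint described above:

{code}
// Sketch only: on NOTIFS_LOST, re-fetch the progress events we may have
// missed for this repair's tag instead of waiting forever.
connector.addConnectionNotificationListener((notification, handback) -> {
    if (JMXConnectionNotification.NOTIFS_LOST.equals(notification.getType()))
    {
        for (Object event : repairProxy.getMissedNotifications(tag))
            handleProgress(event);
    }
}, null, null);
{code}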



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12151) Audit logging for database activity

2018-02-27 Thread Vinay Chella (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377534#comment-16377534
 ] 

Vinay Chella edited comment on CASSANDRA-12151 at 2/27/18 4:48 PM:
---

Hi [~eanujwa]  [~jasobrown],

I’m excited to see the design document and it looks good to us!

Netflix had a similar requirement recently for our internal 2.1 clusters and we 
implemented a simple version (no query categories, etc…) for SOX auditing. As 
your design is very close to what we implemented, just a few differently named 
classes for the most part, can we work together on the trunk 
[patchset|https://github.com/vinaykumarchella/cassandra/pull/2] to add the 
missing components from your design? Alternatively, we could take an 
incremental approach, review what we have on the trunk branch of the simple 
version and get it committed and then add in some of the more advanced features 
next. I believe this patch follows the design goals that you put together.

Please review and let me know if you have any questions or concerns about the 
first iteration. If folks are interested in the 3.x/2.x branches I can put 
those up on my github as well.

[~jhb]
{quote}I just have one question, do you think enabling/updating/disabling audit 
require a node restart?
{quote}
The posted patch allows online auditlog enable/disable via JMX.

[~jjordan]
{quote}You should take a look at the infrastructure added in CASSANDRA-13983 
for query logging
{quote}
Yes, we looked and that certainly looks interesting, perhaps this design allows 
us to use it as another implementation of {{IAuditLogger}}?

Here is the patch location:

||[trunk|https://github.com/vinaykumarchella/cassandra]||
|[PR for Trunk|https://github.com/vinaykumarchella/cassandra/pull/2]|
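
To make the pluggability point above concrete, a hypothetical shape for such 
an interface (the real {{IAuditLogger}} in the patch may well differ):

{code}
// Hypothetical sketch inferred from the discussion; not the patch's actual API.
public interface IAuditLogger
{
    // entry is assumed to carry user, remote address, timestamp, action type,
    // keyspace, column family and the query string, per the ticket description
    void log(AuditLogEntry entry);

    boolean isEnabled();
}
{code}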







> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Anuj Wadehra
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> We would like a way to enable Cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> It should also be able to log connection attempts and changes to the 
> users/roles.
> I was thinking of making a new keyspace and inserting an entry for every 
> activity that occurs.
> Then it would be possible to query for a specific activity or a query 
> targeting a specific keyspace and column family.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13668) Database user auditing events

2018-02-27 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13668:
---
Status: Patch Available  (was: In Progress)

> Database user auditing events
> -
>
> Key: CASSANDRA-13668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13668
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 4.x
>
>
> With the availability of CASSANDRA-13459, any native transport enabled client 
> will be able to subscribe to internal Cassandra events. External tools can 
> take advantage by monitoring these events in various ways. Use-cases for this 
> can be e.g. auditing tools for compliance and security purposes.
> The scope of this ticket is to add diagnostic events that are raised around 
> authentication and CQL operations. These events can then be consumed and used 
> by external tools to implement a Cassandra user auditing solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Oleksandr Shulgin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378235#comment-16378235
 ] 

Oleksandr Shulgin edited comment on CASSANDRA-14210 at 2/27/18 8:35 AM:


We are observing a very similar problem with ordinary compaction.  Not sure if 
the proposed change could cover both (with the difference that in compaction 
you likely want to start with the smallest tables first, but this is up to the 
actual compaction strategy).

A node runs with {{concurrent_compactors=2}} and is doing a rather big 
compaction (> 200 GB) on a table.  At the same time, a lot of small files are 
streamed in by repair, for a different table.  Number of {{*-Data.db}} files 
for that other table grows as high as 5,500 and estimated number of pending 
compaction tasks for this node jumps to over 180.  But no compaction is started 
for the table with a lot of small data files, up until the only current 
compaction task finishes.  Why is that so?  I would expect that a free 
compaction slot is utilized immediately for new tasks.





> Optimize SSTables upgrade task scheduling
> -
>
> Key: CASSANDRA-14210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Oleksandr Shulgin
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed both for version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later when 
> some of the 4 tasks are finished, no new tasks are scheduled immediately.  It 
> takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB data 
> each, we've seen that the whole process was taking more than 7 days, instead 
> of estimated 1.5-2 days (provided there would be close to full N slots 
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Oleksandr Shulgin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378235#comment-16378235
 ] 

Oleksandr Shulgin commented on CASSANDRA-14210:
---

We are observing a very similar problem with ordinary compaction.  Not sure if 
the proposed change could cover both (with the difference that in compaction 
you likely want to start with the smallest tables first, but this is up to the 
actual compaction strategy).

A node runs with {{concurrent_compactors=2}} and is doing a rather big 
compaction (> 200 GB) on a table.  At the same time, a lot of small files are 
streamed in by repair, for a different table.  The number of {{*-Data.db}} files 
for that other table grows as high as 5,500 and the estimated number of pending compaction 
tasks for this node jumps to over 180.  But no compaction is started for the 
table with a lot of small data files, up until the only current compaction task 
finishes.  Why is that so?  I would expect that a free compaction slot is 
utilized immediately for new tasks.


> Optimize SSTables upgrade task scheduling
> -
>
> Key: CASSANDRA-14210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Oleksandr Shulgin
>Assignee: Kurt Greaves
>Priority: Major
> Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed both for version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later when 
> some of the 4 tasks are finished, no new tasks are scheduled immediately.  It 
> takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB data 
> each, we've seen that the whole process was taking more than 7 days, instead 
> of estimated 1.5-2 days (provided there would be close to full N slots 
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.
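
As an aside, the "schedule as soon as a slot frees up" behaviour is 
essentially what a bounded executor gives for free; a hedged sketch under 
assumed names ({{upgrade(...)}}, the {{candidates}} input and the use of 
{{SSTableReader}} sizes are illustrative):

{code}
// Illustrative sketch only: submitting every task up front to a fixed pool of
// N workers means a finished task immediately frees its slot, instead of
// waiting for a whole batch of N tasks to finish before scheduling more.
List<SSTableReader> toUpgrade = new ArrayList<>(candidates);
toUpgrade.sort(Comparator.comparingLong(SSTableReader::onDiskLength).reversed()); // biggest first

ExecutorService slots = Executors.newFixedThreadPool(jobs);  // nodetool --jobs N
List<Future<?>> futures = new ArrayList<>();
for (SSTableReader sstable : toUpgrade)
    futures.add(slots.submit(() -> upgrade(sstable)));
for (Future<?> f : futures)
    f.get();                                                 // slots stay busy until all done
slots.shutdown();
{code}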



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention

2018-02-27 Thread Oleksandr Shulgin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378207#comment-16378207
 ] 

Oleksandr Shulgin commented on CASSANDRA-5836:
--

[~jjirsa] thanks for reopening this.  Before suggesting a fix I'd like to have 
a better understanding of what the bootstrap process really is.
[~jbellis] could you please elaborate on the "special cases" you've mentioned?

In the literature I can find definitions akin to "Bootstrapping is the process 
of claiming token ranges and streaming in the data from other nodes".  This 
cannot be accurate, because the nodes which don't bootstrap (seeds, or the ones 
having {{auto_bootstrap}} explicitly set to {{false}}) also claim token 
ranges; they just don't stream the data in and are immediately responsible for 
handling read requests.

If I understand it correctly, the above definition is what really "joining the 
ring" is, i.e. "claiming token ranges and (optionally) streaming in the data".  
By this reasoning bootstrapping is only about "streaming in the data".  Is 
there anything else to the bootstrap process that I'm not aware of?  Please 
clarify.


> Seed nodes should be able to bootstrap without manual intervention
> --
>
> Key: CASSANDRA-5836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Bill Hathaway
>Priority: Minor
>
> The current logic doesn't allow a seed node to be bootstrapped.  If a user 
> wants to bootstrap a node configured as a seed (for example to replace a seed 
> node via replace_token), they first need to remove the node's own IP from the 
> seed list, and then start the bootstrap process.  This seems like an 
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a 
> seed node to bootstrap without manual intervention when there are other seed 
> nodes up in a ring.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2018-02-27 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378193#comment-16378193
 ] 

Corentin Chary commented on CASSANDRA-10765:


Note: after having troubles like this with SASI, we ended up moving to 
[https://github.com/Stratio/stratio-cassandra]. IMHO leveraging Lucene instead 
of building yet another index makes much more sense. It would be great to see 
SASI using Lucene internally (even if that's somewhat against the current 
design).

Before using Stratio we started experimenting with a SASI-like Lucene-enabled 
index, see https://github.com/criteo/biggraphite/tree/master/tools/graphiteindex

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Major
>  Labels: 2i, sasi
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. Query Plan should be able to choose how to do intersections, 
> unions based on the metadata provided by indexes (returned by RangeIterator) 
> and RangeIterator would became a base for cross index interactions and should 
> have information such as min/max token, estimate number of wrapped tokens etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org