[jira] [Commented] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379818#comment-16379818 ] Dinesh Joshi commented on CASSANDRA-14280: -- I still see ViewBuildTaskTest, TombstoneTest, and ViewTest failing. Is this expected? > Fix timeout test - org.apache.cassandra.cql3.ViewTest > - > > Key: CASSANDRA-14280 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14280 > Project: Cassandra > Issue Type: Bug > Components: Testing > Reporter: Dikang Gu > Assignee: Dikang Gu > Priority: Major > Fix For: 4.0 > > > The test times out very often; it seems too big. Try to split it into multiple tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f7d140e2 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f7d140e2 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f7d140e2 Branch: refs/heads/trunk Commit: f7d140e2a934e343dedc7d4057784551d4adac48 Parents: b86801e c494696 Author: Jay Zhuang Authored: Tue Feb 27 21:47:24 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 21:47:24 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f7d140e2/build.xml --
[2/3] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c4946960 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c4946960 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c4946960 Branch: refs/heads/trunk Commit: c4946960a20e12f0f574b5608c886467466ee3b9 Parents: abd9be1 79cead0 Author: Jay Zhuang Authored: Tue Feb 27 21:37:59 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 21:40:25 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c4946960/build.xml --
[1/3] cassandra git commit: Add new developer to build.xml file
Repository: cassandra Updated Branches: refs/heads/trunk b86801e95 -> f7d140e2a Add new developer to build.xml file Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09 Branch: refs/heads/trunk Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2 Parents: d73f45b Author: Jay Zhuang Authored: Tue Feb 27 18:07:14 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 19:12:02 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml -- diff --git a/build.xml b/build.xml index 6f98242..7bab97c 100644 --- a/build.xml +++ b/build.xml @@ -443,6 +443,7 @@ +
[3/3] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c4946960 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c4946960 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c4946960 Branch: refs/heads/cassandra-3.11 Commit: c4946960a20e12f0f574b5608c886467466ee3b9 Parents: abd9be1 79cead0 Author: Jay Zhuang Authored: Tue Feb 27 21:37:59 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 21:40:25 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c4946960/build.xml --
[2/3] cassandra git commit: Add new developer to build.xml file
Add new developer to build.xml file Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09 Branch: refs/heads/cassandra-3.11 Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2 Parents: d73f45b Author: Jay Zhuang Authored: Tue Feb 27 18:07:14 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 19:12:02 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml -- diff --git a/build.xml b/build.xml index 6f98242..7bab97c 100644 --- a/build.xml +++ b/build.xml @@ -443,6 +443,7 @@ +
[1/3] cassandra git commit: Add new developer to build.xml file
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 d73f45bad -> 79cead093 refs/heads/cassandra-3.11 abd9be1e4 -> c4946960a Add new developer to build.xml file Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79cead09 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79cead09 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79cead09 Branch: refs/heads/cassandra-3.0 Commit: 79cead093e9a2fe8273f9c2ea85e7d8d9f8fabf2 Parents: d73f45b Author: Jay Zhuang Authored: Tue Feb 27 18:07:14 2018 -0800 Committer: Jay Zhuang Committed: Tue Feb 27 19:12:02 2018 -0800 -- build.xml | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/79cead09/build.xml -- diff --git a/build.xml b/build.xml index 6f98242..7bab97c 100644 --- a/build.xml +++ b/build.xml @@ -443,6 +443,7 @@ +
[jira] [Updated] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous updated CASSANDRA-14280: -- Status: Ready to Commit (was: Patch Available)
[jira] [Updated] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest
[ https://issues.apache.org/jira/browse/CASSANDRA-14280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-14280: -- Status: Patch Available (was: Open) Fix the timeout of ViewTest by changing updateView("TRUNCATE %s") to execute("TRUNCATE %s"). Also split ViewTest into smaller unit tests. |[trunk| https://github.com/DikangGu/cassandra/commit/ae1b9695de7d3f20e52e93a6cdae4a25cdc2f19b]|[unit test | https://circleci.com/gh/DikangGu/cassandra/22] |
[jira] [Commented] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379739#comment-16379739 ] Kurt Greaves commented on CASSANDRA-14210: -- [~krummas] I've set as RTC, but if you want to get another reviewer feel free to. [~oshulgin] that would be unrelated to this patch. This will only affect tools where you can specify # of jobs (cleanup, upgradesstables, scrub). That sounds like a bug though, and if you can get more info it might be worth another JIRA. > Optimize SSTables upgrade task scheduling > - > > Key: CASSANDRA-14210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14210 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Reporter: Oleksandr Shulgin > Assignee: Kurt Greaves > Priority: Major > Fix For: 4.x > > > When starting the SSTable-rewrite process by running {{nodetool > upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are > used. > For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}. > What we observed, both for version 2.2 and 3.0, is that initially all 4 > provided slots are used for "Upgrade sstables" compactions, but later when > some of the 4 tasks are finished, no new tasks are scheduled immediately. It > takes the last of the 4 tasks to finish before 4 new tasks are > scheduled. This happens on every node we've observed. > This doesn't utilize available resources to the full extent allowed by the > --jobs N parameter. In the field, on a cluster of 12 nodes with 4-5 TiB data > each, we've seen that the whole process was taking more than 7 days, instead > of an estimated 1.5-2 days (provided there would be close to full N-slot > utilization). > Instead, new tasks should be scheduled as soon as there is a free compaction > slot. > Additionally, starting from the biggest SSTables could further reduce the > total time required for the whole process to finish on any given node.
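The scheduling gap described in the ticket above can be illustrated with a toy sketch (Python here purely for illustration; Cassandra's actual compaction scheduling is Java, and the function names below are hypothetical). It contrasts the observed batch behaviour (submit N upgrade tasks, wait for all N to finish, then submit the next N) with slot-based scheduling, where a new task starts as soon as any of the N worker slots frees up:

```python
# Toy model of the two scheduling strategies, using sleep() as a stand-in
# for an sstable upgrade task. Not Cassandra code.
from concurrent.futures import ThreadPoolExecutor, wait
import time

def run_batched(durations, jobs):
    """Schedule tasks N at a time; the whole batch must finish before the next starts."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for i in range(0, len(durations), jobs):
            batch = [pool.submit(time.sleep, d) for d in durations[i:i + jobs]]
            wait(batch)  # idle slots wait for the slowest task in the batch
    return time.monotonic() - start

def run_slot_based(durations, jobs):
    """Submit everything up front; the pool refills each slot as soon as it frees."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(time.sleep, durations))
    return time.monotonic() - start

if __name__ == "__main__":
    # One long task per batch stalls the other three slots in the batched scheme.
    durations = [0.4, 0.1, 0.1, 0.1] * 2
    t_batch = run_batched(durations, jobs=4)
    t_slots = run_slot_based(durations, jobs=4)
    print(f"batched: {t_batch:.2f}s, slot-based: {t_slots:.2f}s")
```

With one slow task per batch, the batched scheme takes roughly the sum of the per-batch maxima, while the slot-based scheme keeps all four slots busy, which is the behaviour the ticket asks for.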
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Greaves updated CASSANDRA-14210: - Status: Ready to Commit (was: Patch Available)
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Greaves updated CASSANDRA-14210: - Reproduced In: 3.0.15, 2.2.11 (was: 2.2.11, 3.0.15) Status: Patch Available (was: Awaiting Feedback)
[jira] [Created] (CASSANDRA-14280) Fix timeout test - org.apache.cassandra.cql3.ViewTest
Dikang Gu created CASSANDRA-14280: - Summary: Fix timeout test - org.apache.cassandra.cql3.ViewTest Key: CASSANDRA-14280 URL: https://issues.apache.org/jira/browse/CASSANDRA-14280 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Dikang Gu Assignee: Dikang Gu Fix For: 4.0 The test times out very often; it seems too big. Try to split it into multiple tests.
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379662#comment-16379662 ] mck commented on CASSANDRA-14247: - [~mkjellman], agree with all your points. I did do one profile run against an infinite loop on the cities test csv file which is 1.3M, so it'll be easy to do it again to validate your improvement in (2). > SASI tokenizer for simple delimiter based entries > - > > Key: CASSANDRA-14247 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14247 > Project: Cassandra > Issue Type: Improvement > Components: sasi > Reporter: mck > Assignee: mck > Priority: Major > Fix For: 4.0, 3.11.x > > > Currently SASI offers only two tokenizer options: > - NonTokenizerAnalyser > - StandardAnalyzer > The latter is built upon Snowball, powerful for human languages but overkill > for simple tokenization. > A simple tokenizer is proposed here. The need for this arose as a workaround > of CASSANDRA-11182, and to avoid the disk usage explosion when having to > resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861 > Example use of this would be: > {code} > CREATE CUSTOM INDEX span_annotation_query_idx > ON zipkin2.span (annotation_query) USING > 'org.apache.cassandra.index.sasi.SASIIndex' > WITH OPTIONS = { > 'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', > 'delimiter': '░', > 'case_sensitive': 'true', > 'mode': 'prefix', > 'analyzed': 'true'}; > {code} > Original credit for this work goes to https://github.com/zuochangan
[jira] [Updated] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-14247: Status: In Progress (was: Patch Available)
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379642#comment-16379642 ] Michael Kjellman commented on CASSANDRA-14247: -- 1) I think it would be better if we used a "," or " " for the default delimiter. 2) I think it would be better if we do the work inside the iterator itself vs. using the split() function on the entire contents of the string in reset(). If we can do it iteratively, we can then potentially reuse buffers and just go character by character until we hit the delimiter vs. needing to process the whole thing, no? Or did you benchmark this and find that even with potentially large strings there wasn't a win? 3) When you hit a MarshalException you're logging the whole thing; if the value is a 30MB text blob, the logger would get slammed, so I'm not sure logging the entire thing by default is ideal. Thoughts?
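The iterator-vs-split() point in (2) above can be sketched as follows (a Python sketch for illustration only; the real DelimiterAnalyzer is Java inside Cassandra's SASI code, and `delimiter_tokens` is a hypothetical name). The idea is to walk the input character by character and yield each token lazily, instead of materialising every token up front with split():

```python
# Lazy delimiter tokenizer: no intermediate list of all tokens is built,
# and a consumer that stops early never pays for the rest of the string.
def delimiter_tokens(text, delimiter=','):
    """Lazily yield the non-empty tokens between delimiter characters."""
    start = 0
    for i, ch in enumerate(text):
        if ch == delimiter:
            if i > start:           # skip empty tokens between adjacent delimiters
                yield text[start:i]
            start = i + 1
    if start < len(text):           # trailing token after the last delimiter
        yield text[start:]

if __name__ == "__main__":
    it = delimiter_tokens("error,warn,,info")
    print(next(it))   # first token is produced without scanning the rest
    print(list(it))   # remaining tokens on demand
```

In Java the same shape would be an `Iterator<ByteBuffer>` that advances through the underlying buffer, which is what makes buffer reuse possible.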
[jira] [Updated] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-14247: Status: Patch Available (was: In Progress)
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379509#comment-16379509 ] mck commented on CASSANDRA-14247: - {quote}what's the reasoning for using "░" as the delimiter?{quote} [~mkjellman]. Nothing. Just the use-case that came over from zipkin (we used a character there that was really unlikely to be used otherwise). It could well make more sense to use the comma character?
[jira] [Updated] (CASSANDRA-14275) Cassandra Driver should send identification information to Server
[ https://issues.apache.org/jira/browse/CASSANDRA-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14275: - Fix Version/s: (was: 3.11.x) (was: 3.0.x) (was: 2.2.x) (was: 2.1.x) > Cassandra Driver should send identification information to Server > - > > Key: CASSANDRA-14275 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14275 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Dinesh Joshi > Assignee: Dinesh Joshi > Priority: Major > Fix For: 4.x > > > Currently there doesn't seem to be any way to readily identify the driver > that clients are using to connect to Cassandra. Add the capability of > identifying the driver through metadata information much like how HTTP > Clients identify themselves through User-Agent HTTP header. This is useful > for debugging in large deployments where clients tend to use different > drivers, wrappers and language bindings to connect to Cassandra. This can > help surface issues as well as help detect clients which are using older or > unsupported drivers. > The identification information should be a string that unambiguously > identifies the driver. It should include information such as the name of the > driver, it's version, CQL version, Platform (Linux, macOS, Windows, etc.) and > architecture (x86, x86_64). > We should surface this information in `nodetool clientstats` command.
[jira] [Commented] (CASSANDRA-14275) Cassandra Driver should send identification information to Server
[ https://issues.apache.org/jira/browse/CASSANDRA-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379474#comment-16379474 ] Dinesh Joshi commented on CASSANDRA-14275: -- Here's a PoC - # https://github.com/apache/cassandra/compare/trunk...dineshjoshi:add-client-string # https://github.com/datastax/java-driver/compare/3.x...dineshjoshi:enhance-java-driver-to-send-metadata This allows me to get stats like this - {noformat} bin/nodetool clientstats --all Address SSL Version User Keyspace Requests Driver /127.0.0.1:59305 false 4 anonymous 12 datastaxjavadriver-cql3.0.0-v3.0 /127.0.0.1:59306 false 4 anonymous 2 datastaxjavadriver-cql3.0.0-v3.0 Total connected clients: 2 User Connections anonymous 2 {noformat}
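The identification string the ticket proposes (driver name, version, CQL version, platform, architecture) could be assembled roughly as follows. This is a hedged sketch: the function name and the field format are illustrative only and do not reflect what any real driver or the linked PoC actually sends:

```python
# Illustrative only: builds a User-Agent-style driver identification string
# of the kind CASSANDRA-14275 describes. Format is hypothetical.
import platform

def driver_identification(name, version, cql_version):
    parts = [
        f"{name}/{version}",        # driver name and version
        f"cql{cql_version}",        # supported CQL version
        platform.system(),          # e.g. Linux, Darwin, Windows
        platform.machine(),         # e.g. x86_64, arm64
    ]
    return " ".join(parts)

if __name__ == "__main__":
    print(driver_identification("datastax-java-driver", "3.4.0", "3.0.0"))
```

A server could then surface this string per connection, much as the `Driver` column in the `nodetool clientstats` output above does.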
[jira] [Commented] (CASSANDRA-12848) Nodetool proxyhistograms/cfhistograms still report latency as flat result
[ https://issues.apache.org/jira/browse/CASSANDRA-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379363#comment-16379363 ] Chris Lohfink commented on CASSANDRA-12848: --- I think this can be reported as not a problem, just an artifact of using histograms to store metrics vs a sampling reservoir. > Nodetool proxyhistograms/cfhistograms still report latency as flat result > - > > Key: CASSANDRA-12848 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12848 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Nutchanon Leelapornudom > Priority: Major > Labels: metrics > Attachments: clientrequest-latency.png, image001.png > > > Even patched in CASSANDRA-11752, nodetool > proxyhistograms/cfhistograms(2.2)/tablehistograms(3.0,3.x) still report > read/write latency as a flat result. That causes the Cassandra metric > org.apache.cassandra.metrics.ClientRequest.Read/Write.Latency.xxpercentile > to report an incorrect pattern. > I have attached the result, which I tested on Cassandra 3.0.9. It indicates > read latency as a flat line whereas read count moves normally.
[jira] [Commented] (CASSANDRA-12848) Nodetool proxyhistograms/cfhistograms still report latency as flat result
[ https://issues.apache.org/jira/browse/CASSANDRA-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379360#comment-16379360 ] Chris Lohfink commented on CASSANDRA-12848: --- The buckets for the EH go in 20% jumps, so everything in a bucket from 700-840, for example, will be reported as "840". This means that it will round up to the nearest 20%, and all the variance within the 20% is lost. 2.1 was lossy (randomly threw away latency recordings regardless of how important they were), but 2.2+ has a 20% error threshold (will in the worst case report as 20% worse, not better).
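The bucket rounding described in the comment above can be sketched like this (a simplified Python model, not Cassandra's actual EstimatedHistogram Java code, whose exact boundary sequence may differ): boundaries grow by roughly 20% per step, every recorded value is reported as the upper boundary of its bucket, so latencies can be overstated by up to ~20% but never understated:

```python
# Simplified model of exponential histogram buckets with ~1.2x growth.
def bucket_boundaries(limit, growth=1.2):
    """Boundaries 1, 2, 3, ... growing by ~20% per step up to `limit`."""
    bounds, b = [], 1
    while b < limit:
        bounds.append(b)
        b = max(b + 1, round(b * growth))  # grow, but always by at least 1
    bounds.append(b)
    return bounds

def reported_value(value, bounds):
    """Return the upper boundary of the bucket containing `value`."""
    for b in bounds:
        if value <= b:
            return b
    raise ValueError("value beyond histogram range")

if __name__ == "__main__":
    bounds = bucket_boundaries(10_000)
    # Two distinct latencies in the same ~20%-wide bucket collapse to the
    # same reported number, which is why percentiles can look flat.
    print(reported_value(650, bounds), reported_value(770, bounds))
```

This also shows why the flat-line percentiles in the ticket are an artifact rather than a bug: once traffic settles into one bucket, every percentile inside it reports the identical boundary value.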
[jira] [Commented] (CASSANDRA-13762) Ensure views created during (or just before) range movements are properly built
[ https://issues.apache.org/jira/browse/CASSANDRA-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379277#comment-16379277 ] Duarte Nunes commented on CASSANDRA-13762: -- I think this patch assumes the base node receiving the streamed data will be paired with the same view replica as the source base node. Is this true? If so, how is that guaranteed? If not, then we might still want to send out view updates. We would need to check whether all replicas for the range in question have finished building their views, not just the source node. > Ensure views created during (or just before) range movements are properly > built > --- > > Key: CASSANDRA-13762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13762 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Minor > Labels: materializedviews > Attachments: trunk-13762-dtest.png, trunk-13762-testall.png > > > CASSANDRA-13065 assumes the source node has its views built to skip running > base mutations through the write path during range movements. > It is possible that the source node has not finished building the view, or > that a new view is created during a range movement, in which case the view > may be wrongly marked as built on the destination node. > The former problem was introduced by #13065, but even before that a view > created during a range movement may not be correctly built on the destination > node because the view builder will be triggered before it has finished > streaming the source data, wrongly marking the view as built on that node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path
[ https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Constance Eustace updated CASSANDRA-14279: -- Description: In my experience if data is not well organized into time windowed sstables, cassandra has enormous difficulty in actually deleting data if the data has a "medium term" lifetime and is commingled with data that isn't marked for death, as would happen with compactions or intermingled write patterns. Or for example, you might have an active working set and be archiving "unused" data to other tables or clusters. Or you may be purging data. Or you may be migrating/sharding/restructuring data. Whatever the case, you want that disk space back, and you might not be able to truncate. In STCS and LCS, row tombstones are intermingled with column data and column tombstones. But a row tombstone represents a significant event in data lifecycle: large amounts of "droppable" data during compaction and a shortcut from reading data from other sstables. It could also enable writes to be discarded in rare data patterns if the row tombstone is ahead in time. I am wondering that if row tombstones were isolated in their own sstables, separately compacted and merged, that it might enable compaction to work more efficiently: reads can prioritize bloom filter lookups that indicate a row tombstone, getting the timestamp of the deletion first, then can use that in the data sstables to filter data or shortcircuit the data if the row data had an overall "most recent data timestamp". compaction could be forced to reference all the row tombstone sstables, such that every time two or more "data" sstables are compacted, they must reference the row tombstones to purge data. In LCS, this would be particularly useful in getting data out of the upper levels without having to wait for data to trickle up the tree. 
The row tombstones, being read-only inputs into the data sstable compactions, can be referenced in each of the LCS levels' parallel compactors. Based on discussions in the dev list, this would appear to require some sort of customization to the memtable->sstable flushing process, and perhaps a different set of bloom filters. Since the row tombstone sstables are all ,, they should be comparitively smaller and take less time to compact. They could be aggressively compacted on a different schedule than "data" sstables. In addition, it may be easier to repair/synchronize row tombstones across the cluster if they have already been separated into their own sstables. Column/range tombstones may also benefit from a similar separation, but my guess is those are much more numerous and large and fine-grained that they might as well coexist with the data. was: In my experience if data is not well organized into time windowed sstables, cassandra has enormous difficulty in actually deleting data if the data has a "medium term" lifetime and is commingled with data that isn't marked for death, as would happen with compactions or intermingled write patterns. Or for example, you might have an active working set and be archiving "unused" data to other tables or clusters. Or you may be purging data. Or you may be migrating/sharding/restructuring data. Whatever the case, you want that disk space back, and you might not be able to truncate. In STCS and LCS, row tombstones are intermingled with column data and column tombstones. But a row tombstone represents a big event: large amounts of "droppable" data from an sstable, or even a shortcut from reading data from other sstables. 
I am wondering, if row tombstones were isolated in their own sstables, separately compacted and merged, whether that might enable compaction to work more efficiently: reads can prioritize bloom filter lookups that indicate a row tombstone, obtaining the timestamp of the deletion first, then using it against the data sstables to filter data, or to short-circuit the read entirely if the row data carries an overall "most recent data timestamp". Compaction could be forced to reference all the row tombstone sstables, such that every time two or more "data" sstables are compacted, they must consult the row tombstones to purge data. In LCS, this would be particularly useful in getting data out of the upper levels without having to wait for data to trickle up the tree. The row tombstones, being read-only inputs into the data sstable compactions, can be referenced in each of the LCS levels' parallel compactors. Based on discussions on the dev list, this would appear to require some sort of customization to the memtable->sstable flushing process, and perhaps a different set of bloom filters. Since the row tombstone sstables are all ,, they should be comparatively smaller and take less time to compact. They could be aggressively compacted on a different schedule than "data" sstables. In addition, it may be easier to repair/synchronize row tombstones across the cluster if they have already been separated into their own sstables. Column/range tombstones may also benefit from a similar separation, but my guess is those are so much more numerous and fine-grained that they might as well coexist with the data.
[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path
[ https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Constance Eustace updated CASSANDRA-14279: -- Description: In my experience if data is not well organized into time windowed sstables, cassandra has enormous difficulty in actually deleting data if the data has a "medium term" lifetime and is commingled with data that isn't marked for death, as would happen with compactions or intermingled write patterns. Or for example, you might have an active working set and be archiving "unused" data to other tables or clusters. Or you may be purging data. Or you may be migrating/sharding/restructuring data. Whatever the case, you want that disk space back, and you might not be able to truncate. In STCS and LCS, row tombstones are intermingled with column data and column tombstones. But a row tombstone represents a big event: large amounts of "droppable" data from an sstable, or even a shortcut from reading data from other sstables. I am wondering that if row tombstones were isolated in their own sstables, separately compacted and merged, that it might enable compaction to work more efficiently: reads can prioritize bloom filter lookups that indicate a row tombstone, getting the timestamp of the deletion first, then can use that in the data sstables to filter data or shortcircuit the data if the row data had an overall "most recent data timestamp". compaction could be forced to reference all the row tombstone sstables, such that every time two or more "data" sstables are compacted, they must reference the row tombstones to purge data. In LCS, this would be particularly useful in getting data out of the upper levels without having to wait for data to trickle up the tree. The row tombstones, being read-only inputs into the data sstable compactions, can be referenced in each of the LCS levels' parallel compactors. 
Based on discussions in the dev list, this would appear to require some sort of customization to the memtable->sstable flushing process, and perhaps a different set of bloom filters. Since the row tombstone sstables are all ,, they should be comparitively smaller and take less time to compact. They could be aggressively compacted on a different schedule than "data" sstables. In addition, it may be easier to repair/synchronize row tombstones across the cluster if they have already been separated into their own sstables. Column/range tombstones may also benefit from a similar separation, but my guess is those are much more numerous and large and fine-grained that they might as well coexist with the data. was: In my experience if data is not well organized into time windowed sstables, cassandra has enormous difficulty in actually deleting data if the data has a "medium term" lifetime. Or for example, you might have an active working set and be archiving "unused" data to other tables or clusters. Or you may be purging data. Or you may be migrating/sharding data. Whatever the case, you want that disk space back. In STCS and LCS, row tombstones are intermingled with column data and column tombstones. But a row tombstone represents a big event: large amounts of "droppable" data from an sstable, or even a shortcut from reading data from other sstables. I am wondering that if row tombstones were isolated in their own sstables, separately compacted and merged, that it might enable compaction to work more efficiently: reads can prioritize bloom filter lookups that indicate a row tombstone, getting the timestamp of the deletion first, then can use that in the data sstables to filter data or shortcircuit the data if the row data had an overall "most recent data timestamp". compaction could be forced to reference all the row tombstone sstables, such that every time two or more "data" sstables are compacted, they must reference the row tombstones to purge data. 
In LCS, this would be particularly useful in getting data out of the upper levels without having to wait for data to trickle up the tree. The row tombstones, being read-only inputs into the data sstable compactions, can be referenced in each of the LCS levels' parallel compactors. Based on discussions on the dev list, this would appear to require some sort of customization to the memtable->sstable flushing process, and perhaps a different set of bloom filters. Since the row tombstone sstables are all ,, they should be comparatively smaller and take less time to compact. They could be aggressively compacted on a different schedule than "data" sstables. In addition, it may be easier to repair/synchronize row tombstones across the cluster if they have already been separated into their own sstables. Column/range tombstones may also benefit from a similar separation, but my guess is those are so much more numerous and fine-grained that they might as well coexist with the data.
[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path
[ https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Constance Eustace updated CASSANDRA-14279: -- Component/s: (was: Lifecycle) Repair > Row Tombstones in separate sstables / separate compaction path > -- > > Key: CASSANDRA-14279 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14279 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Local Write-Read Paths, Repair >Reporter: Constance Eustace >Priority: Major > > In my experience if data is not well organized into time windowed sstables, > cassandra has enormous difficulty in actually deleting data if the data has a > "medium term" lifetime. Or for example, you might have an active working set > and be archiving "unused" data to other tables or clusters. Or you may be > purging data. Or you may be migrating/sharding data. Whatever the case, you > want that disk space back. > In STCS and LCS, row tombstones are intermingled with column data and column > tombstones. But a row tombstone represents a big event: large amounts of > "droppable" data from an sstable, or even a shortcut from reading data from > other sstables. > I am wondering that if row tombstones were isolated in their own sstables, > separately compacted and merged, that it might enable compaction to work more > efficiently: > reads can prioritize bloom filter lookups that indicate a row tombstone, > getting the timestamp of the deletion first, then can use that in the data > sstables to filter data or shortcircuit the data if the row data had an > overall "most recent data timestamp". > compaction could be forced to reference all the row tombstone sstables, such > that every time two or more "data" sstables are compacted, they must > reference the row tombstones to purge data. > In LCS, this would be particularly useful in getting data out of the upper > levels without having to wait for data to trickle up the tree. 
The row > tombstones, being read-only inputs into the data sstable compactions, can be > referenced in each of the LCS levels' parallel compactors. > Based on discussions in the dev list, this would appear to require some sort > of customization to the memtable->sstable flushing process, and perhaps a > different set of bloom filters. > Since the row tombstone sstables are all ,, they > should be comparitively smaller and take less time to compact. They could be > aggressively compacted on a different schedule than "data" sstables. > In addition, it may be easier to repair/synchronize row tombstones across the > cluster if they have already been separated into their own sstables. > Column/range tombstones may also benefit from a similar separation, but my > guess is those are much more numerous and large and fine-grained that they > might as well coexist with the data. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path
[ https://issues.apache.org/jira/browse/CASSANDRA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Constance Eustace updated CASSANDRA-14279: -- Component/s: Local Write-Read Paths Lifecycle Compaction > Row Tombstones in separate sstables / separate compaction path > -- > > Key: CASSANDRA-14279 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14279 > Project: Cassandra > Issue Type: Improvement > Components: Compaction, Lifecycle, Local Write-Read Paths >Reporter: Constance Eustace >Priority: Major > > In my experience if data is not well organized into time windowed sstables, > cassandra has enormous difficulty in actually deleting data if the data has a > "medium term" lifetime. Or for example, you might have an active working set > and be archiving "unused" data to other tables or clusters. Or you may be > purging data. Or you may be migrating/sharding data. Whatever the case, you > want that disk space back. > In STCS and LCS, row tombstones are intermingled with column data and column > tombstones. But a row tombstone represents a big event: large amounts of > "droppable" data from an sstable, or even a shortcut from reading data from > other sstables. > I am wondering that if row tombstones were isolated in their own sstables, > separately compacted and merged, that it might enable compaction to work more > efficiently: > reads can prioritize bloom filter lookups that indicate a row tombstone, > getting the timestamp of the deletion first, then can use that in the data > sstables to filter data or shortcircuit the data if the row data had an > overall "most recent data timestamp". > compaction could be forced to reference all the row tombstone sstables, such > that every time two or more "data" sstables are compacted, they must > reference the row tombstones to purge data. > In LCS, this would be particularly useful in getting data out of the upper > levels without having to wait for data to trickle up the tree. 
The row > tombstones, being read-only inputs into the data sstable compactions, can be > referenced in each of the LCS levels' parallel compactors. > Based on discussions in the dev list, this would appear to require some sort > of customization to the memtable->sstable flushing process, and perhaps a > different set of bloom filters. > Since the row tombstone sstables are all ,, they > should be comparitively smaller and take less time to compact. They could be > aggressively compacted on a different schedule than "data" sstables. > In addition, it may be easier to repair/synchronize row tombstones across the > cluster if they have already been separated into their own sstables. > Column/range tombstones may also benefit from a similar separation, but my > guess is those are much more numerous and large and fine-grained that they > might as well coexist with the data. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14279) Row Tombstones in separate sstables / separate compaction path
Constance Eustace created CASSANDRA-14279: - Summary: Row Tombstones in separate sstables / separate compaction path Key: CASSANDRA-14279 URL: https://issues.apache.org/jira/browse/CASSANDRA-14279 Project: Cassandra Issue Type: Improvement Reporter: Constance Eustace In my experience if data is not well organized into time windowed sstables, cassandra has enormous difficulty in actually deleting data if the data has a "medium term" lifetime. Or for example, you might have an active working set and be archiving "unused" data to other tables or clusters. Or you may be purging data. Or you may be migrating/sharding data. Whatever the case, you want that disk space back. In STCS and LCS, row tombstones are intermingled with column data and column tombstones. But a row tombstone represents a big event: large amounts of "droppable" data from an sstable, or even a shortcut from reading data from other sstables. I am wondering that if row tombstones were isolated in their own sstables, separately compacted and merged, that it might enable compaction to work more efficiently: reads can prioritize bloom filter lookups that indicate a row tombstone, getting the timestamp of the deletion first, then can use that in the data sstables to filter data or shortcircuit the data if the row data had an overall "most recent data timestamp". compaction could be forced to reference all the row tombstone sstables, such that every time two or more "data" sstables are compacted, they must reference the row tombstones to purge data. In LCS, this would be particularly useful in getting data out of the upper levels without having to wait for data to trickle up the tree. The row tombstones, being read-only inputs into the data sstable compactions, can be referenced in each of the LCS levels' parallel compactors. Based on discussions in the dev list, this would appear to require some sort of customization to the memtable->sstable flushing process, and perhaps a different set of bloom filters. 
Since the row tombstone sstables are all ,, they should be comparatively smaller and take less time to compact. They could be aggressively compacted on a different schedule than "data" sstables. In addition, it may be easier to repair/synchronize row tombstones across the cluster if they have already been separated into their own sstables. Column/range tombstones may also benefit from a similar separation, but my guess is those are so much more numerous and fine-grained that they might as well coexist with the data.
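The core of the proposal above can be sketched as follows. This is purely illustrative: TombstoneAwareCompaction, RowTombstoneIndex, and DataRow are hypothetical names invented for the sketch, not Cassandra classes, and real compaction operates on sstable iterators rather than in-memory lists. It only shows the shape of the idea: the merged row-tombstone store is a read-only side input to every data compaction, so shadowed rows become droppable without waiting for tombstone and data to meet in the same compaction bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TombstoneAwareCompaction {
    // A data row (partition) with the newest write timestamp it contains.
    record DataRow(String partitionKey, long maxTimestamp, String payload) {}

    // Stand-in for the separated, merged row-tombstone sstables:
    // partition key -> newest row-deletion timestamp.
    static class RowTombstoneIndex {
        private final Map<String, Long> deletions = new HashMap<>();
        void recordDeletion(String key, long ts) {
            deletions.merge(key, ts, Math::max); // newest deletion wins
        }
        long deletionTime(String key) {
            return deletions.getOrDefault(key, Long.MIN_VALUE);
        }
    }

    // Compacting "data" sstables with the tombstone store as a read-only input:
    // any row entirely older than its row tombstone is purged here and now.
    static List<DataRow> compact(List<DataRow> input, RowTombstoneIndex tombstones) {
        List<DataRow> survivors = new ArrayList<>();
        for (DataRow row : input)
            if (row.maxTimestamp() > tombstones.deletionTime(row.partitionKey()))
                survivors.add(row);
        return survivors;
    }

    public static void main(String[] args) {
        RowTombstoneIndex idx = new RowTombstoneIndex();
        idx.recordDeletion("user:42", 2000L);
        List<DataRow> out = compact(List.of(
                new DataRow("user:42", 1500L, "old data"),  // shadowed -> dropped
                new DataRow("user:7", 1000L, "live data")), // no tombstone -> kept
                idx);
        System.out.println(out.size()); // 1
    }
}
```

The same lookup, done at read time, is what would let a query short-circuit data sstables once the bloom filter says a row tombstone exists for the key.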
[jira] [Created] (CASSANDRA-14278) Testing
Sumant Sahney created CASSANDRA-14278: - Summary: Testing Key: CASSANDRA-14278 URL: https://issues.apache.org/jira/browse/CASSANDRA-14278 Project: Cassandra Issue Type: Sub-task Reporter: Sumant Sahney Test to see if all the logs are written correctly.
[jira] [Created] (CASSANDRA-14277) Update Log Files in Patches / Modularly
Sumant Sahney created CASSANDRA-14277: - Summary: Update Log Files in Patches / Modularly Key: CASSANDRA-14277 URL: https://issues.apache.org/jira/browse/CASSANDRA-14277 Project: Cassandra Issue Type: Sub-task Reporter: Sumant Sahney Make changes in the Logs.
[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries
[ https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379161#comment-16379161 ] Michael Kjellman commented on CASSANDRA-14247: -- [~michaelsembwever]: what's the reasoning for using "░" as the delimiter? > SASI tokenizer for simple delimiter based entries > - > > Key: CASSANDRA-14247 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14247 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: mck >Assignee: mck >Priority: Major > Fix For: 4.0, 3.11.x > > > Currently SASI offers only two tokenizer options: > - NonTokenizerAnalyser > - StandardAnalyzer > The latter is built upon Snowball, powerful for human languages but overkill > for simple tokenization. > A simple tokenizer is proposed here. The need for this arose as a workaround > of CASSANDRA-11182, and to avoid the disk usage explosion when having to > resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861 > Example use of this would be: > {code} > CREATE CUSTOM INDEX span_annotation_query_idx > ON zipkin2.span (annotation_query) USING > 'org.apache.cassandra.index.sasi.SASIIndex' > WITH OPTIONS = { > 'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', > 'delimiter': '░', > 'case_sensitive': 'true', > 'mode': 'prefix', > 'analyzed': 'true'}; > {code} > Original credit for this work goes to https://github.com/zuochangan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
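For readers unfamiliar with what a delimiter-based analyzer buys over the Snowball-backed StandardAnalyzer: the tokenization step amounts to a plain split on one configured character, with each piece becoming an index term. A minimal sketch of that behavior follows; it is an illustration written for this note, not SASI's actual DelimiterAnalyzer implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Minimal delimiter tokenizer: split the column value on a single configured
// character and emit the non-empty pieces as terms. No stemming, no stop
// words, no language analysis - which is the point for machine-written values.
public class DelimiterTokenizerSketch {
    private final Pattern delimiter;

    DelimiterTokenizerSketch(char delimiter) {
        // Pattern.quote so regex metacharacters can be used as delimiters too.
        this.delimiter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
    }

    List<String> tokenize(String value) {
        // Drop empty tokens from leading/trailing/repeated delimiters.
        return Arrays.stream(delimiter.split(value))
                     .filter(t -> !t.isEmpty())
                     .toList();
    }

    public static void main(String[] args) {
        // '░' as in the index options above: a character chosen because it
        // should never occur inside the data itself.
        DelimiterTokenizerSketch t = new DelimiterTokenizerSketch('░');
        System.out.println(t.tokenize("http░GET░status=200"));
        // -> [http, GET, status=200]
    }
}
```

With `'mode': 'prefix'`, each emitted term can then be matched by prefix queries without the disk-usage blowup of `CONTAINS`.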
[jira] [Created] (CASSANDRA-14276) Walk through the code
Sumant Sahney created CASSANDRA-14276: - Summary: Walk through the code Key: CASSANDRA-14276 URL: https://issues.apache.org/jira/browse/CASSANDRA-14276 Project: Cassandra Issue Type: Sub-task Reporter: Sumant Sahney 1. Walk through the code and understand each module's logging size.
[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity
[ https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379130#comment-16379130 ] Jeremiah Jordan commented on CASSANDRA-12151: - One of the goals should be recording queries with minimal impact on workloads, which was also a goal of CASSANDRA-13983, so I would think some re-use might be a good idea rather than coming up with a new way of doing that. > Audit logging for database activity > --- > > Key: CASSANDRA-12151 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12151 > Project: Cassandra > Issue Type: New Feature >Reporter: stefan setyadi >Assignee: Anuj Wadehra >Priority: Major > Fix For: 4.x > > Attachments: 12151.txt, > DesignProposal_AuditingFeature_ApacheCassandra_v1.docx > > > we would like a way to enable cassandra to log database activity being done > on our server. > It should show username, remote address, timestamp, action type, keyspace, > column family, and the query statement. > it should also be able to log connection attempt and changes to the > user/roles. > I was thinking of making a new keyspace and insert an entry for every > activity that occurs. > Then It would be possible to query for specific activity or a query targeting > a specific keyspace and column family. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379131#comment-16379131 ] Dikang Gu commented on CASSANDRA-14252: --- [~szhou], Yes, it's the warm-up phase. We have to know the distance/latency differences between different replicas; otherwise we will have no way to fall back to remote replicas. One idea to limit unnecessary requests to remote replicas is to fall back only when the local node is really bad. Something like this: if (subsnitchScore > 0.5 && subsnitchScore > (sortedScoreIterator.next() * (1.0 + dynamicBadnessThreshold))) { sortByProximityWithScore(address, addresses); return; } Of course, the 0.5 parameter can be made tunable. > Use zero as default score in DynamicEndpointSnitch > -- > > Key: CASSANDRA-14252 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14252 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > Fix For: 4.0, 3.0.17, 3.11.3 > > > The problem I want to solve is that, in our deployment, one slow but > alive data node can slow down the whole cluster, even causing our requests > to time out. > We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect > DynamicEndpointSnitch to switch to sortByProximityWithScore if the local data node's > latency is too high. > I added some debug logging and figured out that in a lot of cases the score > from the remote data node was not populated, so the fallback to > sortByProximityWithScore never happened. That's why a single slow data node > can cause huge problems for the whole cluster. > In this jira, I'd like to use zero as the default score, so that we get a > chance to try a remote data node if the local one is slow. > I tested it in our test cluster, and it improved client latency significantly in the > single-slow-data-node case. > I flag this as a Bug because it caused problems for our use cases multiple > times.
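The guard proposed in the comment can be sketched as a small predicate. The 0.5 floor and the names mirror the snippet above; this is a sketch of the suggested condition, not the actual DynamicEndpointSnitch code:

```java
// Sketch of the proposed fallback guard: re-sort replicas by score (i.e.
// consider remote replicas) only when the local replica is bad in absolute
// terms AND worse than the best alternative by more than the badness threshold.
public class FallbackGuard {
    static final double ABSOLUTE_FLOOR = 0.5; // tunable, per the comment

    static boolean shouldFallBack(double localScore,
                                  double bestRemoteScore,
                                  double badnessThreshold) {
        return localScore > ABSOLUTE_FLOOR
            && localScore > bestRemoteScore * (1.0 + badnessThreshold);
    }

    public static void main(String[] args) {
        // Local slightly worse than remote but still cheap in absolute terms: stay local.
        System.out.println(shouldFallBack(0.3, 0.2, 0.1)); // false
        // Local expensive and more than 10% worse than the best remote: go remote.
        System.out.println(shouldFallBack(0.9, 0.2, 0.1)); // true
    }
}
```

The absolute floor is what limits unnecessary cross-replica (and potentially cross-datacenter) requests while scores are still warming up.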
> logs === > _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [0.0]_ > _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_
[jira] [Updated] (CASSANDRA-8341) Expose time spent in each thread pool
[ https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-8341: - Status: Patch Available (was: Open) > Expose time spent in each thread pool > - > > Key: CASSANDRA-8341 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8341 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: metrics > Attachments: 8341.patch, 8341v2.txt > > > Can increment a counter with the time spent in each queue. This can provide > context on how much time is spent, percentage-wise, in each stage. > Additionally, it can be used with Little's law in the future if we ever want to > tune the size of the pools.
[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool
[ https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379093#comment-16379093 ] Chris Lohfink commented on CASSANDRA-8341: -- Patch changed output of tpstats to look like: {code} Pool NameActive Pending Completed Blocked AllTimeBlocked CPU[ms/sec] Allocations[mb/s] AntiEntropyStage 0 0 0 0 0 0 0 CacheCleanupExecutor 0 0 0 0 0 0 0 CompactionExecutor 0 0 1013 0 0 0 0 CounterMutationStage 0 0 0 0 0 0 0 GossipStage 0 0 0 0 0 0 0 HintsDispatcher 0 0 0 0 0 0 0 InternalResponseStage0 0 0 0 0 0 0 MemtableFlushWriter 0 0 1 0 0 0 0 MemtablePostFlush0 0 2 0 0 0 0 MemtableReclaimMemory0 0 1 0 0 0 0 MigrationStage 0 0 0 0 0 0 0 MiscStage0 0 0 0 0 0 0 MutationStage0 0 1367500 0 27 2 Native-Transport-Requests11 0 36191532 0 0 1839566 PendingRangeCalculator 0 0 2 0 0 0 0 PerDiskMemtableFlushWriter_0 0 0 1 0 0 0 0 ReadRepairStage 0 0 0 0 0 0 0 ReadStage0 0 22662142 0 0 349 58 Repair-Task 0 0 0 0 0 0 0 RequestResponseStage 0 0 0 0 0 0 0 Sampler 0 0 0 0 0 0 0 SecondaryIndexManagement 0 0 0 0 0 0 0 ValidationExecutor 0 0 0 0 0 0 0 ViewBuildExecutor0 0 0 0 0 0 0 ViewMutationStage0 0 0 0 0 0 0 {code} > Expose time spent in each thread pool > - > > Key: CASSANDRA-8341 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8341 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: metrics > Attachments: 8341.patch, 8341v2.txt > > > Can increment a counter with time spent in each queue. This can provide > context on how much time is spent percentage wise in each stage. > Additionally can be used with littles law in future if ever want to try to > tune the size of the pools. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool
[ https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379090#comment-16379090 ] ASF GitHub Bot commented on CASSANDRA-8341: --- GitHub user clohfink opened a pull request: https://github.com/apache/cassandra/pull/200 Add tpstats cpu and alloc rate tracking for CASSANDRA-8341 You can merge this pull request into a Git repository by running: $ git pull https://github.com/clohfink/cassandra 8341 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cassandra/pull/200.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #200 commit 1d47a63f5c334998cb8e948f5114c1e3cbc32103 Author: Chris Lohfink Date: 2018-02-27T18:04:39Z Add tpstats cpu and alloc rate tracking for CASSANDRA-8341 > Expose time spent in each thread pool > - > > Key: CASSANDRA-8341 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8341 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Labels: metrics > Attachments: 8341.patch, 8341v2.txt > > > Can increment a counter with time spent in each queue. This can provide > context on how much time is spent percentage wise in each stage. > Additionally can be used with littles law in future if ever want to try to > tune the size of the pools. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-8341) Expose time spent in each thread pool
[ https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink reassigned CASSANDRA-8341: Assignee: Chris Lohfink
[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity
[ https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379076#comment-16379076 ] Anuj Wadehra commented on CASSANDRA-12151:
--

Thanks for the review comments! [~vinaykumarcse] I will go through your patch and share my comments.

[~jjordan] I had a look at CASSANDRA-13983. The use cases are quite different. Yes, we can have a chronicle-queue variant of the audit logger like [~vinaykumarcse] said, but I think we should start with a simple logger implementation unless we have really good reasons to go with chronicle-queue for audit logging.

> Audit logging for database activity
> -----------------------------------
>
>                 Key: CASSANDRA-12151
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: stefan setyadi
>            Assignee: Anuj Wadehra
>            Priority: Major
>             Fix For: 4.x
>
>         Attachments: 12151.txt,
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> we would like a way to enable cassandra to log database activity being done
> on our server.
> It should show username, remote address, timestamp, action type, keyspace,
> column family, and the query statement.
> it should also be able to log connection attempt and changes to the
> user/roles.
> I was thinking of making a new keyspace and insert an entry for every
> activity that occurs.
> Then It would be possible to query for specific activity or a query targeting
> a specific keyspace and column family.
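A "simple logger implementation" of the sort proposed here can start from a single interface with pluggable backends, so a chronicle-queue variant would just be another implementation of the same seam. The sketch below is purely illustrative; `IAuditLogger` and `AuditEntry` are invented stand-ins for this example, not the types from either patch:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a minimal audit-logging seam: one interface,
// one trivial implementation. A file-, slf4j-, or chronicle-queue-backed
// variant would simply be another implementation of the interface.
interface IAuditLogger {
    void log(AuditEntry entry);
}

class AuditEntry {
    final Instant when;
    final String user, remoteAddress, operation, keyspace, statement;

    AuditEntry(Instant when, String user, String remoteAddress,
               String operation, String keyspace, String statement) {
        this.when = when;
        this.user = user;
        this.remoteAddress = remoteAddress;
        this.operation = operation;
        this.keyspace = keyspace;
        this.statement = statement;
    }
}

class InMemoryAuditLogger implements IAuditLogger {
    final List<AuditEntry> entries = new ArrayList<>();

    @Override
    public void log(AuditEntry entry) {
        entries.add(entry); // a real backend would persist or ship this
    }
}

public class AuditSketch {
    public static void main(String[] args) {
        InMemoryAuditLogger logger = new InMemoryAuditLogger();
        logger.log(new AuditEntry(Instant.now(), "alice", "10.0.0.1",
                                  "SELECT", "ks1", "SELECT * FROM ks1.t1"));
        System.out.println(logger.entries.size()); // prints 1
    }
}
```

The entry fields mirror what the ticket asks to capture: user, remote address, timestamp, action type, keyspace/table, and the statement.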
cassandra-builds git commit: can't use --depth with apache git repo
Repository: cassandra-builds Updated Branches: refs/heads/master 2c1842cef -> f6079f9eb can't use --depth with apache git repo Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/f6079f9e Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/f6079f9e Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/f6079f9e Branch: refs/heads/master Commit: f6079f9eba752b76fc34371f344332afa46e6026 Parents: 2c1842c Author: Marcus Eriksson Authored: Tue Feb 27 09:14:32 2018 -0800 Committer: Marcus Eriksson Committed: Tue Feb 27 09:14:32 2018 -0800 -- docker/jenkins/jenkinscommand.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/f6079f9e/docker/jenkins/jenkinscommand.sh -- diff --git a/docker/jenkins/jenkinscommand.sh b/docker/jenkins/jenkinscommand.sh index 5e4741d..8a29dd7 100644 --- a/docker/jenkins/jenkinscommand.sh +++ b/docker/jenkins/jenkinscommand.sh @@ -8,8 +8,8 @@ BRANCH=$2 DTEST_REPO=$3 DTEST_BRANCH=$4 EOF -echo "jenkinscommand.sh: running: git clone --depth=1 --branch $BUILDSBRANCH $BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7" -ID=$(docker run --env-file env.list -dt $DOCKER_IMAGE dumb-init bash -ilc "git clone --depth=1 --branch $BUILDSBRANCH $BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7") +echo "jenkinscommand.sh: running: git clone --branch $BUILDSBRANCH $BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7" +ID=$(docker run --env-file env.list -dt $DOCKER_IMAGE dumb-init bash -ilc "git clone --branch $BUILDSBRANCH $BUILDSREPO; sh ./cassandra-builds/docker/jenkins/dtest.sh $7") # use docker attach instead of docker wait to get output docker attach --no-stdin $ID echo "$ID done, copying files" - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14264) Quick Tour Document for dev's that want to get oriented on the code efficiently.
[ https://issues.apache.org/jira/browse/CASSANDRA-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378947#comment-16378947 ] Constance Eustace commented on CASSANDRA-14264:
---

I am currently trying to understand the internals of SSTables. It would be really nice to have a walkthrough of viewing, filtering, parsing, merging, manipulating, etc. of small-scale SSTables via Cassandra code. If I can figure it out myself I'll try to provide a writeup. A detailed explanation of what the system tables represent would also be nice.

As for the code, there are a couple of critical paths for Cassandra data:
1) incoming mutations going to the commit log and memtable
2) flushed mutations going from the memtable to an SSTable
3) SSTables being compacted and organized
4) coordinator node actions on queries/writes, ensuring consistency levels are adhered to
5) query processing against memtables and bloom filters/SSTable lookup

We could write code path explanations for those; that might be very helpful.

> Quick Tour Document for dev's that want to get oriented on the code
> efficiently.
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14264
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14264
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation and Website
>            Reporter: Kenneth Brotman
>            Priority: Major
>
> Create a Quick Tour Document for dev's that want to get oriented on the code
> efficiently.
[jira] [Commented] (CASSANDRA-14002) Don't use SHA256 when building merkle trees
[ https://issues.apache.org/jira/browse/CASSANDRA-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378941#comment-16378941 ] Michael Kjellman commented on CASSANDRA-14002: -- +1 to rebase. > Don't use SHA256 when building merkle trees > --- > > Key: CASSANDRA-14002 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14002 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 4.x > > > We should avoid using SHA-2 when building merkle trees as we don't need a > cryptographic hash function for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378919#comment-16378919 ] Tania S Engel commented on CASSANDRA-13480:
---

[~mbyrd]: I have reason to believe I just hit this in 3.11.1; at the very least, I ran into a repair which has never completed on an 11-node cluster. Is there a way to get this fix into 3.11?

> nodetool repair can hang forever if we lose the notification for the repair
> completing/failing
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13480
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13480
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Matt Byrd
>            Assignee: Matt Byrd
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.0
>
>
> When a JMX lost notification occurs, sometimes the lost notification in
> question is the notification which lets RepairRunner know that the repair is
> finished (ProgressEventType.COMPLETE or even ERROR for that matter).
> This results in the nodetool process running the repair hanging forever.
> I have a test which reproduces the issue here:
> https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test
> To fix this, if on receiving a notification that notifications have been lost
> (JMXConnectionNotification.NOTIFS_LOST), we instead query a new endpoint via
> JMX to receive all the relevant notifications we're interested in, we can
> replay those we missed and avoid this scenario.
> It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself
> might be lost and so for good measure I have made RepairRunner poll
> periodically to see if there were any notifications that had been sent but we
> didn't receive (scoped just to the particular tag for the given repair).
> Users who don't use nodetool but go via JMX directly can still use this new
> endpoint and implement similar behaviour in their clients as desired.
> I'm also expiring the notifications which have been kept on the server side.
> Please let me know if you've any questions or can think of a different > approach, I also tried setting: > JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000" > but this didn't fix the test. I suppose it might help under certain scenarios > but in this test we don't even send that many notifications so I'm not > surprised it doesn't fix it. > It seems like getting lost notifications is always a potential problem with > jmx as far as I can tell. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
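The replay idea from the description can be modelled without any JMX machinery: the server retains recent events keyed by sequence number, and a client that suspects a loss fetches everything after the last sequence it saw. The classes below are a toy model of that pattern only; `EventStore` and its methods are invented names, not Cassandra's actual endpoint:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of notification replay: events are retained by sequence
// number, so a client that detects a gap (e.g. after a NOTIFS_LOST-style
// signal, or via periodic polling) can ask for everything it skipped.
class EventStore {
    private final NavigableMap<Long, String> events = new TreeMap<>();

    void publish(long seq, String event) {
        events.put(seq, event); // a real store would also expire old entries
    }

    List<String> after(long lastSeen) {
        // Everything strictly newer than the last sequence the client saw.
        return new ArrayList<>(events.tailMap(lastSeen, false).values());
    }
}

public class ReplaySketch {
    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.publish(1, "SESSION_START");
        store.publish(2, "PROGRESS");
        store.publish(3, "COMPLETE");
        // The client saw seq 1, then suspects lost notifications: replay.
        List<String> missed = store.after(1);
        System.out.println(missed); // prints [PROGRESS, COMPLETE]
    }
}
```

Server-side expiry of old entries, mentioned at the end of the description, is the piece that bounds the memory cost of keeping this replay window.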
[jira] [Comment Edited] (CASSANDRA-12151) Audit logging for database activity
[ https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377534#comment-16377534 ] Vinay Chella edited comment on CASSANDRA-12151 at 2/27/18 4:48 PM: --- Hi [~eanujwa] [~jasobrown], I’m excited to see the design document and it looks good to us! Netflix had a similar requirement recently for our internal 2.1 clusters and we implemented a simple version (no query categories, etc…) for sox auditing. As your design is very close to what we implemented, just a few differently named classes for the most part, can we work together on the trunk [patchset|https://github.com/vinaykumarchella/cassandra/pull/2] to add the missing components from your design? Alternatively, we could take an incremental approach, review what we have on the trunk branch of the simple version and get it committed and then add in some of the more advanced features next. I believe this patch follows the design goals that you put together. Please review and let me know if you have any questions or concerns about the first iteration. If folks are interested in the 3.x/2.x branches I can put those up on my github as well. [~jhb] {quote}I just have one question, do you think enabling/updating/disabling audit require a node restart? {quote} The posted patch allows online auditlog enable/disable via JMX. [~jjordan] {quote}You should take a look at the infrastructure added in CASSANDRA-13983 for query logging {quote} Yes, we looked and that certainly looks interesting, perhaps this design allows us to use it as another implementation of {{IAuditLogger}}? Here is the patch location: ||[trunk|https://github.com/vinaykumarchella/cassandra]|| |[PR for Trunk|https://github.com/vinaykumarchella/cassandra/pull/2]| was (Author: vinaykumarcse): Hi [~eanujwa] [~jasobrown], I’m excited to see the design document and it looks good to us! 
Netflix had a similar requirement recently for our internal 2.1 clusters and we implemented a simple version (no query categories, etc…) for sox auditing. As your design is very close to what we implemented, just a few differently named classes for the most part, can we work together on the trunk [patchset|https://github.com/vinaykumarchella/cassandra/pull/2] to add the missing components from your design? Alternatively, we could take an incremental approach, review what we have on the trunk branch of the simple version and get it committed and then add in some of the more advanced features next. I believe this patch follows the design goals that you put together. Please review and let me know if you have any questions or concerns about the first iteration. If folks are interested in the 3.x/2.x branches I can put those up on my github as well. [~jhb] {quote}I just have one question, do you think enabling/updating/disabling audit require a node restart? {quote} The posted patch allows online auditlog enable/disable via JMX. [~jjordan] {quote}You should take a look at the infrastructure added in CASSANDRA-13983 for query logging {quote} Yes, we looked and that certainly looks interesting, perhaps this design allows us to use it as another implementation of {{IAuditLogger}}? > Audit logging for database activity > --- > > Key: CASSANDRA-12151 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12151 > Project: Cassandra > Issue Type: New Feature >Reporter: stefan setyadi >Assignee: Anuj Wadehra >Priority: Major > Fix For: 4.x > > Attachments: 12151.txt, > DesignProposal_AuditingFeature_ApacheCassandra_v1.docx > > > we would like a way to enable cassandra to log database activity being done > on our server. > It should show username, remote address, timestamp, action type, keyspace, > column family, and the query statement. > it should also be able to log connection attempt and changes to the > user/roles. 
> I was thinking of making a new keyspace and insert an entry for every > activity that occurs. > Then It would be possible to query for specific activity or a query targeting > a specific keyspace and column family. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13668) Database user auditing events
[ https://issues.apache.org/jira/browse/CASSANDRA-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13668: --- Status: Patch Available (was: In Progress) > Database user auditing events > - > > Key: CASSANDRA-13668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13668 > Project: Cassandra > Issue Type: Improvement > Components: Observability >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Fix For: 4.x > > > With the availability of CASSANDRA-13459, any native transport enabled client > will be able to subscribe to internal Cassandra events. External tools can > take advantage by monitoring these events in various ways. Use-cases for this > can be e.g. auditing tools for compliance and security purposes. > The scope of this ticket is to add diagnostic events that are raised around > authentication and CQL operations. These events can then be consumed and used > by external tools to implement a Cassandra user auditing solution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378235#comment-16378235 ] Oleksandr Shulgin edited comment on CASSANDRA-14210 at 2/27/18 8:35 AM: We are observing a very similar problem with ordinary compaction. Not sure if the proposed change could cover both (with the difference that in compaction you likely want to start with the smallest tables first, but this is up to the actual compaction strategy). A node runs with {{concurrent_compactors=2}} and is doing a rather big compaction (> 200 GB) on a table. At the same time, a lot of small files are streamed in by repair, for a different table. Number of {{*-Data.db}} files for that other table grows as high as 5,500 and estimated number of pending compaction tasks for this node jumps to over 180. But no compaction is started for the table with a lot of small data files, up until the only current compaction task finishes. Why is that so? I would expect that a free compaction slot is utilized immediately for new tasks. was (Author: oshulgin): We are observing a very similar problem with ordinary compaction. Not sure if the proposed change could cover both (with the difference that in compaction you likely want to start with the smallest tables first, but this is up to the actual compaction strategy). A node runs with {{concurrent_compactors=2}} and is doing a rather big compaction (> 200 GB) on a table. At the same time, a lot of small files are streamed in by repair, for a different table. Number of {{*-Data.db}} for that other table grows as high as 5,500 and estimated number of pending compaction tasks for this node jumps to over 180. But no compaction is started for the table with a lot of small data files, up until the only current compaction task finishes. Why is that so? I would expect that a free compaction slot is utilized immediately for new tasks. 
> Optimize SSTables upgrade task scheduling > - > > Key: CASSANDRA-14210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14210 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Oleksandr Shulgin >Assignee: Kurt Greaves >Priority: Major > Fix For: 4.x > > > When starting the SSTable-rewrite process by running {{nodetool > upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are > used. > For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}. > What we observed both for version 2.2 and 3.0, is that initially all 4 > provided slots are used for "Upgrade sstables" compactions, but later when > some of the 4 tasks are finished, no new tasks are scheduled immediately. It > takes the last of the 4 tasks to finish before new 4 tasks would be > scheduled. This happens on every node we've observed. > This doesn't utilize available resources to the full extent allowed by the > --jobs N parameter. In the field, on a cluster of 12 nodes with 4-5 TiB data > each, we've seen that the whole process was taking more than 7 days, instead > of estimated 1.5-2 days (provided there would be close to full N slots > utilization). > Instead, new tasks should be scheduled as soon as there is a free compaction > slot. > Additionally, starting from the biggest SSTables could further reduce the > total time required for the whole process to finish on any given node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378235#comment-16378235 ] Oleksandr Shulgin commented on CASSANDRA-14210: --- We are observing a very similar problem with ordinary compaction. Not sure if the proposed change could cover both (with the difference that in compaction you likely want to start with the smallest tables first, but this is up to the actual compaction strategy). A node runs with {{concurrent_compactors=2}} and is doing a rather big compaction (> 200 GB) on a table. At the same time, a lot of small files are streamed in by repair, for a different table. Number of {{*-Data.db}} for that other table grows as high as 5,500 and estimated number of pending compaction tasks for this node jumps to over 180. But no compaction is started for the table with a lot of small data files, up until the only current compaction task finishes. Why is that so? I would expect that a free compaction slot is utilized immediately for new tasks. > Optimize SSTables upgrade task scheduling > - > > Key: CASSANDRA-14210 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14210 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Oleksandr Shulgin >Assignee: Kurt Greaves >Priority: Major > Fix For: 4.x > > > When starting the SSTable-rewrite process by running {{nodetool > upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are > used. > For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}. > What we observed both for version 2.2 and 3.0, is that initially all 4 > provided slots are used for "Upgrade sstables" compactions, but later when > some of the 4 tasks are finished, no new tasks are scheduled immediately. It > takes the last of the 4 tasks to finish before new 4 tasks would be > scheduled. This happens on every node we've observed. 
> This doesn't utilize available resources to the full extent allowed by the > --jobs N parameter. In the field, on a cluster of 12 nodes with 4-5 TiB data > each, we've seen that the whole process was taking more than 7 days, instead > of estimated 1.5-2 days (provided there would be close to full N slots > utilization). > Instead, new tasks should be scheduled as soon as there is a free compaction > slot. > Additionally, starting from the biggest SSTables could further reduce the > total time required for the whole process to finish on any given node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
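The idle-slot behaviour described above is the classic difference between running work in fixed batches (wait for all N tasks, then schedule the next N) and feeding a bounded pool from a shared queue, where a new task starts the moment any worker frees up. A minimal, Cassandra-free sketch of the desired scheduling (all names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: with a fixed pool of `slots` workers fed from a shared queue,
// the next task starts as soon as any worker frees -- no task waits for
// a whole batch of `slots` tasks to finish first.
public class SlotSketch {
    static int maxConcurrent(int slots, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(() -> {
                // Track the highest number of tasks ever running at once.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try { Thread.sleep(20); } catch (InterruptedException e) { }
                inFlight.decrementAndGet();
            }));
        }
        for (Future<?> f : futures) f.get(); // wait for all tasks
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // 6 tasks over 2 slots: all run, never more than 2 at once,
        // and slots are refilled immediately as tasks complete.
        System.out.println(maxConcurrent(2, 6) <= 2); // prints true
    }
}
```

The batching behaviour reported in the ticket corresponds to submitting `slots` tasks, waiting for all of them, and only then submitting the next group, which leaves finished workers idle while the slowest task of each group drains.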
[jira] [Commented] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention
[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378207#comment-16378207 ] Oleksandr Shulgin commented on CASSANDRA-5836:
--

[~jjirsa] thanks for reopening this. Before suggesting a fix I'd like to have a better understanding of what the bootstrap process really is. [~jbellis] could you please elaborate on the "special cases" you've mentioned?

In the literature I can find definitions akin to "Bootstrapping is the process of claiming token ranges and streaming in the data from other nodes". This cannot be accurate, because the nodes which don't bootstrap (seeds or the ones having {{auto_bootstrap}} set to {{false}} explicitly) also claim token ranges; they just don't stream the data in and are immediately responsible for handling read requests.

If I understand it correctly, the above definition is what "joining the ring" really is, i.e. "claiming token ranges and (optionally) streaming in the data". By this reasoning bootstrapping is only about "streaming in the data". Is there anything else to the bootstrap process that I'm not aware of? Please clarify.

> Seed nodes should be able to bootstrap without manual intervention
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-5836
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bill Hathaway
>            Priority: Minor
>
> The current logic doesn't allow a seed node to be bootstrapped. If a user
> wants to bootstrap a node configured as a seed (for example to replace a seed
> node via replace_token), they first need to remove the node's own IP from the
> seed list, and then start the bootstrap process. This seems like an
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a
> seed node to bootstrap without manual intervention when there are other seed
> nodes up in a ring.
[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI
[ https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378193#comment-16378193 ] Corentin Chary commented on CASSANDRA-10765:
---

Note: after having troubles like this with SASI, we ended up moving to [https://github.com/Stratio/stratio-cassandra]. IMHO leveraging Lucene instead of building yet another index makes much more sense. It would be great to see SASI using Lucene internally (even if it's somewhat against the current design). Before using Stratio we started experimenting with a SASI-like Lucene-enabled index; see https://github.com/criteo/biggraphite/tree/master/tools/graphiteindex

> add RangeIterator interface and QueryPlan for SI
> ------------------------------------------------
>
>                 Key: CASSANDRA-10765
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>            Priority: Major
>              Labels: 2i, sasi
>             Fix For: 4.x
>
>         Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling
> intersections/unions: pick the highest selectivity predicate and filter on
> other index expressions. This is not always the most efficient approach.
> Dynamic query planning based on the different index characteristics would be
> more optimal. Query Plan should be able to choose how to do intersections,
> unions based on the metadata provided by indexes (returned by RangeIterator)
> and RangeIterator would become a base for cross-index interactions and should
> have information such as min/max token, estimated number of wrapped tokens etc.
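The cross-index combination the ticket describes (per-index result streams being intersected, guided by metadata such as min/max token and estimated size) can be illustrated with a plain merge-style intersection of two sorted token streams; a query plan would use each index's size estimate to decide whether to intersect by merging or by filtering. This is toy code for illustration, not the SASI implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy merge-intersection of two sorted token lists, the way per-index
// result streams could be combined. The list sizes stand in for the
// per-index estimates a query plan would use to pick a strategy.
public class IntersectSketch {
    static List<Long> intersect(List<Long> a, List<Long> b) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                out.add(a.get(i)); // token present in both indexes
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> idx1 = Arrays.asList(1L, 3L, 5L, 9L);
        List<Long> idx2 = Arrays.asList(3L, 4L, 9L, 12L);
        System.out.println(intersect(idx1, idx2)); // prints [3, 9]
    }
}
```

The "pick the highest selectivity predicate and filter" strategy criticized in the description is cheaper when one stream is tiny; the merge above wins when the streams are of comparable size, which is exactly the kind of choice the proposed QueryPlan would make from iterator metadata.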