[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
      Fix Version/s: NA
      Since Version: NA
Source Control Link: https://github.com/apache/cassandra-analytics/commit/6556d251bdddfbef3935da760bcda2b2387a4391
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-19981
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
> BundleNameGenerator can produce a prefix char that is outside the range
> [0-9a-zA-Z]. This is a bug, and it is fixed by this patch.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
    Reviewers: Doug Rohrer, Yifan Cai
       Status: Review In Progress  (was: Patch Available)
[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
    Reviewers: Doug Rohrer  (was: Doug Rohrer, Yifan Cai)
[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
    Test and Documentation Plan: ci; unit test
                         Status: Patch Available  (was: Open)
[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
[ https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19981:
----------------------------------
     Bug Category: Parent values: Availability(12983); Level 1 values: Process Crash(12992)
       Complexity: Low Hanging Fruit
    Discovered By: User Report
         Severity: Normal
           Status: Open  (was: Triage Needed)

PR: https://github.com/apache/cassandra-analytics/pull/89
CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19981%2Ftrunk
[jira] [Created] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator
Yifan Cai created CASSANDRA-19981:
-------------------------------------

             Summary: [Analytics] Fix invalid prefix char produced by BundleNameGenerator
                 Key: CASSANDRA-19981
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
             Project: Cassandra
          Issue Type: Bug
          Components: Analytics Library
            Reporter: Yifan Cai
            Assignee: Yifan Cai

BundleNameGenerator can produce a prefix char that is outside the range [0-9a-zA-Z]. This is a bug, and it is fixed by this patch.
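The invariant behind the fix above can be illustrated with a small sketch. The actual BundleNameGenerator code is not shown in this thread, so the class and method below are hypothetical; the point is only that deriving a prefix char from an arbitrary (possibly negative) seed must index the alphanumeric alphabet with a non-negative value, which is exactly where a plain `%` on a negative hash would produce an out-of-range character:

```java
// Hypothetical sketch, not the actual BundleNameGenerator implementation.
// It illustrates the invariant the fix enforces: the generated prefix char
// must fall in [0-9a-zA-Z].
class PrefixChars {
    private static final String ALPHANUMERIC =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static char prefixFor(int seed) {
        // Math.floorMod keeps the index non-negative even for negative seeds,
        // a typical source of out-of-range characters with plain '%'.
        return ALPHANUMERIC.charAt(Math.floorMod(seed, ALPHANUMERIC.length()));
    }

    static boolean isValidPrefix(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```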
[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19933:
----------------------------------
      Fix Version/s: NA
Source Control Link: https://github.com/apache/cassandra-analytics/commit/4624a17098e055e0abf9a6025451d4352cb9c147
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> [Analytics] Support aggregated consistency validation for multiple clusters
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch adds aggregated consistency validation across multiple clusters
> for coordinated write.
[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19933:
----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19933:
----------------------------------
    Test and Documentation Plan: ci; unit
                         Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/86
CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19933%2Fmultiple-clusters-consistency-validation
[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19933:
----------------------------------
    Change Category: Semantic
         Complexity: Normal
        Component/s: Analytics Library
             Status: Open  (was: Triage Needed)
[jira] [Created] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters
Yifan Cai created CASSANDRA-19933:
-------------------------------------

             Summary: [Analytics] Support aggregated consistency validation for multiple clusters
                 Key: CASSANDRA-19933
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Yifan Cai
            Assignee: Yifan Cai

This patch adds aggregated consistency validation across multiple clusters for coordinated write.
[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19923:
----------------------------------
      Fix Version/s: NA
Source Control Link: https://github.com/apache/cassandra-analytics/commit/ff9ac41b4695c1df59f5293f69e0d3a1ce0da9f4
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> [Analytics] Add transport extension for coordinated write
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-19923
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch introduces the CoordinatedTransportExtension and
> CoordinationSignalListener to define the contract for the external write
> coordinator (which implements the extension) to conduct the two-phase write.
[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19923:
----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19910:
----------------------------------
      Fix Version/s: NA
Source Control Link: https://github.com/apache/cassandra-analytics/commit/4fb1e7f47d640353cd57f7a3035c70099049b29c
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In coordinated write, data partitioning should consider the consolidated
> ring topology from all write-target clusters, so that the produced (Spark)
> partitions do not span multiple nodes, which would cause inefficiency.
[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19910:
----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19910:
----------------------------------
    Reviewers: Doug Rohrer
       Status: Review In Progress  (was: Patch Available)
[jira] [Commented] (CASSANDRA-19927) [Analytics] Deprecate old compression cache and move to using cache of CompressionMetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882519#comment-17882519 ]

Yifan Cai commented on CASSANDRA-19927:
---------------------------------------

+1 on the patch. Thanks for addressing the comments.

> [Analytics] Deprecate old compression cache and move to using cache of
> CompressionMetadata
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-19927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19927
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Analytics Library
>            Reporter: James Berragan
>            Assignee: James Berragan
>            Priority: Normal
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The compression cache currently caches a single byte array for the
> CompressionInfo.db file. This is a problem for large files, as it involves
> allocating and garbage-collecting large memory segments; it also means that
> every consumer of the bytes will instantiate a CompressionMetadata object and
> allocate an individual BigLongArray to store the chunk offsets. This is
> unnecessary, as the CompressionMetadata is immutable and can be re-used.
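The caching change described in the ticket above can be sketched generically. The class below is not the cassandra-analytics API; it is a minimal illustration, assuming the parse step is expensive and its result immutable, of caching the parsed object once instead of caching raw bytes that every consumer must re-parse:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache the immutable parsed metadata (the analogue of
// CompressionMetadata) rather than the raw CompressionInfo.db byte[], so the
// expensive parse and its large allocations happen at most once per file.
final class MetadataCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> parser; // expensive parse, run at most once per key

    MetadataCache(Function<K, V> parser) {
        this.parser = parser;
    }

    V get(K key) {
        // computeIfAbsent parses on first access; later callers share the
        // same immutable instance instead of re-parsing the bytes.
        return cache.computeIfAbsent(key, parser);
    }
}
```

Because the cached value is immutable, sharing a single instance across consumers is safe without further synchronization.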
[jira] [Updated] (CASSANDRA-19927) [Analytics] Deprecate old compression cache and move to using cache of CompressionMetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19927:
----------------------------------
    Status: Review In Progress  (was: Patch Available)
[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19923:
----------------------------------
    Change Category: Semantic
         Complexity: Normal
             Status: Open  (was: Triage Needed)
[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19923:
----------------------------------
    Test and Documentation Plan: ci; unit
                         Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/83
CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19923%2Ftransport-extension-for-coordinated-write
[jira] [Created] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write
Yifan Cai created CASSANDRA-19923:
-------------------------------------

             Summary: [Analytics] Add transport extension for coordinated write
                 Key: CASSANDRA-19923
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
             Project: Cassandra
          Issue Type: New Feature
          Components: Analytics Library
            Reporter: Yifan Cai
            Assignee: Yifan Cai

This patch introduces the CoordinatedTransportExtension and CoordinationSignalListener to define the contract for the external write coordinator (which implements the extension) to conduct the two-phase write.
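The ticket above names the CoordinatedTransportExtension and CoordinationSignalListener but their signatures are not shown in this thread. As a purely illustrative sketch of what a two-phase (stage, then commit) coordination contract could look like, the interface and tracker below are assumptions, not the actual cassandra-analytics API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical two-phase write signals: phase 1 stages data on a cluster,
// phase 2 commits (makes the staged data visible). Names are illustrative.
interface CoordinationSignals {
    void onStageReady(String clusterId);
    void onCommit(String clusterId);
}

// Minimal in-memory listener enforcing the ordering of the two phases.
final class TwoPhaseTracker implements CoordinationSignals {
    private final Set<String> staged = new HashSet<>();
    private final Set<String> committed = new HashSet<>();

    public void onStageReady(String clusterId) {
        staged.add(clusterId);
    }

    public void onCommit(String clusterId) {
        // A cluster may only commit after it has finished staging.
        if (!staged.contains(clusterId)) {
            throw new IllegalStateException("commit before stage: " + clusterId);
        }
        committed.add(clusterId);
    }

    boolean fullyCommitted(Set<String> clusters) {
        return committed.containsAll(clusters);
    }
}
```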
[jira] [Commented] (CASSANDRA-17666) Option to disable write path during streaming for CDC enabled tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881670#comment-17881670 ]

Yifan Cai commented on CASSANDRA-17666:
---------------------------------------

Hi [~nikolailoginov], new features land in trunk only; 4.1 is a branch in maintenance. A backport would require a PMC vote to justify it, according to [this|https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle]. Please feel free to start a DISCUSSION thread on the dev mailing list.

> Option to disable write path during streaming for CDC enabled tables
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-17666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17666
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Change Data Capture
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 5.0-alpha1, 5.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> For CDC-enabled tables, a special write path is employed during streaming:
> the streamed mutations are written into the commit log first.
> There are scenarios where the commit logs can accumulate, leading to
> streaming failures and blocked writes.
> I'd like to propose adding a dynamic toggle to disable the special write path
> for CDC during streaming.
> Please note that the toggle is a trade-off, because the special write path
> exists to help ensure data consistency. Turning it off allows the streaming
> to pass, but in some extreme scenarios the downstream CDC consumers may see
> holes in the stream, depending on how they consume the commit logs.
[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19909:
----------------------------------
      Fix Version/s: NA
Source Control Link: https://github.com/apache/cassandra-analytics/commit/f123406e458c0112145f37dcd3f8c20ba47c949d
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated
> write to multiple Cassandra clusters
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19909
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra
> clusters, this patch introduces the new writer option,
> COORDINATED_WRITE_CONFIG, and the optional clusterId to identify clusters.
> The COORDINATED_WRITE_CONFIG value is a JSON string that defines the target
> clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not
> including table properties) across clusters.
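The last requirement above, an identical table schema (ignoring table properties) on every target cluster, lends itself to a simple up-front check before the bulk write starts. The sketch below is illustrative only: it models each cluster's schema as a plain string, whereas the real feature would compare parsed schemas:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical pre-flight check for coordinated write: every target cluster
// must report the same table schema. Types here are illustrative stand-ins,
// not the cassandra-analytics API.
final class SchemaCheck {
    /** Returns true when every cluster reports the same schema definition. */
    static boolean schemasMatch(List<String> schemaPerCluster) {
        if (schemaPerCluster.isEmpty()) {
            return false; // no target clusters: nothing valid to write to
        }
        String first = schemaPerCluster.get(0);
        return schemaPerCluster.stream().allMatch(s -> Objects.equals(s, first));
    }
}
```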
[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19909:
----------------------------------
    Status: Ready to Commit  (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19909:
----------------------------------
    Reviewers: Doug Rohrer, Francisco Guerrero  (was: Francisco Guerrero)
[jira] [Commented] (CASSANDRA-19815) [Analytics] Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark
[ https://issues.apache.org/jira/browse/CASSANDRA-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880827#comment-17880827 ]

Yifan Cai commented on CASSANDRA-19815:
---------------------------------------

+1 on the patch

> [Analytics] Decouple Cassandra types from Spark types so Cassandra types can
> be used independently from Spark
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19815
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Analytics Library
>            Reporter: James Berragan
>            Assignee: James Berragan
>            Priority: Normal
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Cassandra types and Spark types are tightly coupled in the same classes,
> making it difficult to deserialize Cassandra types without pulling in Spark
> as a dependency. We can split out the Spark types into a separate module by
> introducing a new TypeConverter that maps Cassandra types to Spark types.
> This enables use of the Cassandra types without pulling in Spark and also
> opens the possibility of other TypeConverters in the future beyond Spark.
[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifan Cai updated CASSANDRA-19910:
----------------------------------
    Test and Documentation Plan: ci; unit
                         Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/80
CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19910%2Fsupport-multiple-clusters-for-data-partitioning
[jira] [Created] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
Yifan Cai created CASSANDRA-19910: - Summary: [Analytics] Support data partitioning for multiple clusters coordinated write Key: CASSANDRA-19910 URL: https://issues.apache.org/jira/browse/CASSANDRA-19910 Project: Cassandra Issue Type: New Feature Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai In the coordinated write, data partitioning should consider the consolidated ring topology from all write-target clusters, so that the produced (Spark) partitions do not span multiple nodes, which would cause inefficiency.
[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write
[ https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19910: -- Change Category: Semantic Complexity: Normal Status: Open (was: Triage Needed) > [Analytics] Support data partitioning for multiple clusters coordinated write > - > > Key: CASSANDRA-19910 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19910 > Project: Cassandra > Issue Type: New Feature > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > In the coordinated write, data partitioning should consider the consolidated > ring topology from all write-target clusters, so that the produced (Spark) > partitions do not span multiple nodes, which would cause inefficiency.
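The consolidated-topology idea in CASSANDRA-19910 can be illustrated with a small sketch: take the token boundaries owned by each write-target cluster and cut Spark partitions at the union of those boundaries, so no partition straddles a node boundary in any cluster. The method and class names below are hypothetical, not from the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: merge token boundaries from two clusters and emit
// sub-ranges; each sub-range then lies within a single node's range in both clusters.
public class ConsolidatedSplits {
    static List<long[]> splitRanges(List<Long> clusterABoundaries, List<Long> clusterBBoundaries) {
        TreeSet<Long> merged = new TreeSet<>();       // sorted union of all boundaries
        merged.addAll(clusterABoundaries);
        merged.addAll(clusterBBoundaries);
        List<long[]> ranges = new ArrayList<>();
        Long prev = null;
        for (long boundary : merged) {
            if (prev != null) {
                ranges.add(new long[]{prev, boundary});
            }
            prev = boundary;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Cluster A splits the ring at 0/100/200; cluster B at 0/150/200.
        List<long[]> ranges = splitRanges(Arrays.asList(0L, 100L, 200L), Arrays.asList(0L, 150L, 200L));
        for (long[] r : ranges) {
            System.out.println(r[0] + ".." + r[1]);
        }
        // Produces 0..100, 100..150, 150..200: partition at the union of
        // boundaries so no Spark partition spans a node in either cluster.
    }
}
```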
[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19909: -- Description: As the first step of implementing coordinated write to multiple Cassandra clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONFIG and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONFIG value is a json string that defines the target clusters for the bulk write. The coordinated write feature requires the exact same table schema (not including table properties) across clusters. was: As the first step of implementing coordinated write to multiple Cassandra clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF value is a json string that defines the target clusters for the bulk write. The coordinated write feature requires the exact same table schema (not including table properties) across clusters. > [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated > write to multiple Cassandra clusters > - > > Key: CASSANDRA-19909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19909 > Project: Cassandra > Issue Type: New Feature > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 0.5h > Remaining Estimate: 0h > > As the first step of implementing coordinated write to multiple Cassandra > clusters, this patch introduces the new writer option, > COORDINATED_WRITE_CONFIG and the optional clusterId to identify clusters. The > COORDINATED_WRITE_CONFIG value is a json string that defines the target > clusters for the bulk write. > The coordinated write feature requires the exact same table schema (not > including table properties) across clusters. 
[jira] [Updated] (CASSANDRA-19815) [Analytics] Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark
[ https://issues.apache.org/jira/browse/CASSANDRA-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19815: -- Status: Review In Progress (was: Patch Available) > [Analytics] Decouple Cassandra types from Spark types so Cassandra types can > be used independently from Spark > - > > Key: CASSANDRA-19815 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19815 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: James Berragan >Priority: Normal > Time Spent: 2h 40m > Remaining Estimate: 0h > > The Cassandra types and Spark types are tightly coupled in the same classes, > making it difficult to deserialize Cassandra types without pulling in Spark > as a dependency. We can split out the Spark types into a separate module by > introducing a new TypeConverter that maps Cassandra types to Spark types. > This enables use of the Cassandra types without pulling in Spark and also > opens the possibility of other TypeConverters in the future beyond Spark.
[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated write to multiple Cassandra clusters
[ https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19909: -- Test and Documentation Plan: ci; unit test Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-analytics/pull/79 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19909%2Ftrunk-writer-option-for-multiple-clusters > [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated > write to multiple Cassandra clusters > --- > > Key: CASSANDRA-19909 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19909 > Project: Cassandra > Issue Type: New Feature > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > As the first step of implementing coordinated write to multiple Cassandra > clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF > and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF > value is a json string that defines the target clusters for the bulk write. > The coordinated write feature requires the exact same table schema (not > including table properties) across clusters.
[jira] [Created] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated write to multiple Cassandra clusters
Yifan Cai created CASSANDRA-19909: - Summary: [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated write to multiple Cassandra clusters Key: CASSANDRA-19909 URL: https://issues.apache.org/jira/browse/CASSANDRA-19909 Project: Cassandra Issue Type: New Feature Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai As the first step of implementing coordinated write to multiple Cassandra clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF value is a json string that defines the target clusters for the bulk write. The coordinated write feature requires the exact same table schema (not including table properties) across clusters.
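The ticket says the option value is a JSON string defining the target clusters, but does not show its schema. The snippet below is a purely hypothetical illustration of what such a value could look like: the field names (`clusters`, `sidecarContactPoints`) and host values are invented for this example and are not the actual option format.

```java
// Hypothetical shape of a COORDINATED_WRITE_CONF value. Each write-target
// cluster is keyed by its clusterId; field names here are guesses, not the
// real schema from the patch.
public class CoordinatedWriteConfExample {
    static final String CONF =
        "{\"clusters\": {"
        + "\"cluster1\": {\"sidecarContactPoints\": [\"host1:9043\"]},"
        + "\"cluster2\": {\"sidecarContactPoints\": [\"host2:9043\"]}"
        + "}}";

    public static void main(String[] args) {
        // Both write-target clusters are identified by a clusterId key.
        System.out.println(CONF.contains("cluster1") && CONF.contains("cluster2")); // prints true
    }
}
```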
[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19901: -- Fix Version/s: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/8655ca54a5d0749fccb2ad6a06ec230e8b0de24e Resolution: Fixed Status: Resolved (was: Ready to Commit) > [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings > - > > Key: CASSANDRA-19901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19901 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 40m > Remaining Estimate: 0h > > Proposing the refactoring of TokenRangeMapping and the related classes to use > proper types instead of String to improve maintainability. As of now, Strings > are used to represent IP, IP with port, node name, etc. It is difficult to > distinguish the actual types.
[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19901: -- Status: Ready to Commit (was: Review In Progress) > [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings > - > > Key: CASSANDRA-19901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19901 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 0.5h > Remaining Estimate: 0h > > Proposing the refactoring of TokenRangeMapping and the related classes to use > proper types instead of String to improve maintainability. As of now, Strings > are used to represent IP, IP with port, node name, etc. It is difficult to > distinguish the actual types.
[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19901: -- Test and Documentation Plan: ci Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-analytics/pull/78 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19901%2Ftrunk-refactor-token-range-mapping > [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings > - > > Key: CASSANDRA-19901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19901 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Proposing the refactoring of TokenRangeMapping and the related classes to use > proper types instead of String to improve maintainability. As of now, Strings > are used to represent IP, IP with port, node name, etc. It is difficult to > distinguish the actual types.
[jira] [Created] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
Yifan Cai created CASSANDRA-19901: - Summary: [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings Key: CASSANDRA-19901 URL: https://issues.apache.org/jira/browse/CASSANDRA-19901 Project: Cassandra Issue Type: Task Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai Proposing the refactoring of TokenRangeMapping and the related classes to use proper types instead of String to improve maintainability. As of now, Strings are used to represent IP, IP with port, node name, etc. It is difficult to distinguish the actual types.
[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19901: -- Issue Type: Improvement (was: Task) > [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings > - > > Key: CASSANDRA-19901 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19901 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > Proposing the refactoring of TokenRangeMapping and the related classes to use > proper types instead of String to improve maintainability. As of now, Strings > are used to represent IP, IP with port, node name, etc. It is difficult to > distinguish the actual types.
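The "proper types instead of Strings" refactoring described above can be sketched with a small value type: instead of passing `"ip:port"` strings around, a dedicated class makes an IP-with-port impossible to confuse with a bare IP or a node name. The class and method names below are illustrative assumptions, not the actual classes from the patch.

```java
import java.util.Objects;

// Hypothetical value type replacing stringly-typed "ip:port" values, in the
// spirit of the TokenRangeMapping refactoring described in the ticket.
final class InstanceAddress {
    private final String ip;
    private final int port;

    InstanceAddress(String ip, int port) {
        this.ip = Objects.requireNonNull(ip);
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("bad port: " + port);
        }
        this.port = port;
    }

    // Parse an "ip:port" string once, at the boundary; everything downstream
    // works with the typed value and cannot mix it up with a node name.
    static InstanceAddress parse(String ipWithPort) {
        int idx = ipWithPort.lastIndexOf(':');
        return new InstanceAddress(ipWithPort.substring(0, idx),
                                   Integer.parseInt(ipWithPort.substring(idx + 1)));
    }

    public String ip() { return ip; }
    public int port() { return port; }
    @Override public String toString() { return ip + ":" + port; }
}

public class TypedMappingSketch {
    public static void main(String[] args) {
        InstanceAddress address = InstanceAddress.parse("127.0.0.1:9042");
        System.out.println(address.ip() + " / " + address.port()); // prints "127.0.0.1 / 9042"
    }
}
```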
[jira] [Commented] (CASSANDRA-19873) [Analytics] Removes checks for blocked instances from bulk-write path
[ https://issues.apache.org/jira/browse/CASSANDRA-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879680#comment-17879680 ] Yifan Cai commented on CASSANDRA-19873: --- +1 > [Analytics] Removes checks for blocked instances from bulk-write path > - > > Key: CASSANDRA-19873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19873 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Arjun Ashok >Assignee: Arjun Ashok >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > The analytics bulk writer currently performs checks for blocked instances for > consistency-level validations prior to and after the bulk-write. It also > takes all the blocked nodes into account for these validations instead of the > nodes in the specific range being written (addressed separately under > https://issues.apache.org/jira/browse/CASSANDRA-19842). > > This change removes the notion of blocked instances for bulk-writes, treating > such nodes as available, as the intended usage of "blocked" nodes is to > operationally prevent client CQL connections going into the node, but not > writes.
[jira] [Updated] (CASSANDRA-19873) [Analytics] Removes checks for blocked instances from bulk-write path
[ https://issues.apache.org/jira/browse/CASSANDRA-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19873: -- Reviewers: Yifan Cai Status: Review In Progress (was: Patch Available) > [Analytics] Removes checks for blocked instances from bulk-write path > - > > Key: CASSANDRA-19873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19873 > Project: Cassandra > Issue Type: Task > Components: Analytics Library >Reporter: Arjun Ashok >Assignee: Arjun Ashok >Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > The analytics bulk writer currently performs checks for blocked instances for > consistency-level validations prior to and after the bulk-write. It also > takes all the blocked nodes into account for these validations instead of the > nodes in the specific range being written (addressed separately under > https://issues.apache.org/jira/browse/CASSANDRA-19842). > > This change removes the notion of blocked instances for bulk-writes, treating > such nodes as available, as the intended usage of "blocked" nodes is to > operationally prevent client CQL connections going into the node, but not > writes.
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Fix Version/s: NA Since Version: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/cfe293dadcf7a1d4491591cfd39fc410a8fa52ba Resolution: Fixed Status: Resolved (was: Ready to Commit) > [Analytics] Consistency level check incorrectly passes when majority of the > replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 2h > Remaining Estimate: 0h > > A consistency level check is performed before proceeding to bulk-writing data. > The check yields the wrong result: when the majority of a replica set is > unavailable, it still passes, leading to data being written to replicas that > cannot satisfy the desired consistency level. > The following test demonstrates the bug. It marks all 3 instances in > the replica set as blocked (unavailable), so the validation is expected to > throw. But it does not.
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map<String, String> replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology =
>     CassandraClusterInfo.getTokenRangeReplicas(() -> mockSimpleTokenRangeReplicasResponse(10, 3),
>                                                () -> Partitioner.Murmur3Partitioner,
>                                                () -> new ReplicationFactor(replicationOptions),
>                                                ringInstance -> {
>                                                    // block nodes 0, 1, 2
>                                                    int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>                                                    return nodeId <= 2;
>                                                });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator =
>     new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>     .isExactlyInstanceOf(RuntimeException.class)
>     .hasMessageContaining("Failed to load");
> }
> {code}
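The intended behavior the ticket describes reduces to simple quorum arithmetic, sketched below. This is illustrative logic only (the names `canSatisfyQuorum` and its parameters are invented here, not the actual patch code): with RF=3, QUORUM needs 2 available replicas, so a replica set with all 3 instances blocked must fail validation.

```java
// Minimal sketch of the QUORUM availability check this bug is about.
public class QuorumCheckSketch {
    static boolean canSatisfyQuorum(int replicationFactor, int unavailableReplicas) {
        int quorum = replicationFactor / 2 + 1;             // RF=3 -> quorum of 2
        int available = replicationFactor - unavailableReplicas;
        return available >= quorum;
    }

    public static void main(String[] args) {
        System.out.println(canSatisfyQuorum(3, 0)); // true: all replicas available
        System.out.println(canSatisfyQuorum(3, 1)); // true: 2 of 3 still reach quorum
        System.out.println(canSatisfyQuorum(3, 3)); // false: the blocked-replica-set scenario above
    }
}
```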
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Status: Ready to Commit (was: Review In Progress) > [Analytics] Consistency level check incorrectly passes when majority of the > replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1h 50m > Remaining Estimate: 0h > > A consistency level check is performed before proceeding to bulk-writing data. > The check yields the wrong result: when the majority of a replica set is > unavailable, it still passes, leading to data being written to replicas that > cannot satisfy the desired consistency level. > The following test demonstrates the bug. It marks all 3 instances in > the replica set as blocked (unavailable), so the validation is expected to > throw. But it does not.
> (Reproducing test omitted; identical to the code block quoted in the earlier update for this issue.)
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Reviewers: Doug Rohrer, Francisco Guerrero (was: Doug Rohrer, Francisco Guerrero) > [Analytics] Consistency level check incorrectly passes when majority of the > replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1h 50m > Remaining Estimate: 0h > > A consistency level check is performed before proceeding to bulk-writing data. > The check yields the wrong result: when the majority of a replica set is > unavailable, it still passes, leading to data being written to replicas that > cannot satisfy the desired consistency level. > The following test demonstrates the bug. It marks all 3 instances in > the replica set as blocked (unavailable), so the validation is expected to > throw. But it does not.
> (Reproducing test omitted; identical to the code block quoted in the earlier update for this issue.)
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Reviewers: Doug Rohrer, Francisco Guerrero (was: Francisco Guerrero) > [Analytics] Consistency level check incorrectly passes when majority of the > replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1h 50m > Remaining Estimate: 0h > > A consistency level check is performed before proceeding to bulk-writing data. > The check yields the wrong result: when the majority of a replica set is > unavailable, it still passes, leading to data being written to replicas that > cannot satisfy the desired consistency level. > The following test demonstrates the bug. It marks all 3 instances in > the replica set as blocked (unavailable), so the validation is expected to > throw. But it does not.
> (Reproducing test omitted; identical to the code block quoted in the earlier update for this issue.)
[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml
[ https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-143: -- Fix Version/s: 1.0 Source Control Link: https://github.com/apache/cassandra-sidecar/commit/f07e248d0ce8303a06daf93b462190ef7be7304d Resolution: Fixed Status: Resolved (was: Ready to Commit) > Enable github squash in asf.yaml > > > Key: CASSANDRASC-143 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 > Project: Sidecar for Apache Cassandra > Issue Type: Task > Components: Configuration >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: 1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. > It is the option that has been used in the Cassandra Sidecar project. I had a > discussion with Mick and Stefan, and we agreed on enabling the squash option.
[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml
[ https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-143: -- Reviewers: Francisco Guerrero, Stefan Miklosovic (was: Francisco Guerrero) > Enable github squash in asf.yaml > > > Key: CASSANDRASC-143 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 > Project: Sidecar for Apache Cassandra > Issue Type: Task > Components: Configuration >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. > It is the option that has been used in the Cassandra Sidecar project. I had a > discussion with Mick and Stefan, and we agreed on enabling the squash option.
[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml
[ https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-143: -- Status: Ready to Commit (was: Review In Progress) > Enable github squash in asf.yaml > > > Key: CASSANDRASC-143 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 > Project: Sidecar for Apache Cassandra > Issue Type: Task > Components: Configuration > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option, which is the option that has been used in the Cassandra Sidecar project. I had a discussion with Mick and Stefan, and we agreed on enabling the squash option.
[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml
[ https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-143: -- Authors: Yifan Cai Test and Documentation Plan: no test for GitHub configuration Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-sidecar/pull/134 > Enable github squash in asf.yaml > > > Key: CASSANDRASC-143 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 > Project: Sidecar for Apache Cassandra > Issue Type: Task > Components: Configuration > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option, which is the option that has been used in the Cassandra Sidecar project. I had a discussion with Mick and Stefan, and we agreed on enabling the squash option.
[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml
[ https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-143: -- Change Category: Operability Complexity: Low Hanging Fruit Component/s: Configuration Status: Open (was: Triage Needed) > Enable github squash in asf.yaml > > > Key: CASSANDRASC-143 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 > Project: Sidecar for Apache Cassandra > Issue Type: Task > Components: Configuration > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option, which is the option that has been used in the Cassandra Sidecar project. I had a discussion with Mick and Stefan, and we agreed on enabling the squash option.
[jira] [Created] (CASSANDRASC-143) Enable github squash in asf.yaml
Yifan Cai created CASSANDRASC-143: - Summary: Enable github squash in asf.yaml Key: CASSANDRASC-143 URL: https://issues.apache.org/jira/browse/CASSANDRASC-143 Project: Sidecar for Apache Cassandra Issue Type: Task Reporter: Yifan Cai Assignee: Yifan Cai CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option, which is the option that has been used in the Cassandra Sidecar project. I had a discussion with Mick and Stefan, and we agreed on enabling the squash option.
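For reference, the merge buttons are controlled by the `github.enabled_merge_buttons` section of `.asf.yaml`. A sketch of what re-enabling squash could look like is below; this is an illustration of the ASF configuration schema, not necessarily the exact block committed in the linked change:

```yaml
github:
  enabled_merge_buttons:
    # Re-enable "Squash and Merge", the option the Sidecar project has been using.
    squash: true
    # Keep the other buttons disabled so every PR lands as a single commit.
    merge: false
    rebase: false
```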
[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Fix Version/s: 1.0 Source Control Link: https://github.com/apache/cassandra-sidecar/commit/4601e28529996a3447e74093cc6cc35879143031 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Improve S3 download throttling with range-GetObject > --- > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Fix For: 1.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The current S3 download throttling in Sidecar is implemented by blocking the streaming consumption. The blocking happens in Netty event loop threads, prolonging each connection and leading to connection resets or suboptimal concurrency. > This patch changes the throttling mechanism to be range-GetObject based: each request retrieves a data range of the object once permitted by the rate limiter.
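The range-based scheme described above can be sketched as follows. This is a minimal illustration, not the Sidecar implementation: `SimpleRateLimiter`, `ranges`, and `download` are hypothetical names, and in the real client each range would become an S3 GetObject request carrying a `Range: bytes=start-end` header.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of range-based download throttling: instead of blocking the streaming
// consumer (which pins Netty event-loop threads), the object is fetched as a
// series of byte ranges, and each range request is issued only after the rate
// limiter grants permission for that many bytes.
public class RangeThrottleSketch
{
    // Compute [start, end] (inclusive) byte ranges for an object of the given
    // size, split into chunks of at most chunkSize bytes.
    static List<long[]> ranges(long objectSize, long chunkSize)
    {
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize)
        {
            long end = Math.min(start + chunkSize, objectSize) - 1;
            result.add(new long[]{ start, end });
        }
        return result;
    }

    // Hypothetical download loop; the acquire happens *before* each ranged request.
    static long download(long objectSize, long chunkSize, SimpleRateLimiter limiter) throws InterruptedException
    {
        long downloaded = 0;
        for (long[] range : ranges(objectSize, chunkSize))
        {
            long len = range[1] - range[0] + 1;
            limiter.acquire(len);   // wait for permission, off the event loop
            downloaded += len;      // stand-in for issuing the ranged GetObject
        }
        return downloaded;
    }

    // Minimal bytes-per-second limiter; a stand-in for e.g. Guava's RateLimiter.
    static class SimpleRateLimiter
    {
        private final double bytesPerSecond;
        private long nextFreeNanos = System.nanoTime();

        SimpleRateLimiter(double bytesPerSecond) { this.bytesPerSecond = bytesPerSecond; }

        synchronized void acquire(long bytes) throws InterruptedException
        {
            long now = System.nanoTime();
            long waitNanos = Math.max(0, nextFreeNanos - now);
            nextFreeNanos = Math.max(now, nextFreeNanos) + (long) (bytes / bytesPerSecond * 1e9);
            if (waitNanos > 0)
                Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        // 10 MiB object in 4 MiB chunks -> three ranges, last one shorter.
        List<long[]> r = ranges(10L << 20, 4L << 20);
        if (r.size() != 3 || r.get(2)[1] != (10L << 20) - 1)
            throw new AssertionError("unexpected ranges");
        long total = download(10L << 20, 4L << 20, new SimpleRateLimiter(1e12));
        if (total != 10L << 20)
            throw new AssertionError("unexpected byte count");
        System.out.println("ranges=" + r.size() + " downloaded=" + total);
    }
}
```

Because the wait happens between requests rather than inside the response stream, no connection stays open while the limiter blocks.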
[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Reviewers: Doug Rohrer, Saranya Krishnakumar (was: Saranya Krishnakumar) > Improve S3 download throttling with range-GetObject > --- > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The current S3 download throttling in Sidecar is implemented by blocking the streaming consumption. The blocking happens in Netty event loop threads, prolonging each connection and leading to connection resets or suboptimal concurrency. > This patch changes the throttling mechanism to be range-GetObject based: each request retrieves a data range of the object once permitted by the rate limiter.
[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Status: Ready to Commit (was: Review In Progress) > Improve S3 download throttling with range-GetObject > --- > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The current S3 download throttling in Sidecar is implemented by blocking the streaming consumption. The blocking happens in Netty event loop threads, prolonging each connection and leading to connection resets or suboptimal concurrency. > This patch changes the throttling mechanism to be range-GetObject based: each request retrieves a data range of the object once permitted by the rate limiter.
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Test and Documentation Plan: ci; unit Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-analytics/pull/75 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19842%2Ftrunk > [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > A consistency level check is performed before proceeding to bulk writing data. The check yields the wrong result: when the majority of a replica set is unavailable, it still passes, leading to writing data to replicas that cannot satisfy the desired consistency level. > The following test proves the bug. It sets all 3 instances in the replica set as blocked (unavailable), so the validation is expected to throw, but it does not.
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map<String, String> replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology =
>     CassandraClusterInfo.getTokenRangeReplicas(() -> mockSimpleTokenRangeReplicasResponse(10, 3),
>                                                () -> Partitioner.Murmur3Partitioner,
>                                                () -> new ReplicationFactor(replicationOptions),
>                                                ringInstance -> {
>                                                    int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>                                                    return nodeId <= 2; // block nodes 0, 1, 2
>                                                });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>     .isExactlyInstanceOf(RuntimeException.class)
>     .hasMessageContaining("Failed to load");
> }
> {code}
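The invariant the test above exercises can be sketched as follows. `quorum` and `quorumSatisfied` are illustrative helpers, not the cassandra-analytics API: for QUORUM with RF=3, at least 2 replicas of each token range must be available, so blocking all 3 must fail validation.

```java
// Sketch of the quorum-availability invariant behind the consistency level check.
public class QuorumCheckSketch
{
    // Quorum for a replica set of size RF: floor(RF / 2) + 1.
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // True when enough replicas of a range remain available to satisfy QUORUM.
    static boolean quorumSatisfied(int replicationFactor, int blockedReplicas)
    {
        int available = replicationFactor - blockedReplicas;
        return available >= quorum(replicationFactor);
    }

    public static void main(String[] args)
    {
        if (quorum(3) != 2) throw new AssertionError();
        // The scenario from the test: all 3 replicas blocked -> validation must fail.
        if (quorumSatisfied(3, 3)) throw new AssertionError("check must fail when all replicas are blocked");
        // One blocked replica still leaves 2 of 3 available, which satisfies QUORUM.
        if (!quorumSatisfied(3, 1)) throw new AssertionError("one blocked replica still satisfies QUORUM");
        System.out.println("quorum check ok");
    }
}
```

The reported bug is that the real check effectively returned the equivalent of `true` even in the all-blocked case.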
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Fix Version/s: NA Since Version: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/555e8494d3ca27a7b35aebabb1f669eede20cc53 Resolution: Fixed Status: Resolved (was: Ready to Commit) > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Fix For: NA > > Time Spent: 0.5h > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
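A minimal sketch of the null-guard pattern that addresses this class of NPE is below. The names are hypothetical (the actual fix is in the linked commit): the idea is to intercept the null at the dispatch point, before it reaches a typed setter that would dereference it.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Sketch of the null-guard pattern for writing UDT field values: a per-type
// serializer (like Boolean.setInnerValue in the stack trace) dereferences the
// value, so a null must be passed through as an unset field instead of being
// handed to it.
public class UdtNullGuardSketch
{
    // Serialize one UDT field value; null stays null rather than reaching a
    // serializer that would throw a NullPointerException.
    static ByteBuffer serializeFieldOrNull(Object value, Function<Object, ByteBuffer> typedSerializer)
    {
        if (value == null)
            return null;            // unset UDT field, no NPE
        return typedSerializer.apply(value);
    }

    // Stand-in for a typed serializer that blows up on null input.
    static ByteBuffer booleanSerializer(Object value)
    {
        boolean b = (Boolean) value;  // would NPE here if a null ever reached it
        return ByteBuffer.wrap(new byte[]{ (byte) (b ? 1 : 0) });
    }

    public static void main(String[] args)
    {
        if (serializeFieldOrNull(null, UdtNullGuardSketch::booleanSerializer) != null)
            throw new AssertionError("null field must serialize to null, not throw");
        ByteBuffer buf = serializeFieldOrNull(true, UdtNullGuardSketch::booleanSerializer);
        if (buf == null || buf.get(0) != 1)
            throw new AssertionError("non-null field must be serialized");
        System.out.println("udt null guard ok");
    }
}
```

Because the guard sits in the shared dispatch path, it covers every field type at once, which matches the report that any type, not just boolean, could trigger the NPE.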
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Reviewers: Dinesh Joshi, Doug Rohrer (was: Doug Rohrer, Yifan Cai) > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Status: Ready to Commit (was: Review In Progress) > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Reviewers: Doug Rohrer, Yifan Cai Status: Review In Progress (was: Patch Available) > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 20m > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
[ https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19842: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Consistency(12989) Complexity: Normal Discovered By: Code Inspection Severity: Critical Status: Open (was: Triage Needed) > [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write > > > Key: CASSANDRA-19842 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > > A consistency level check is performed before proceeding to bulk writing data. The check yields the wrong result: when the majority of a replica set is unavailable, it still passes, leading to writing data to replicas that cannot satisfy the desired consistency level. > The following test proves the bug. It sets all 3 instances in the replica set as blocked (unavailable), so the validation is expected to throw, but it does not.
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map<String, String> replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology =
>     CassandraClusterInfo.getTokenRangeReplicas(() -> mockSimpleTokenRangeReplicasResponse(10, 3),
>                                                () -> Partitioner.Murmur3Partitioner,
>                                                () -> new ReplicationFactor(replicationOptions),
>                                                ringInstance -> {
>                                                    int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>                                                    return nodeId <= 2; // block nodes 0, 1, 2
>                                                });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>     .isExactlyInstanceOf(RuntimeException.class)
>     .hasMessageContaining("Failed to load");
> }
> {code}
[jira] [Created] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write
Yifan Cai created CASSANDRA-19842: - Summary: [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write Key: CASSANDRA-19842 URL: https://issues.apache.org/jira/browse/CASSANDRA-19842 Project: Cassandra Issue Type: Bug Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai A consistency level check is performed before proceeding to bulk writing data. The check yields the wrong result: when the majority of a replica set is unavailable, it still passes, leading to writing data to replicas that cannot satisfy the desired consistency level. The following test proves the bug. It sets all 3 instances in the replica set as blocked (unavailable), so the validation is expected to throw, but it does not.
{code:java}
@Test
void test()
{
    BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
    ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
    when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
    CassandraContext mockCassandraContext = mock(CassandraContext.class);
    when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
    Map<String, String> replicationOptions = new HashMap<>();
    replicationOptions.put("class", "SimpleStrategy");
    replicationOptions.put("replication_factor", "3");
    TokenRangeMapping topology =
    CassandraClusterInfo.getTokenRangeReplicas(() -> mockSimpleTokenRangeReplicasResponse(10, 3),
                                               () -> Partitioner.Murmur3Partitioner,
                                               () -> new ReplicationFactor(replicationOptions),
                                               ringInstance -> {
                                                   int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
                                                   return nodeId <= 2; // block nodes 0, 1, 2
                                               });
    when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
    JobInfo mockJobInfo = mock(JobInfo.class);
    UUID jobId = UUID.randomUUID();
    when(mockJobInfo.getId()).thenReturn(jobId.toString());
    when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
    when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
    when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
    when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
    when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
    when(mockWriterContext.job()).thenReturn(mockJobInfo);
    BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
    assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
    .isExactlyInstanceOf(RuntimeException.class)
    .hasMessageContaining("Failed to load");
}
{code}
[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Summary: Improve S3 download throttling with range-GetObject (was: [Sidecar] Improve S3 download throttling with range-GetObject) > Improve S3 download throttling with range-GetObject > --- > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Labels: pull-request-available > > The current S3 download throttling in Sidecar is implemented by blocking the streaming consumption. The blocking happens in Netty event loop threads, prolonging each connection and leading to connection resets or suboptimal concurrency. > This patch changes the throttling mechanism to be range-GetObject based: each request retrieves a data range of the object once permitted by the rate limiter.
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Bug Category: Parent values: Availability(12983)Level 1 values: Process Crash(12992) Complexity: Normal Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) PR: https://github.com/apache/cassandra-analytics/pull/74 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19836%2Ftrunk > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
[ https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19836: -- Test and Documentation Plan: ci; integration test Status: Patch Available (was: Open) > [Analytics] Fix NPE when writing UDT values > --- > > Key: CASSANDRA-19836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library > Reporter: Yifan Cai > Assignee: Yifan Cai > Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>     at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>     at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
[jira] [Created] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values
Yifan Cai created CASSANDRA-19836: - Summary: [Analytics] Fix NPE when writing UDT values Key: CASSANDRA-19836 URL: https://issues.apache.org/jira/browse/CASSANDRA-19836 Project: Cassandra Issue Type: Bug Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai When UDT field values are set to null, the bulk writer throws an NPE, e.g. the stack trace below. Although this one is on the boolean type, the NPE can be thrown for all other types whenever the value is null.
{code:java}
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
    at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
    at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
    at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
    at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
    at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
    at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
    at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
{code}
[jira] [Updated] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Authors: Yifan Cai Test and Documentation Plan: ci; unit Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-sidecar/pull/132 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-sidecar?branch=CASSANDRASC-142%2Ftrunk-storage-client > [Sidecar] Improve S3 download throttling with range-GetObject > - > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Labels: pull-request-available > > The current s3 download throttling in sidecar is implemented by blocking the > streaming consumption. The block happens in Netty event loop threads. The > blocking prolongs each connection, leading to connection reset or suboptimal > concurrency. > This patch changes the throttling mechanism to be range-GetObject based. Each > request retrieves a data range of the object once permitted by rate limiter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject
[ https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-142: -- Change Category: Performance Complexity: Normal Component/s: Rest API Status: Open (was: Triage Needed) > [Sidecar] Improve S3 download throttling with range-GetObject > - > > Key: CASSANDRASC-142 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Rest API >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > The current s3 download throttling in sidecar is implemented by blocking the > streaming consumption. The block happens in Netty event loop threads. The > blocking prolongs each connection, leading to connection reset or suboptimal > concurrency. > This patch changes the throttling mechanism to be range-GetObject based. Each > request retrieves a data range of the object once permitted by rate limiter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject
Yifan Cai created CASSANDRASC-142: - Summary: [Sidecar] Improve S3 download throttling with range-GetObject Key: CASSANDRASC-142 URL: https://issues.apache.org/jira/browse/CASSANDRASC-142 Project: Sidecar for Apache Cassandra Issue Type: Improvement Reporter: Yifan Cai Assignee: Yifan Cai The current s3 download throttling in sidecar is implemented by blocking the streaming consumption. The block happens in Netty event loop threads. The blocking prolongs each connection, leading to connection reset or suboptimal concurrency. This patch changes the throttling mechanism to be range-GetObject based. Each request retrieves a data range of the object once permitted by rate limiter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
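The mechanism described above can be sketched without the AWS SDK: rather than blocking the consumer of one long-lived stream on an event-loop thread, fetch the object in fixed-size ranges and acquire a rate-limiter permit before each range request. Everything below is illustrative, not the Sidecar implementation; a byte array stands in for the S3 object and a simple token bucket stands in for the real limiter.

```java
// Illustrative range-based download with per-range throttling.
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class RangeGetSketch
{
    // Hypothetical blocking token bucket refilled continuously at bytesPerSecond.
    static final class RateLimiter
    {
        private final long bytesPerSecond;
        private long available;
        private long lastRefillMillis = System.currentTimeMillis();

        RateLimiter(long bytesPerSecond) { this.bytesPerSecond = bytesPerSecond; this.available = bytesPerSecond; }

        synchronized void acquire(long bytes) throws InterruptedException
        {
            while (true)
            {
                long now = System.currentTimeMillis();
                available = Math.min(bytesPerSecond, available + (now - lastRefillMillis) * bytesPerSecond / 1000);
                lastRefillMillis = now;
                if (available >= bytes) { available -= bytes; return; }
                Thread.sleep(10); // wait off the hot path, not in an event loop
            }
        }
    }

    // Stand-in for a ranged GetObject call: bytes [start, end] inclusive.
    static byte[] getObjectRange(byte[] object, int start, int end)
    {
        return Arrays.copyOfRange(object, start, Math.min(end + 1, object.length));
    }

    public static byte[] download(byte[] object, int rangeSize, RateLimiter limiter) throws InterruptedException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int start = 0; start < object.length; start += rangeSize)
        {
            limiter.acquire(Math.min(rangeSize, object.length - start)); // permit per range
            byte[] part = getObjectRange(object, start, start + rangeSize - 1);
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws InterruptedException
    {
        byte[] object = new byte[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(download(object, 4, new RateLimiter(1_000_000)).length);
    }
}
```

The real S3 API expresses the same idea through the `Range` request header on GetObject.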
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Fix Version/s: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/d75a6bae5abbf80810012a181644f240141014d5 Resolution: Fixed Status: Resolved (was: Ready to Commit) > [Analytics] Add job_timeout_seconds writer option > - > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Option to specify the timeout in seconds for bulk write jobs. By default, it > is disabled. > When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is: > - successful when the desired consistency level is met > - a failure otherwise -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Status: Ready to Commit (was: Review In Progress) > [Analytics] Add job_timeout_seconds writer option > - > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1.5h > Remaining Estimate: 0h > > Option to specify the timeout in seconds for bulk write jobs. By default, it > is disabled. > When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is: > - successful when the desired consistency level is met > - a failure otherwise -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Reviewers: Dinesh Joshi, Doug Rohrer Status: Review In Progress (was: Patch Available) > [Analytics] Add job_timeout_seconds writer option > - > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1.5h > Remaining Estimate: 0h > > Option to specify the timeout in seconds for bulk write jobs. By default, it > is disabled. > When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is: > - successful when the desired consistency level is met > - a failure otherwise -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Description: Option to specify the timeout in seconds for bulk write jobs. By default, it is disabled. When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is: - successful when the desired consistency level is met - a failure otherwise was: Option to specify the ideal timeout in seconds for bulk write jobs. It is only effective when the bulk write job is using S3_COMPACT data transport mode. When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the bulk write job needs to achieve the specified consistency level, it is ignored and job only exit after the desired consistency level has been satisfied. For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, it ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and only complete after 1 hour. If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective wait time is the minimum of the remaining time to ideal timeout and the estimated wait time to finish all slice import (as estimated in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). The ideal timeout is ignored in order to complete the bulk write job in some circumstances, hence named "ideal". > [Analytics] Add job_timeout_seconds writer option > - > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1.5h > Remaining Estimate: 0h > > Option to specify the timeout in seconds for bulk write jobs. By default, it > is disabled. 
> When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is: > - successful when the desired consistency level is met > - a failure otherwise
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Summary: [Analytics] Add job_timeout_seconds writer option (was: [Analytics] Add job_ideal_timeout_seconds writer option) > [Analytics] Add job_timeout_seconds writer option > - > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Option to specify the ideal timeout in seconds for bulk write jobs. > It is only effective when the bulk write job is using S3_COMPACT data > transport mode. > When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the > bulk write job > needs to achieve the specified consistency level, it is ignored and job only > exit after the desired consistency level has been satisfied. > For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, > it ignores > any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and > only complete after 1 hour. > If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after > achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective > wait time is the minimum of the remaining time to ideal timeout and the > estimated wait time to finish all slice import (as estimated > in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). > The ideal timeout is ignored in order to complete the bulk write job in some > circumstances, hence named "ideal". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873297#comment-17873297 ] Yifan Cai commented on CASSANDRA-19827: --- I talked with [~drohrer] offline (, as he is reviewing the patch). We decided to generalize job_ideal_timeout_seconds to just job_timeout_seconds for improved clarity. It is the timeout that applies to bulk write in general, instead of only for S3_COMPACT as job_ideal_timeout_seconds. I will update the jira title accordingly. > [Analytics] Add job_ideal_timeout_seconds writer option > --- > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Option to specify the ideal timeout in seconds for bulk write jobs. > It is only effective when the bulk write job is using S3_COMPACT data > transport mode. > When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the > bulk write job > needs to achieve the specified consistency level, it is ignored and job only > exit after the desired consistency level has been satisfied. > For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, > it ignores > any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and > only complete after 1 hour. > If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after > achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective > wait time is the minimum of the remaining time to ideal timeout and the > estimated wait time to finish all slice import (as estimated > in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). > The ideal timeout is ignored in order to complete the bulk write job in some > circumstances, hence named "ideal". 
[jira] [Commented] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873029#comment-17873029 ] Yifan Cai commented on CASSANDRA-19827: --- CI is green > [Analytics] Add job_ideal_timeout_seconds writer option > --- > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Option to specify the ideal timeout in seconds for bulk write jobs. > It is only effective when the bulk write job is using S3_COMPACT data > transport mode. > When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the > bulk write job > needs to achieve the specified consistency level, it is ignored and job only > exit after the desired consistency level has been satisfied. > For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, > it ignores > any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and > only complete after 1 hour. > If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after > achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective > wait time is the minimum of the remaining time to ideal timeout and the > estimated wait time to finish all slice import (as estimated > in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). > The ideal timeout is ignored in order to complete the bulk write job in some > circumstances, hence named "ideal". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Test and Documentation Plan: ci; unit Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-analytics/pull/73 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19827%2Ftrunk > [Analytics] Add job_ideal_timeout_seconds writer option > --- > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Option to specify the ideal timeout in seconds for bulk write jobs. > It is only effective when the bulk write job is using S3_COMPACT data > transport mode. > When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the > bulk write job > needs to achieve the specified consistency level, it is ignored and job only > exit after the desired consistency level has been satisfied. > For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, > it ignores > any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and > only complete after 1 hour. > If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after > achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective > wait time is the minimum of the remaining time to ideal timeout and the > estimated wait time to finish all slice import (as estimated > in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). > The ideal timeout is ignored in order to complete the bulk write job in some > circumstances, hence named "ideal". 
[jira] [Created] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option
Yifan Cai created CASSANDRA-19827: - Summary: [Analytics] Add job_ideal_timeout_seconds writer option Key: CASSANDRA-19827 URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 Project: Cassandra Issue Type: Improvement Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai Option to specify the ideal timeout in seconds for bulk write jobs. It is only effective when the bulk write job is using S3_COMPACT data transport mode. When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the bulk write job needs to achieve the specified consistency level, it is ignored and job only exit after the desired consistency level has been satisfied. For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, it ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and only complete after 1 hour. If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective wait time is the minimum of the remaining time to ideal timeout and the estimated wait time to finish all slice import (as estimated in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). The ideal timeout is ignored in order to complete the bulk write job in some circumstances, hence named "ideal". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option
[ https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19827: -- Change Category: Operability Complexity: Normal Status: Open (was: Triage Needed) > [Analytics] Add job_ideal_timeout_seconds writer option > --- > > Key: CASSANDRA-19827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19827 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > Option to specify the ideal timeout in seconds for bulk write jobs. > It is only effective when the bulk write job is using S3_COMPACT data > transport mode. > When JOB_IDEAL_TIMEOUT_SECONDS is specified and less than the actual time the > bulk write job > needs to achieve the specified consistency level, it is ignored and job only > exit after the desired consistency level has been satisfied. > For example, a bulk write job indeed requires 1 hour to achieve LOCAL_QUORUM, > it ignores > any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour), and > only complete after 1 hour. > If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job after > achieve LOCAL_QUORUM waits for at most 0.5 hours in addition. The effective > wait time is the minimum of the remaining time to ideal timeout and the > estimated wait time to finish all slice import (as estimated > in org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator). > The ideal timeout is ignored in order to complete the bulk write job in some > circumstances, hence named "ideal". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
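The final semantics settled on in this thread — a timed-out job still succeeds if the desired consistency level was reached — can be captured as a small decision function. This is a sketch of the described behavior only; the names are not from the Analytics codebase.

```java
// Sketch of job_timeout_seconds outcome semantics as described in the issue.
public class JobTimeoutSketch
{
    enum Outcome { SUCCEEDED, FAILED }

    // jobTimeoutSeconds == null models the option being absent (timeout disabled).
    static Outcome evaluate(long elapsedSeconds, Long jobTimeoutSeconds, boolean consistencyLevelMet)
    {
        boolean timedOut = jobTimeoutSeconds != null && elapsedSeconds > jobTimeoutSeconds;
        if (timedOut && !consistencyLevelMet)
            return Outcome.FAILED;      // exceeded the timeout without reaching the CL
        return consistencyLevelMet ? Outcome.SUCCEEDED : Outcome.FAILED;
    }

    public static void main(String[] args)
    {
        System.out.println(evaluate(4000, 3600L, true));  // SUCCEEDED despite timeout
        System.out.println(evaluate(4000, 3600L, false)); // FAILED
    }
}
```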
[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer
[ https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19821: -- Fix Version/s: NA Since Version: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/dbbd211cd420eb185d0579f16f5d46abc7bafeb4 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Prevent double closing SSTable writer > - > > Key: CASSANDRA-19821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19821 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 1h > Remaining Estimate: 0h > > Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to > produce SSTables. Its implementation allows to be closed multiple times. > However, the subsequent calls to "close" cause exception. For example, > {code:java} > java.lang.RuntimeException: Last written key > DecoratedKey(-3078932293011064831, 22fd) >= current key > DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208) > at > org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48) > at > org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337) > {code} > Cassandra analytics should prevent double closing the underlying writer. 
[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer
[ https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19821: -- Status: Ready to Commit (was: Review In Progress) > Prevent double closing SSTable writer > - > > Key: CASSANDRA-19821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19821 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 1h > Remaining Estimate: 0h > > Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to > produce SSTables. Its implementation allows to be closed multiple times. > However, the subsequent calls to "close" cause exception. For example, > {code:java} > java.lang.RuntimeException: Last written key > DecoratedKey(-3078932293011064831, 22fd) >= current key > DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208) > at > org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48) > at > org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337) > {code} > Cassandra analytics should prevent double closing the underlying writer. 
[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer
[ https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19821: -- Test and Documentation Plan: CI; unit, integration Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra-analytics/pull/72 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19821%2Ftrunk > Prevent double closing SSTable writer > - > > Key: CASSANDRA-19821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19821 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to > produce SSTables. Its implementation allows to be closed multiple times. > However, the subsequent calls to "close" cause exception. For example, > {code:java} > java.lang.RuntimeException: Last written key > DecoratedKey(-3078932293011064831, 22fd) >= current key > DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208) > at > org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48) > at > org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337) > {code} > Cassandra analytics should prevent double closing the 
underlying writer.
[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer
[ https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19821: -- Bug Category: Parent values: Availability(12983)Level 1 values: Process Crash(12992) Complexity: Normal Discovered By: Adhoc Test Severity: Normal Status: Open (was: Triage Needed) > Prevent double closing SSTable writer > - > > Key: CASSANDRA-19821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19821 > Project: Cassandra > Issue Type: Bug > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > > Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to > produce SSTables. Its implementation allows to be closed multiple times. > However, the subsequent calls to "close" cause exception. For example, > {code:java} > java.lang.RuntimeException: Last written key > DecoratedKey(-3078932293011064831, 22fd) >= current key > DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208) > at > org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48) > at > org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125) > at > org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93) > at > org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337) > {code} > Cassandra analytics should prevent double closing the underlying writer. 
[jira] [Created] (CASSANDRA-19821) Prevent double closing SSTable writer
Yifan Cai created CASSANDRA-19821: - Summary: Prevent double closing SSTable writer Key: CASSANDRA-19821 URL: https://issues.apache.org/jira/browse/CASSANDRA-19821 Project: Cassandra Issue Type: Bug Components: Analytics Library Reporter: Yifan Cai Assignee: Yifan Cai Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to produce SSTables. Its implementation allows to be closed multiple times. However, the subsequent calls to "close" cause exception. For example, {code:java} java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169) at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208) at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48) at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57) at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152) at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125) at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93) at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337) {code} Cassandra analytics should prevent double closing the underlying writer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
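Preventing the double close amounts to making close() idempotent on the caller's side. A minimal sketch of that wrapper pattern follows; the interface and class names are hypothetical, not the actual patch.

```java
// Sketch: an idempotent close wrapper. The first close() delegates to the
// underlying writer; later calls are no-ops, so writeLastPartitionUpdate is
// never re-entered with an already-written key.
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseGuardSketch
{
    interface Writer { void close(); }

    static final class IdempotentWriter implements Writer
    {
        private final Writer delegate;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        IdempotentWriter(Writer delegate) { this.delegate = delegate; }

        @Override
        public void close()
        {
            if (closed.compareAndSet(false, true)) // only the first caller wins
                delegate.close();
        }
    }

    static int underlyingCloses = 0;

    public static void main(String[] args)
    {
        IdempotentWriter writer = new IdempotentWriter(() -> underlyingCloses++);
        writer.close();
        writer.close(); // previously re-wrote the last partition and threw
        System.out.println(underlyingCloses); // 1
    }
}
```

The AtomicBoolean also makes the guard safe if close() can race across threads.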
[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space
[ https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19806: -- Fix Version/s: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/e168011c40de2ca48d138514640838067e61feea Resolution: Fixed Status: Resolved (was: Ready to Commit) Merged into trunk as [e168011c|https://github.com/apache/cassandra-analytics/commit/e168011c40de2ca48d138514640838067e61feea] > [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming > local disk space > - > > Key: CASSANDRA-19806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19806 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Fix For: NA > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, each bulk write executor only sends sstables after exhausting the > input data (of the task). All produced sstables are staged locally, when > executor local disk space is limited or the input data size is too large, > there is a risk of running out of disk space. > The patch changes the streaming strategy to stream eagerly and remove the > local files sooner. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
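The strategy change described above — stream each sstable as soon as it is sealed and delete the local copy, instead of staging everything until the task's input is exhausted — can be sketched as below. The types and the peak-staging metric are illustrative only.

```java
// Sketch of eager streaming to bound local staging space.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class EagerStreamSketch
{
    interface Uploader { void upload(Path sstable) throws IOException; }

    static long peakStagedBytes = 0;

    public static void streamEagerly(List<Path> producedInOrder, Uploader uploader) throws IOException
    {
        long staged = 0;
        for (Path sstable : producedInOrder)
        {
            long size = Files.size(sstable);
            staged += size;
            peakStagedBytes = Math.max(peakStagedBytes, staged);
            uploader.upload(sstable);  // stream immediately after the writer seals it
            Files.delete(sstable);     // reclaim local disk space right away
            staged -= size;
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path f = Files.createTempFile("demo-sstable", "-Data.db");
        Files.write(f, new byte[64]);
        streamEagerly(List.of(f), p -> {});
        System.out.println(Files.exists(f)); // false
    }
}
```

With eager streaming, peak local usage is roughly one sstable rather than the whole task's output.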
[jira] [Updated] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException
[ https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-140: -- Reviewers: Arjun Ashok, Saranya Krishnakumar, Yifan Cai (was: Arjun Ashok, Saranya Krishnakumar, Yifan Cai, Yifan Cai)
> Updating traffic shaping options throws IllegalStateException
> Key: CASSANDRASC-140
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-140
> Project: Sidecar for Apache Cassandra
> Issue Type: Bug
> Components: Rest API
> Reporter: Francisco Guerrero
> Assignee: Francisco Guerrero
> Priority: Normal
>
> When updating the traffic shaping options in Sidecar in {{org.apache.cassandra.sidecar.server.Server#updateTrafficShapingOptions}}, we encounter a bug in vert.x. The problem occurs in {{io.vertx.core.net.impl.TCPServerBase#updateTrafficShapingOptions}}, where the {{trafficShapingHandler}} is {{null}} for {{childHandler}}s. When a {{null}} {{trafficShapingHandler}} is encountered, the following exception is thrown:
> {code:java}
> throw new IllegalStateException("Unable to update traffic shaping options because the server was not configured " +
>                                 "to use traffic shaping during startup");
> {code}
> I propose a stopgap measure to fix the issue in Sidecar while we wait for a new vert.x release that includes a fix. Without a fix, we risk leaving Sidecar in an unknown state after updating the traffic shaping options, because applying the options can succeed or fail partway before the exception is encountered. This can leave a cluster of Sidecar servers in an inconsistent state across all Sidecars. The only way to return to a well-known state is to restart the Sidecar process across the cluster with the updated traffic shaping options applied in the yaml beforehand.
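The stopgap idea, checking up front whether traffic shaping was configured at startup and refusing the update as a consistent whole rather than letting it fail partway, can be sketched roughly as follows. All names here are hypothetical, not Sidecar's or vert.x's actual API:

```java
// Illustrative guard: an update either applies fully or is rejected up front,
// so a fleet of servers never ends up with a half-applied configuration.
class TrafficShapingGuard {
    private final boolean configuredAtStartup;
    private volatile long outboundLimitBytesPerSec;

    TrafficShapingGuard(boolean configuredAtStartup, long initialLimit) {
        this.configuredAtStartup = configuredAtStartup;
        this.outboundLimitBytesPerSec = initialLimit;
    }

    /** Returns true only if the update was applied. */
    boolean tryUpdate(long newLimit) {
        if (!configuredAtStartup) {
            // Reject before touching any state, instead of throwing
            // IllegalStateException midway through applying the options.
            return false;
        }
        outboundLimitBytesPerSec = newLimit;
        return true;
    }

    long currentLimit() {
        return outboundLimitBytesPerSec;
    }
}
```

The point of the guard is ordering: validate everything that can fail before mutating anything, so callers observe either the old or the new configuration, never a mixture.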
[jira] [Commented] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException
[ https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871468#comment-17871468 ] Yifan Cai commented on CASSANDRASC-140: --- +1 on the patch.
[jira] [Updated] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException
[ https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-140: -- Reviewers: Arjun Ashok, Saranya Krishnakumar, Yifan Cai, Yifan Cai Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string
[ https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19807: -- Fix Version/s: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/3023a204c8ef16f886bd3dc219f7534b7edbaf2a Resolution: Fixed Status: Resolved (was: Ready to Commit) Merged into trunk as [3023a204|https://github.com/apache/cassandra-analytics/commit/3023a204c8ef16f886bd3dc219f7534b7edbaf2a]
> [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string
> Key: CASSANDRA-19807
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19807
> Project: Cassandra
> Issue Type: Improvement
> Components: Analytics Library
> Reporter: James Berragan
> Assignee: James Berragan
> Priority: Low
> Fix For: NA
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current test system for the bulk reader matches actual and expected rows by building a utf-8 string of the concatenated partition key(s). It would be better to match on the hex string of the serialized bytes, which avoids the current custom string builder implementation.
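The hex-matching idea can be sketched as below: encode the serialized key bytes as hex rather than decoding them as UTF-8, since distinct byte sequences can collapse to the same string once malformed bytes are replaced, while hex encodings are always distinct. The helper name is illustrative, not the Analytics test code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: render the serialized key bytes as a lowercase hex
// string without consuming the buffer (absolute gets leave position intact).
final class KeyHex {
    static String toHex(ByteBuffer buf) {
        StringBuilder sb = new StringBuilder(buf.remaining() * 2);
        for (int i = buf.position(); i < buf.limit(); i++) {
            sb.append(String.format("%02x", buf.get(i) & 0xff));
        }
        return sb.toString();
    }
}
```

For example, the lone bytes 0xC3 and 0xC2 are both malformed UTF-8 and both decode to the single replacement character, so UTF-8 matching would conflate two different keys that hex matching keeps apart.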
[jira] [Commented] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string
[ https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870667#comment-17870667 ] Yifan Cai commented on CASSANDRA-19807: --- +1
[jira] [Updated] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of utf-8 string
[ https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19807: -- Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space
[ https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19806: -- Test and Documentation Plan: CI Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space
[ https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19806: -- Change Category: Semantic Complexity: Normal Status: Open (was: Triage Needed) PR: https://github.com/apache/cassandra-analytics/pull/69 CI: https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19806%2Ftrunk
[jira] [Commented] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
[ https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868532#comment-17868532 ] Yifan Cai commented on CASSANDRA-19800: --- Attached the CI result. There were 7 failed tests; none of them looks related to the patch.
> Enhance CQLSSTableWriter to notify clients on sstable production
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
> Issue Type: Improvement
> Components: Tool/sstable
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
> Notifying when SSTables are produced lets CQLSSTableWriter clients exercise better control over processing the SSTables. For example, Cassandra Analytics can use the notification to determine when to import the sstables.
[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
[ https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19800: -- Attachment: result_details.tar.gz
[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
[ https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19800: -- Attachment: ci_summary.html
[jira] [Commented] (CASSANDRA-19793) [Analytics] Split the Cassandra type logic out from CassandraBridge so it can be utilized without the Spark dependency.
[ https://issues.apache.org/jira/browse/CASSANDRA-19793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868503#comment-17868503 ] Yifan Cai commented on CASSANDRA-19793: --- +1 on the patch
> [Analytics] Split the Cassandra type logic out from CassandraBridge so it can be utilized without the Spark dependency.
> Key: CASSANDRA-19793
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19793
> Project: Cassandra
> Issue Type: Improvement
> Components: Analytics Library
> Reporter: James Berragan
> Assignee: James Berragan
> Priority: Low
> Time Spent: 20m
> Remaining Estimate: 0h
>
> CassandraBridge is a monolithic class that bridges to Cassandra, but for other use cases it is beneficial to access the Cassandra types independently to deserialize Cassandra data. By splitting the Cassandra types into a separate object, we can use them to deserialize raw Cassandra ByteBuffers decoupled from the Spark dependency.
[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
[ https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19800: -- Test and Documentation Plan: unit test Status: Patch Available (was: Open) PR: https://github.com/apache/cassandra/pull/3439
[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
[ https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19800: -- Change Category: Semantic Complexity: Low Hanging Fruit Status: Open (was: Triage Needed)
[jira] [Created] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production
Yifan Cai created CASSANDRA-19800: - Summary: Enhance CQLSSTableWriter to notify clients on sstable production Key: CASSANDRA-19800 URL: https://issues.apache.org/jira/browse/CASSANDRA-19800 Project: Cassandra Issue Type: Improvement Components: Tool/sstable Reporter: Yifan Cai Assignee: Yifan Cai
Notifying when SSTables are produced lets CQLSSTableWriter clients exercise better control over processing the SSTables. For example, Cassandra Analytics can use the notification to determine when to import the sstables.
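The notification idea amounts to a producer-side callback: the client registers a listener, and the writer invokes it each time an sstable is finalized, so the client can import or stream that file immediately instead of waiting for the whole write to finish. A minimal sketch, with hypothetical names rather than the actual CQLSSTableWriter API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative producer (not the real CQLSSTableWriter): the client supplies a
// callback, and the writer fires it with the file name each time an sstable
// is finalized, enabling eager downstream processing such as streaming.
class NotifyingWriter {
    private final Consumer<String> onSSTableProduced;

    NotifyingWriter(Consumer<String> onSSTableProduced) {
        this.onSSTableProduced = onSSTableProduced;
    }

    void finishSSTable(String fileName) {
        // In a real writer this would run after the sstable is flushed and
        // sealed; here we only model the notification itself.
        onSSTableProduced.accept(fileName);
    }
}
```

This is also the hook Cassandra Analytics would need for the eager-streaming strategy in CASSANDRA-19806: the callback streams each sstable as it is produced and then deletes the local copy.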
[jira] [Updated] (CASSANDRA-19793) [Analytics] Split the Cassandra type logic out from CassandraBridge so it can be utilized without the Spark dependency.
[ https://issues.apache.org/jira/browse/CASSANDRA-19793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-19793: -- Reviewers: Yifan Cai Status: Review In Progress (was: Patch Available)