[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/6556d251bdddfbef3935da760bcda2b2387a4391
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
Reviewers: Doug Rohrer, Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.






[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.






[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
Reviewers: Doug Rohrer  (was: Doug Rohrer, Yifan Cai)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.






[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
Test and Documentation Plan: ci; unit test
 Status: Patch Available  (was: Open)

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.






[jira] [Updated] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19981:
--
 Bug Category: Parent values: Availability(12983), Level 1 values: Process Crash(12992)
   Complexity: Low Hanging Fruit
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

PR: https://github.com/apache/cassandra-analytics/pull/89
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19981%2Ftrunk

> [Analytics] Fix invalid prefix char produced by BundleNameGenerator
> ---
>
> Key: CASSANDRA-19981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> BundleNameGenerator can produce a prefix character outside the range 
> [0-9a-zA-Z]. This patch fixes the bug.






[jira] [Created] (CASSANDRA-19981) [Analytics] Fix invalid prefix char produced by BundleNameGenerator

2024-10-04 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19981:
-

 Summary: [Analytics] Fix invalid prefix char produced by 
BundleNameGenerator
 Key: CASSANDRA-19981
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19981
 Project: Cassandra
  Issue Type: Bug
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


BundleNameGenerator can produce a prefix character outside the range 
[0-9a-zA-Z]. This patch fixes the bug.
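
For illustration, here is a minimal Java sketch of one way to keep a prefix
character inside that range. It is not the code from the patch: the class
PrefixCharSketch, its method names, and the idea of deriving the character
from an integer seed are all assumptions made for this example. One plausible
way to leave the range is arithmetic such as '0' + n, which yields ':' for
n = 10; indexing into an explicit alphabet avoids that.

```java
final class PrefixCharSketch {
    // The 62 characters the ticket names as valid: 0-9, a-z, A-Z.
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Maps any int, including negative values, onto the alphabet.
    // Math.floorMod returns a non-negative result for a positive modulus,
    // whereas the % operator can return a negative index.
    static char prefixChar(int seed) {
        return ALPHABET.charAt(Math.floorMod(seed, ALPHABET.length()));
    }

    // Checks that a character is in 0-9, a-z, or A-Z.
    static boolean isValidPrefix(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z');
    }
}
```

A unit test along these lines could loop over a range of seeds, including
negative ones, and check isValidPrefix(prefixChar(seed)) for each.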






[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters

2024-09-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19933:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/4624a17098e055e0abf9a6025451d4352cb9c147
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Support aggregated consistency validation for multiple clusters
> ---
>
> Key: CASSANDRA-19933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch adds the aggregated consistency validation of multiple clusters 
> for coordinated write.






[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters

2024-09-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19933:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Support aggregated consistency validation for multiple clusters
> ---
>
> Key: CASSANDRA-19933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This patch adds the aggregated consistency validation of multiple clusters 
> for coordinated write.






[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19933:
--
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/86
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19933%2Fmultiple-clusters-consistency-validation

> [Analytics] Support aggregated consistency validation for multiple clusters
> ---
>
> Key: CASSANDRA-19933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This patch adds the aggregated consistency validation of multiple clusters 
> for coordinated write.






[jira] [Updated] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19933:
--
Change Category: Semantic
 Complexity: Normal
Component/s: Analytics Library
 Status: Open  (was: Triage Needed)

> [Analytics] Support aggregated consistency validation for multiple clusters
> ---
>
> Key: CASSANDRA-19933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> This patch adds the aggregated consistency validation of multiple clusters 
> for coordinated write.






[jira] [Created] (CASSANDRA-19933) [Analytics] Support aggregated consistency validation for multiple clusters

2024-09-18 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19933:
-

 Summary: [Analytics] Support aggregated consistency validation for 
multiple clusters
 Key: CASSANDRA-19933
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19933
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yifan Cai
Assignee: Yifan Cai


This patch adds the aggregated consistency validation of multiple clusters for 
coordinated write.
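
As a rough illustration of what an aggregated check could look like, the Java
sketch below treats a coordinated write as valid only when every target
cluster individually meets its consistency requirement. The ticket does not
describe the actual rule, so the quorum-based check, the "all clusters must
pass" aggregation, and every name here (AggregatedConsistencySketch,
ClusterResult) are assumptions.

```java
import java.util.Map;

final class AggregatedConsistencySketch {
    // Hypothetical per-cluster outcome of a bulk write.
    static final class ClusterResult {
        final int replicationFactor;
        final int successfulReplicas;

        ClusterResult(int replicationFactor, int successfulReplicas) {
            this.replicationFactor = replicationFactor;
            this.successfulReplicas = successfulReplicas;
        }
    }

    // Quorum is a majority of replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Assumed aggregation rule: the write is valid only if every cluster
    // reached quorum on its own.
    static boolean validate(Map<String, ClusterResult> resultsByClusterId) {
        for (ClusterResult result : resultsByClusterId.values()) {
            if (result.successfulReplicas < quorum(result.replicationFactor)) {
                return false;
            }
        }
        return true;
    }
}
```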







[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19923:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/ff9ac41b4695c1df59f5293f69e0d3a1ce0da9f4
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Add transport extension for coordinated write
> -
>
> Key: CASSANDRA-19923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch introduces the CoordinatedTransportExtension and 
> CoordinationSignalListener to define the contract for the external write 
> coordinator (which implements the extension) to conduct the two-phase write.






[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19923:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Add transport extension for coordinated write
> -
>
> Key: CASSANDRA-19923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This patch introduces the CoordinatedTransportExtension and 
> CoordinationSignalListener to define the contract for the external write 
> coordinator (which implements the extension) to conduct the two-phase write.






[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19910:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/4fb1e7f47d640353cd57f7a3035c70099049b29c
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -
>
> Key: CASSANDRA-19910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In the coordinated write, data partitioning should consider the consolidated 
> ring topology from all write-target clusters, so that the produced (Spark) 
> partitions do not span multiple nodes, which would cause inefficiency.
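
One way to picture a consolidated ring is as the union of the token-range
boundaries of every target cluster. The Java sketch below assumes long tokens
and the usual "range ends at the owning token" convention, and it uses
hypothetical names (ConsolidatedRingSketch, consolidate, partitionFor); it is
not taken from the patch. Because every cluster's boundaries appear in the
merged list, each resulting partition falls inside a single token range of
each cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

final class ConsolidatedRingSketch {
    // Merges the range-end tokens of all clusters into one sorted,
    // de-duplicated list of partition boundaries.
    static List<Long> consolidate(List<List<Long>> perClusterTokens) {
        TreeSet<Long> merged = new TreeSet<>();
        for (List<Long> tokens : perClusterTokens) {
            merged.addAll(tokens);
        }
        return new ArrayList<>(merged);
    }

    // Returns the index of the first boundary >= token. Tokens beyond the
    // last boundary wrap around to partition 0, as on a ring.
    static int partitionFor(long token, List<Long> boundaries) {
        int idx = Collections.binarySearch(boundaries, token);
        if (idx < 0) {
            idx = -idx - 1; // insertion point when the token is not a boundary
        }
        return idx == boundaries.size() ? 0 : idx;
    }
}
```

For example, boundaries [0, 100] from one cluster and [50, 100, 150] from
another consolidate to [0, 50, 100, 150], so a token of 25 lands in the
partition ending at 50 rather than in one spanning 0 to 100.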






[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19910:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -
>
> Key: CASSANDRA-19910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In the coordinated write, data partitioning should consider the consolidated 
> ring topology from all write-target clusters, so that the produced (Spark) 
> partitions do not span multiple nodes, which would cause inefficiency.






[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-18 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19910:
--
Reviewers: Doug Rohrer
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -
>
> Key: CASSANDRA-19910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In the coordinated write, data partitioning should consider the consolidated 
> ring topology from all write-target clusters, so that the produced (Spark) 
> partitions do not span multiple nodes, which would cause inefficiency.






[jira] [Commented] (CASSANDRA-19927) [Analytics] Deprecate old compression cache and move to using cache of CompressionMetadata

2024-09-17 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882519#comment-17882519
 ] 

Yifan Cai commented on CASSANDRA-19927:
---

+1 on the patch. Thanks for addressing the comments. 

> [Analytics] Deprecate old compression cache and move to using cache of 
> CompressionMetadata
> --
>
> Key: CASSANDRA-19927
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19927
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Normal
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The compression cache currently caches a single byte array for the 
> CompressionInfo.db file. This is a problem for large files because it 
> involves allocating and garbage collecting large memory segments, and it 
> also means that every consumer of the bytes instantiates its own 
> CompressionMetadata object and allocates an individual BigLongArray to store 
> the chunk offsets. This is unnecessary because CompressionMetadata is 
> immutable and can be reused.
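
The idea of caching the parsed, immutable object instead of the raw bytes can
be sketched in Java as follows. This is a generic illustration, not the
library's cache: MetadataCacheSketch and its loader function are hypothetical,
and the value type stands in for CompressionMetadata.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Caches one parsed, immutable value per key so that all consumers share
// the same instance instead of re-parsing a cached byte array.
final class MetadataCacheSketch<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    MetadataCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per key, even when
    // several threads ask for the same key at the same time.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

Sharing is only safe because the cached value is immutable; a mutable value
would need its own synchronization or a defensive copy per consumer.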






[jira] [Updated] (CASSANDRA-19927) [Analytics] Deprecate old compression cache and move to using cache of CompressionMetadata

2024-09-17 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19927:
--
Status: Review In Progress  (was: Patch Available)

> [Analytics] Deprecate old compression cache and move to using cache of 
> CompressionMetadata
> --
>
> Key: CASSANDRA-19927
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19927
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Normal
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The compression cache currently caches a single byte array for the 
> CompressionInfo.db file. This is a problem for large files because it 
> involves allocating and garbage collecting large memory segments, and it 
> also means that every consumer of the bytes instantiates its own 
> CompressionMetadata object and allocates an individual BigLongArray to store 
> the chunk offsets. This is unnecessary because CompressionMetadata is 
> immutable and can be reused.






[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write

2024-09-13 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19923:
--
Change Category: Semantic
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> [Analytics] Add transport extension for coordinated write
> -
>
> Key: CASSANDRA-19923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch introduces the CoordinatedTransportExtension and 
> CoordinationSignalListener to define the contract for the external write 
> coordinator (which implements the extension) to conduct the two-phase write.






[jira] [Updated] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write

2024-09-13 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19923:
--
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/83
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19923%2Ftransport-extension-for-coordinated-write

> [Analytics] Add transport extension for coordinated write
> -
>
> Key: CASSANDRA-19923
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch introduces the CoordinatedTransportExtension and 
> CoordinationSignalListener to define the contract for the external write 
> coordinator (which implements the extension) to conduct the two-phase write.






[jira] [Created] (CASSANDRA-19923) [Analytics] Add transport extension for coordinated write

2024-09-13 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19923:
-

 Summary: [Analytics] Add transport extension for coordinated write
 Key: CASSANDRA-19923
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19923
 Project: Cassandra
  Issue Type: New Feature
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


This patch introduces the CoordinatedTransportExtension and 
CoordinationSignalListener to define the contract for the external write 
coordinator (which implements the extension) to conduct the two-phase write.
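
To make the two-phase idea concrete, here is a generic Java sketch in which
phase two is signalled only after every cluster has reported that phase one
is done. It does not reproduce the real CoordinatedTransportExtension or
CoordinationSignalListener interfaces: the class TwoPhaseWriteSketch, the
SignalListener callback, and the "staged, then apply" naming of the phases
are assumptions for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class TwoPhaseWriteSketch {
    // Hypothetical callback; not the real CoordinationSignalListener API.
    interface SignalListener {
        void onReadyToApply();
    }

    private final Set<String> pendingClusters;
    private final SignalListener listener;
    private boolean signalled = false;

    TwoPhaseWriteSketch(Set<String> clusterIds, SignalListener listener) {
        this.pendingClusters = new LinkedHashSet<>(clusterIds);
        this.listener = listener;
    }

    // Phase 1: a cluster reports that its data is staged.
    // Phase 2 is signalled exactly once, after the last cluster reports.
    synchronized void onStaged(String clusterId) {
        pendingClusters.remove(clusterId);
        if (pendingClusters.isEmpty() && !signalled) {
            signalled = true;
            listener.onReadyToApply();
        }
    }
}
```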






[jira] [Commented] (CASSANDRA-17666) Option to disable write path during streaming for CDC enabled tables

2024-09-13 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881670#comment-17881670
 ] 

Yifan Cai commented on CASSANDRA-17666:
---

Hi [~nikolailoginov], new features go to trunk only. 4.1 is a maintenance 
branch. A backport would require a PMC vote, according to 
[this|https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle]. 
Please feel free to start a DISCUSSION thread on the dev mailing list. 

> Option to disable write path during streaming for CDC enabled tables
> 
>
> Key: CASSANDRA-17666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17666
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Change Data Capture
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 5.0-alpha1, 5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> For CDC-enabled tables, a special write path is employed during 
> streaming: the streamed mutations are written into the commit log first. 
> There are scenarios in which the commit logs accumulate, which leads to 
> streaming failures and blocked writes. 
> I'd like to propose adding a dynamic toggle to disable the special write path 
> for CDC during streaming. 
> Please note that the toggle is a trade-off, because the special write path 
> exists to ensure data consistency. Turning it off allows the 
> streaming to pass, but in some extreme scenarios, the downstream CDC 
> consumers may have holes in the stream, depending on how they consume the 
> commit logs.
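
A dynamic toggle of this kind can be sketched in Java as a flag that is read
each time a streamed mutation is routed. The names below
(CdcStreamingToggleSketch, WritePath, choose) are hypothetical and do not
correspond to the actual Cassandra configuration or classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class CdcStreamingToggleSketch {
    enum WritePath { COMMIT_LOG_WRITE_PATH, DIRECT_STREAMING }

    // Enabled by default, which preserves the consistency-oriented behaviour.
    private final AtomicBoolean cdcWritePathEnabled = new AtomicBoolean(true);

    // Can be flipped at runtime, for example from a management interface.
    void setCdcWritePathEnabled(boolean enabled) {
        cdcWritePathEnabled.set(enabled);
    }

    // Only CDC-enabled tables take the special path, and only while the
    // toggle is on.
    WritePath choose(boolean tableHasCdc) {
        return tableHasCdc && cdcWritePathEnabled.get()
            ? WritePath.COMMIT_LOG_WRITE_PATH
            : WritePath.DIRECT_STREAMING;
    }
}
```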






[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters

2024-09-11 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19909:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/f123406e458c0112145f37dcd3f8c20ba47c949d
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated 
> write to multiple Cassandra clusters
> -
>
> Key: CASSANDRA-19909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra 
> clusters, this patch introduces the new writer option, 
> COORDINATED_WRITE_CONFIG, and the optional clusterId to identify clusters. The 
> COORDINATED_WRITE_CONFIG value is a JSON string that defines the target 
> clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not 
> including table properties) across clusters.
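
The schema requirement can be pictured as a structural comparison that
ignores table properties. The Java sketch below is only an illustration under
assumed names (SchemaMatchSketch, TableSchema); it is not how the library
performs the check, and it does not show the format of the
COORDINATED_WRITE_CONFIG JSON, which the ticket does not spell out.

```java
import java.util.List;
import java.util.Map;

final class SchemaMatchSketch {
    // Hypothetical, simplified view of a table definition.
    static final class TableSchema {
        final Map<String, String> columns;    // column name -> CQL type
        final List<String> primaryKey;        // ordered key columns
        final Map<String, String> properties; // e.g. comment; ignored below

        TableSchema(Map<String, String> columns,
                    List<String> primaryKey,
                    Map<String, String> properties) {
            this.columns = columns;
            this.primaryKey = primaryKey;
            this.properties = properties;
        }

        // Compares columns and primary key only; properties may differ.
        boolean sameStructureAs(TableSchema other) {
            return columns.equals(other.columns)
                && primaryKey.equals(other.primaryKey);
        }
    }

    // True when every cluster's table has the same structure as the first.
    static boolean allMatch(List<TableSchema> schemas) {
        for (int i = 1; i < schemas.size(); i++) {
            if (!schemas.get(0).sameStructureAs(schemas.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```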






[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters

2024-09-11 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19909:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated 
> write to multiple Cassandra clusters
> -
>
> Key: CASSANDRA-19909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra 
> clusters, this patch introduces the new writer option, 
> COORDINATED_WRITE_CONFIG, and the optional clusterId to identify clusters. The 
> COORDINATED_WRITE_CONFIG value is a JSON string that defines the target 
> clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not 
> including table properties) across clusters.






[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters

2024-09-11 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19909:
--
Reviewers: Doug Rohrer, Francisco Guerrero  (was: Francisco Guerrero)

> [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated 
> write to multiple Cassandra clusters
> -
>
> Key: CASSANDRA-19909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra 
> clusters, this patch introduces the new writer option, 
> COORDINATED_WRITE_CONFIG, and the optional clusterId to identify clusters. The 
> COORDINATED_WRITE_CONFIG value is a JSON string that defines the target 
> clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not 
> including table properties) across clusters.






[jira] [Commented] (CASSANDRA-19815) [Analytics] Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark

2024-09-10 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880827#comment-17880827
 ] 

Yifan Cai commented on CASSANDRA-19815:
---

+1 on the patch

> [Analytics] Decouple Cassandra types from Spark types so Cassandra types can 
> be used independently from Spark
> -
>
> Key: CASSANDRA-19815
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19815
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Normal
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Cassandra types and Spark types are tightly coupled in the same classes, 
> making it difficult to deserialize Cassandra types without pulling in Spark 
> as a dependency. We can split out the Spark types into a separate module by 
> introducing a new TypeConverter that maps Cassandra types to Spark types. 
> This enables use of the Cassandra types without pulling in Spark and also 
> opens the possibility of other TypeConverters in the future beyond Spark.
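
For illustration, the decoupling described above could look roughly like the sketch below. Everything in it is an assumption made for the example: CqlType, the TypeConverter signature and SparkTypeNameConverter are hypothetical names, not the classes from the patch, and Spark types are represented by their names as plain strings so that the snippet compiles without Spark on the classpath.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the Cassandra native types; not the library's actual classes.
enum CqlType { INT, BIGINT, TEXT, BOOLEAN }

// Hypothetical converter contract: maps a Cassandra type to some target representation T.
interface TypeConverter<T>
{
    T convert(CqlType type);
}

// A Spark-flavoured converter. Spark type names are plain strings here so the
// sketch has no Spark dependency.
final class SparkTypeNameConverter implements TypeConverter<String>
{
    private static final Map<CqlType, String> MAPPING = new EnumMap<>(CqlType.class);

    static
    {
        MAPPING.put(CqlType.INT, "IntegerType");
        MAPPING.put(CqlType.BIGINT, "LongType");
        MAPPING.put(CqlType.TEXT, "StringType");
        MAPPING.put(CqlType.BOOLEAN, "BooleanType");
    }

    @Override
    public String convert(CqlType type)
    {
        return MAPPING.get(type);
    }
}
```

Code that only needs CqlType never touches the converter, and a converter for another engine would be one more TypeConverter implementation.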






[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-10 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19910:
--
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/80
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19910%2Fsupport-multiple-clusters-for-data-partitioning

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -
>
> Key: CASSANDRA-19910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the coordinated write, data partitioning should consider the consolidated 
> ring topology from all write-target clusters, so that the produced (Spark) 
> partitions do not span multiple nodes, which would be inefficient.






[jira] [Created] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-10 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19910:
-

 Summary: [Analytics] Support data partitioning for multiple 
clusters coordinated write
 Key: CASSANDRA-19910
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
 Project: Cassandra
  Issue Type: New Feature
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


In the coordinated write, data partitioning should consider the consolidated 
ring topology from all write-target clusters, so that the produced (Spark) 
partitions do not span multiple nodes, which would be inefficient.
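
One possible way to picture a consolidated ring is sketched below. It is an assumption-laden illustration, not the implementation in the patch: ConsolidatedRing, splitPoints and partitionFor are hypothetical names, and each cluster is reduced to the list of end tokens of its token ranges. Merging the end tokens of all clusters yields split points such that no resulting partition crosses a range boundary in any cluster.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class ConsolidatedRing
{
    // Merges the token-range end tokens of every cluster into one sorted,
    // de-duplicated list of split points.
    static List<BigInteger> splitPoints(List<List<BigInteger>> rangeEndsPerCluster)
    {
        TreeSet<BigInteger> merged = new TreeSet<>();
        for (List<BigInteger> ends : rangeEndsPerCluster)
        {
            merged.addAll(ends);
        }
        return new ArrayList<>(merged);
    }

    // Index of the partition that owns a token; partition i covers (split[i-1], split[i]].
    // Tokens above the last split point wrap around to partition 0.
    static int partitionFor(BigInteger token, List<BigInteger> splitPoints)
    {
        for (int i = 0; i < splitPoints.size(); i++)
        {
            if (token.compareTo(splitPoints.get(i)) <= 0)
            {
                return i;
            }
        }
        return 0;
    }
}
```

With range ends {0, 100} in one cluster and {50, 100} in another, the merged split points are 0, 50 and 100, so a token of 75 lands in a partition that maps to exactly one range in each cluster.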






[jira] [Updated] (CASSANDRA-19910) [Analytics] Support data partitioning for multiple clusters coordinated write

2024-09-10 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19910:
--
Change Category: Semantic
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> [Analytics] Support data partitioning for multiple clusters coordinated write
> -
>
> Key: CASSANDRA-19910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19910
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> In the coordinated write, data partitioning should consider the consolidated 
> ring topology from all write-target clusters, so that the produced (Spark) 
> partitions do not span multiple nodes, which would be inefficient.






[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated write to multiple Cassandra clusters

2024-09-10 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19909:
--
Description: 
As the first step of implementing coordinated write to multiple Cassandra 
clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONFIG 
and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONFIG 
value is a JSON string that defines the target clusters for the bulk write.

The coordinated write feature requires the exact same table schema (not 
including table properties) across clusters.

  was:
As the first step of implementing coordinated write to multiple Cassandra 
clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF 
and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF 
value is a json string that defines the target clusters for the bulk write.

The coordinated write feature requires the exact same table schema (not 
including table properties) across clusters.


> [Analytics] Add writer option COORDINATED_WRITE_CONFIG to define coordinated 
> write to multiple Cassandra clusters
> -
>
> Key: CASSANDRA-19909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra 
> clusters, this patch introduces the new writer option, 
> COORDINATED_WRITE_CONFIG and the optional clusterId to identify clusters. The 
> COORDINATED_WRITE_CONFIG value is a JSON string that defines the target 
> clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not 
> including table properties) across clusters.






[jira] [Updated] (CASSANDRA-19815) [Analytics] Decouple Cassandra types from Spark types so Cassandra types can be used independently from Spark

2024-09-10 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19815:
--
Status: Review In Progress  (was: Patch Available)

> [Analytics] Decouple Cassandra types from Spark types so Cassandra types can 
> be used independently from Spark
> -
>
> Key: CASSANDRA-19815
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19815
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Priority: Normal
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Cassandra types and Spark types are tightly coupled in the same classes, 
> making it difficult to deserialize Cassandra types without pulling in Spark 
> as a dependency. We can split out the Spark types into a separate module by 
> introducing a new TypeConverter that maps Cassandra types to Spark types. 
> This enables use of the Cassandra types without pulling in Spark and also 
> opens the possibility of other TypeConverters in the future beyond Spark.






[jira] [Updated] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated write to multiple Cassandra clusters

2024-09-10 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19909:
--
Test and Documentation Plan: ci; unit test
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/79
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19909%2Ftrunk-writer-option-for-multiple-clusters

> [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated 
> write to multiple Cassandra clusters
> ---
>
> Key: CASSANDRA-19909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As the first step of implementing coordinated write to multiple Cassandra 
> clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF 
> and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF 
> value is a JSON string that defines the target clusters for the bulk write.
> The coordinated write feature requires the exact same table schema (not 
> including table properties) across clusters.






[jira] [Created] (CASSANDRA-19909) [Analytics] Add writer option COORDINATED_WRITE_CONF to define coordinated write to multiple Cassandra clusters

2024-09-10 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19909:
-

 Summary: [Analytics] Add writer option COORDINATED_WRITE_CONF to 
define coordinated write to multiple Cassandra clusters
 Key: CASSANDRA-19909
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19909
 Project: Cassandra
  Issue Type: New Feature
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


As the first step of implementing coordinated write to multiple Cassandra 
clusters, this patch introduces the new writer option, COORDINATED_WRITE_CONF 
and the optional clusterId to identify clusters. The COORDINATED_WRITE_CONF 
value is a JSON string that defines the target clusters for the bulk write.

The coordinated write feature requires the exact same table schema (not 
including table properties) across clusters.
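
The schema requirement can be sketched as a comparison that looks at column definitions and deliberately skips table properties. The snippet is only an illustration under stated assumptions: TableDefinition and sameSchemaAs are hypothetical, the table is reduced to a column-name-to-type map plus a property map, and a real check would presumably also cover primary key and clustering definitions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified view of a table definition; not a class from the library.
final class TableDefinition
{
    final Map<String, String> columns;     // column name -> CQL type
    final Map<String, String> properties;  // table properties such as comment or compaction

    TableDefinition(Map<String, String> columns, Map<String, String> properties)
    {
        this.columns = new LinkedHashMap<>(columns);
        this.properties = new LinkedHashMap<>(properties);
    }

    // True when the column definitions match; table properties are deliberately ignored.
    boolean sameSchemaAs(TableDefinition other)
    {
        return columns.equals(other.columns);
    }
}
```

Two clusters whose tables differ only in, say, their comment property would pass this check, while a differing column type would not.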






[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings

2024-09-06 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19901:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/8655ca54a5d0749fccb2ad6a06ec230e8b0de24e
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
> -
>
> Key: CASSANDRA-19901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19901
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Proposing the refactoring of TokenRangeMapping and the related classes to use 
> proper types instead of Strings to improve maintainability. As of now, Strings 
> are used to represent IP, IP with port, node name, etc. It is difficult to 
> distinguish the actual types.






[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings

2024-09-06 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19901:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
> -
>
> Key: CASSANDRA-19901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19901
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Proposing the refactoring of TokenRangeMapping and the related classes to use 
> proper types instead of Strings to improve maintainability. As of now, Strings 
> are used to represent IP, IP with port, node name, etc. It is difficult to 
> distinguish the actual types.






[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings

2024-09-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19901:
--
Test and Documentation Plan: ci
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/78
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19901%2Ftrunk-refactor-token-range-mapping

> [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
> -
>
> Key: CASSANDRA-19901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19901
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Proposing the refactoring of TokenRangeMapping and the related classes to use 
> proper types instead of Strings to improve maintainability. As of now, Strings 
> are used to represent IP, IP with port, node name, etc. It is difficult to 
> distinguish the actual types.






[jira] [Created] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings

2024-09-05 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19901:
-

 Summary: [Analytics] Refactor TokenRangeMapping to use proper 
types instead of Strings
 Key: CASSANDRA-19901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19901
 Project: Cassandra
  Issue Type: Task
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


Proposing the refactoring of TokenRangeMapping and the related classes to use 
proper types instead of Strings to improve maintainability. As of now, Strings 
are used to represent IP, IP with port, node name, etc. It is difficult to 
distinguish the actual types.
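
As a rough illustration of what a "proper type" could look like, here is a small value class for an IP address with a port. The class name, fields and parse helper are hypothetical and chosen for the example; they are not taken from the patch.

```java
import java.util.Objects;

// Hypothetical value type; the names are illustrative, not the ones used in the patch.
final class IpAddressAndPort
{
    final String ipAddress;
    final int port;

    IpAddressAndPort(String ipAddress, int port)
    {
        this.ipAddress = Objects.requireNonNull(ipAddress, "ipAddress");
        if (port < 1 || port > 65535)
        {
            throw new IllegalArgumentException("Invalid port: " + port);
        }
        this.port = port;
    }

    // Parses "host:port", splitting on the last colon.
    static IpAddressAndPort parse(String value)
    {
        int idx = value.lastIndexOf(':');
        if (idx < 0)
        {
            throw new IllegalArgumentException("Missing port in: " + value);
        }
        return new IpAddressAndPort(value.substring(0, idx),
                                    Integer.parseInt(value.substring(idx + 1)));
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true;
        if (!(o instanceof IpAddressAndPort)) return false;
        IpAddressAndPort that = (IpAddressAndPort) o;
        return port == that.port && ipAddress.equals(that.ipAddress);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(ipAddress, port);
    }

    @Override
    public String toString()
    {
        return ipAddress + ':' + port;
    }
}
```

A method that takes an IpAddressAndPort can no longer be handed a node name or a bare IP by mistake, which is the kind of confusion a plain String parameter allows.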






[jira] [Updated] (CASSANDRA-19901) [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings

2024-09-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19901:
--
Issue Type: Improvement  (was: Task)

> [Analytics] Refactor TokenRangeMapping to use proper types instead of Strings
> -
>
> Key: CASSANDRA-19901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19901
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> Proposing the refactoring of TokenRangeMapping and the related classes to use 
> proper types instead of Strings to improve maintainability. As of now, Strings 
> are used to represent IP, IP with port, node name, etc. It is difficult to 
> distinguish the actual types.






[jira] [Commented] (CASSANDRA-19873) [Analytics] Removes checks for blocked instances from bulk-write path

2024-09-05 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879680#comment-17879680
 ] 

Yifan Cai commented on CASSANDRA-19873:
---

+1

> [Analytics] Removes checks for blocked instances from bulk-write path
> -
>
> Key: CASSANDRA-19873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19873
> Project: Cassandra
>  Issue Type: Task
>  Components: Analytics Library
>Reporter: Arjun Ashok
>Assignee: Arjun Ashok
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The analytics bulk writer currently performs checks for blocked instances for 
> consistency-level validations prior to and after the bulk write. It also 
> takes all the blocked nodes into account for these validations instead of the 
> nodes in the specific range being written (addressed separately under 
> https://issues.apache.org/jira/browse/CASSANDRA-19842).
>  
> This change removes the notion of blocked instances for bulk-writes, treating 
> such nodes as available, as the intended usage of "blocked" nodes is to 
> operationally prevent client CQL connections going into the node, but not 
> writes.
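
The behaviour described above can be pictured with the sketch below, in which the blocked flag plays no role when selecting write targets. ClusterNode and WriteAvailability are hypothetical types made up for the example, and reducing node health to a single "up" flag is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node model for illustration only.
final class ClusterNode
{
    final String name;
    final boolean up;       // whether the node is reachable
    final boolean blocked;  // operational flag that keeps client CQL connections away

    ClusterNode(String name, boolean up, boolean blocked)
    {
        this.name = name;
        this.up = up;
        this.blocked = blocked;
    }
}

final class WriteAvailability
{
    // A blocked node still counts as a write target; only node health matters.
    static List<ClusterNode> writeTargets(List<ClusterNode> replicas)
    {
        List<ClusterNode> targets = new ArrayList<>();
        for (ClusterNode node : replicas)
        {
            if (node.up)
            {
                targets.add(node);
            }
        }
        return targets;
    }
}
```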






[jira] [Updated] (CASSANDRA-19873) [Analytics] Removes checks for blocked instances from bulk-write path

2024-09-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19873:
--
Reviewers: Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Removes checks for blocked instances from bulk-write path
> -
>
> Key: CASSANDRA-19873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19873
> Project: Cassandra
>  Issue Type: Task
>  Components: Analytics Library
>Reporter: Arjun Ashok
>Assignee: Arjun Ashok
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The analytics bulk writer currently performs checks for blocked instances for 
> consistency-level validations prior to and after the bulk write. It also 
> takes all the blocked nodes into account for these validations instead of the 
> nodes in the specific range being written (addressed separately under 
> https://issues.apache.org/jira/browse/CASSANDRA-19842).
>  
> This change removes the notion of blocked instances for bulk-writes, treating 
> such nodes as available, as the intended usage of "blocked" nodes is to 
> operationally prevent client CQL connections going into the node, but not 
> writes.






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-30 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/cfe293dadcf7a1d4491591cfd39fc410a8fa52ba
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A consistency level check is performed before proceeding to bulk write data. 
> The check yields a wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It sets all 3 instances in the 
> replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
>         () -> mockSimpleTokenRangeReplicasResponse(10, 3),
>         () -> Partitioner.Murmur3Partitioner,
>         () -> new ReplicationFactor(replicationOptions),
>         ringInstance -> {
>             int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>             return nodeId <= 2; // block nodes 0, 1, 2
>         });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(
>         mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>         .isExactlyInstanceOf(RuntimeException.class)
>         .hasMessageContaining("Failed to load");
> }
> {code}
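
The expectation behind the test above can be reduced to a small stand-alone check. The sketch below is not the BulkWriteValidator logic; QuorumCheck and its methods are hypothetical, replicas are plain strings, and only QUORUM is modelled. It shows the rule the validation is expected to enforce per replica set: the replicas that remain after removing the unavailable ones must still form a majority.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class QuorumCheck
{
    // Replicas needed for QUORUM: a strict majority of the replication factor.
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // Validates one replica set: fails when the replicas left after removing
    // the unavailable ones cannot reach quorum.
    static void validate(List<String> replicaSet, Set<String> unavailable)
    {
        int available = 0;
        for (String replica : replicaSet)
        {
            if (!unavailable.contains(replica))
            {
                available++;
            }
        }
        int required = quorum(replicaSet.size());
        if (available < required)
        {
            throw new IllegalStateException("Only " + available + " of " + replicaSet.size()
                                            + " replicas available; QUORUM requires " + required);
        }
    }
}
```

With a replication factor of 3, quorum is 2, so one unavailable replica is tolerated while three unavailable replicas must make the check throw.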






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-30 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A consistency level check is performed before proceeding to bulk write data. 
> The check yields a wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It sets all 3 instances in the 
> replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
>         () -> mockSimpleTokenRangeReplicasResponse(10, 3),
>         () -> Partitioner.Murmur3Partitioner,
>         () -> new ReplicationFactor(replicationOptions),
>         ringInstance -> {
>             int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>             return nodeId <= 2; // block nodes 0, 1, 2
>         });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(
>         mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>         .isExactlyInstanceOf(RuntimeException.class)
>         .hasMessageContaining("Failed to load");
> }
> {code}






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-30 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
Reviewers: Doug Rohrer, Francisco Guerrero  (was: Doug Rohrer, Francisco Guerrero)

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A consistency level check is performed before proceeding to bulk write data. 
> The check yields a wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It sets all 3 instances in the 
> replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
>         () -> mockSimpleTokenRangeReplicasResponse(10, 3),
>         () -> Partitioner.Murmur3Partitioner,
>         () -> new ReplicationFactor(replicationOptions),
>         ringInstance -> {
>             int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>             return nodeId <= 2; // block nodes 0, 1, 2
>         });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(
>         mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>         .isExactlyInstanceOf(RuntimeException.class)
>         .hasMessageContaining("Failed to load");
> }
> {code}






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-30 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
Reviewers: Doug Rohrer, Francisco Guerrero  (was: Francisco Guerrero)

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A consistency level check is performed before proceeding to bulk write data. 
> The check yields a wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It sets all 3 instances in the 
> replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
> BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
> ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
> when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
> CassandraContext mockCassandraContext = mock(CassandraContext.class);
> 
> when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
> Map replicationOptions = new HashMap<>();
> replicationOptions.put("class", "SimpleStrategy");
> replicationOptions.put("replication_factor", "3");
> TokenRangeMapping topology = 
> CassandraClusterInfo.getTokenRangeReplicas(() -> 
> mockSimpleTokenRangeReplicasResponse(10, 3),
>   
> () -> Partitioner.Murmur3Partitioner,
>   
> () -> new ReplicationFactor(replicationOptions),
>   
> ringInstance -> {
>   
> int nodeId = 
> Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>   
> return nodeId <= 2; // block nodes 0, 1, 2
>   
> });
> 
> when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
> JobInfo mockJobInfo = mock(JobInfo.class);
> UUID jobId = UUID.randomUUID();
> when(mockJobInfo.getId()).thenReturn(jobId.toString());
> when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
> when(mockJobInfo.qualifiedTableName()).thenReturn(new 
> QualifiedTableName("testkeyspace", "testtable"));
> 
> when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
> when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
> when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
> when(mockWriterContext.job()).thenReturn(mockJobInfo);
> BulkWriteValidator writerValidator = new 
> BulkWriteValidator(mockWriterContext, new 
> ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
> assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
> .isExactlyInstanceOf(RuntimeException.class)
> .hasMessageContaining("Failed to load");
> }
> {code}
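
The behaviour the report expects from the check can be sketched as below. This is only an illustration under stated assumptions, not the BulkWriteValidator code: the class QuorumCheckSketch and both of its methods are hypothetical names, replicas are plain strings, and a single data center is assumed, where QUORUM needs floor(RF / 2) + 1 available replicas.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from cassandra-analytics.
final class QuorumCheckSketch
{
    // Number of replicas QUORUM requires for the given replication factor.
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // Throws when fewer than a quorum of the replicas are available (not blocked).
    static void validateQuorumOrFail(Set<String> replicas, Set<String> blocked)
    {
        long available = replicas.stream().filter(r -> !blocked.contains(r)).count();
        int required = quorum(replicas.size());
        if (available < required)
        {
            // The message prefix mirrors the one asserted in the test above; the rest is made up.
            throw new RuntimeException("Failed to load: " + available + " replicas available, "
                                       + required + " required");
        }
    }

    public static void main(String[] args)
    {
        Set<String> replicas = new HashSet<>(Arrays.asList("localhost0", "localhost1", "localhost2"));
        try
        {
            // Every replica is blocked, as in the reported test, so this call throws.
            validateQuorumOrFail(replicas, replicas);
        }
        catch (RuntimeException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

With a replication factor of 3, one blocked replica still leaves the 2 replicas that QUORUM needs, so only the case with two or more blocked replicas should fail.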






[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-26 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-143:
--
  Fix Version/s: 1.0
Source Control Link: 
https://github.com/apache/cassandra-sidecar/commit/f07e248d0ce8303a06daf93b462190ef7be7304d
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Enable github squash in asf.yaml
> 
>
> Key: CASSANDRASC-143
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
> Project: Sidecar for Apache Cassandra
>  Issue Type: Task
>  Components: Configuration
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 1.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
> That is the option that has been used in the Cassandra Sidecar project. I had a 
> discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-26 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-143:
--
Reviewers: Francisco Guerrero, Stefan Miklosovic  (was: Francisco Guerrero)

> Enable github squash in asf.yaml
> 
>
> Key: CASSANDRASC-143
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
> Project: Sidecar for Apache Cassandra
>  Issue Type: Task
>  Components: Configuration
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
> That is the option that has been used in the Cassandra Sidecar project. I had a 
> discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-26 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-143:
--
Status: Ready to Commit  (was: Review In Progress)

> Enable github squash in asf.yaml
> 
>
> Key: CASSANDRASC-143
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
> Project: Sidecar for Apache Cassandra
>  Issue Type: Task
>  Components: Configuration
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
> That is the option that has been used in the Cassandra Sidecar project. I had a 
> discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-143:
--
Authors: Yifan Cai
Test and Documentation Plan: no test for GitHub configuration
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-sidecar/pull/134

> Enable github squash in asf.yaml
> 
>
> Key: CASSANDRASC-143
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
> Project: Sidecar for Apache Cassandra
>  Issue Type: Task
>  Components: Configuration
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
> That is the option that has been used in the Cassandra Sidecar project. I had a 
> discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Updated] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-143:
--
Change Category: Operability
 Complexity: Low Hanging Fruit
Component/s: Configuration
 Status: Open  (was: Triage Needed)

> Enable github squash in asf.yaml
> 
>
> Key: CASSANDRASC-143
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
> Project: Sidecar for Apache Cassandra
>  Issue Type: Task
>  Components: Configuration
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
> That is the option that has been used in the Cassandra Sidecar project. I had a 
> discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Created] (CASSANDRASC-143) Enable github squash in asf.yaml

2024-08-23 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRASC-143:
-

 Summary: Enable github squash in asf.yaml
 Key: CASSANDRASC-143
 URL: https://issues.apache.org/jira/browse/CASSANDRASC-143
 Project: Sidecar for Apache Cassandra
  Issue Type: Task
Reporter: Yifan Cai
Assignee: Yifan Cai


CASSANDRA-19854 added the asf.yaml that disabled the "Squash and Merge" option. 
That is the option that has been used in the Cassandra Sidecar project. I had a 
discussion with Mick and Stefan, and we agreed to enable the squash option. 






[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject

2024-08-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
  Fix Version/s: 1.0
Source Control Link: 
https://github.com/apache/cassandra-sidecar/commit/4601e28529996a3447e74093cc6cc35879143031
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Improve S3 download throttling with range-GetObject
> ---
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 1.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.
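
The range-based idea described above can be sketched roughly as follows. This is not the Sidecar patch: the class RangeDownloadSketch, its methods, and the callback-based limiter and fetcher are all hypothetical, and no AWS SDK calls are made. It only shows how an object can be split into byte ranges and how each range request can be issued once a limiter permits it, without blocking a thread while waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from cassandra-sidecar.
final class RangeDownloadSketch
{
    // Splits an object into inclusive byte ranges, formatted like HTTP Range header values.
    static List<String> byteRanges(long objectSize, long rangeSize)
    {
        if (rangeSize <= 0)
        {
            throw new IllegalArgumentException("rangeSize must be positive");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += rangeSize)
        {
            long end = Math.min(start + rangeSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    // Issues range requests from nextIndex for as long as the limiter grants permits.
    // Returns the index of the first range not yet requested, so that the caller can
    // reschedule the remainder later instead of blocking while waiting for a permit.
    static int issuePermitted(List<String> ranges, int nextIndex,
                              BooleanSupplier tryAcquire, Consumer<String> fetch)
    {
        int index = nextIndex;
        while (index < ranges.size() && tryAcquire.getAsBoolean())
        {
            fetch.accept(ranges.get(index));
            index++;
        }
        return index;
    }

    public static void main(String[] args)
    {
        List<String> ranges = byteRanges(10, 4);
        int[] permits = { 2 }; // toy limiter that grants two permits
        int next = issuePermitted(ranges, 0, () -> permits[0]-- > 0, System.out::println);
        System.out.println("next range index: " + next);
    }
}
```

Because each request covers a bounded range, a denied permit only delays the next request and does not hold an open connection idle.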






[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject

2024-08-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
Reviewers: Doug Rohrer, Saranya Krishnakumar  (was: Saranya Krishnakumar)

> Improve S3 download throttling with range-GetObject
> ---
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.






[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject

2024-08-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
Status: Ready to Commit  (was: Review In Progress)

> Improve S3 download throttling with range-GetObject
> ---
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-21 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/75
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19842%2Ftrunk

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A consistency level check is performed before bulk writing data. 
> The check yields the wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It marks all 3 instances in 
> the replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map<String, String> replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
>         () -> mockSimpleTokenRangeReplicasResponse(10, 3),
>         () -> Partitioner.Murmur3Partitioner,
>         () -> new ReplicationFactor(replicationOptions),
>         ringInstance -> {
>             int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>             return nodeId <= 2; // block nodes 0, 1, 2
>         });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     // All 3 replicas are blocked, so QUORUM cannot be met and validation should throw
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>         .isExactlyInstanceOf(RuntimeException.class)
>         .hasMessageContaining("Failed to load");
> }
> {code}






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-20 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/555e8494d3ca27a7b35aebabb1f669eede20cc53
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a UDT field value is null, the bulk writer throws an NPE; see the 
> stack trace below. Although this trace involves the boolean type, the NPE can 
> be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}
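
One common way such an NPE arises in Java is auto-unboxing a null wrapper value. Whether that is the cause at Boolean.java:91 is an assumption here, since the report does not show that code. The sketch below uses a hypothetical class, NullSafeUdtSketch, with hypothetical method names; it only contrasts an unguarded conversion with a null-guarded one and is not the fix that was committed.

```java
// Hypothetical sketch; none of these names come from cassandra-analytics.
final class NullSafeUdtSketch
{
    // Unguarded: unboxing a null Boolean into a primitive throws NullPointerException.
    static boolean unsafeToPrimitive(Object value)
    {
        return (Boolean) value;
    }

    // Guarded: a null field value is passed through as null instead of being unboxed.
    static Boolean toBooleanOrNull(Object value)
    {
        return value == null ? null : (Boolean) value;
    }

    public static void main(String[] args)
    {
        System.out.println(toBooleanOrNull(null));
        try
        {
            unsafeToPrimitive(null);
        }
        catch (NullPointerException e)
        {
            System.out.println("unguarded conversion threw NullPointerException");
        }
    }
}
```

The same guard would apply to every field type, which matches the report's note that the problem is not specific to booleans.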






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-20 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
Reviewers: Dinesh Joshi, Doug Rohrer  (was: Doug Rohrer, Yifan Cai)

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a UDT field value is null, the bulk writer throws an NPE; see the 
> stack trace below. Although this trace involves the boolean type, the NPE can 
> be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-20 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a UDT field value is null, the bulk writer throws an NPE; see the 
> stack trace below. Although this trace involves the boolean type, the NPE can 
> be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-20 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
Reviewers: Doug Rohrer, Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a UDT field value is null, the bulk writer throws an NPE; see the 
> stack trace below. Although this trace involves the boolean type, the NPE can 
> be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}






[jira] [Updated] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-20 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19842:
--
 Bug Category: Parent values: Correctness(12982), Level 1 values: Consistency(12989)
   Complexity: Normal
Discovered By: Code Inspection
 Severity: Critical
   Status: Open  (was: Triage Needed)

> [Analytics] Consistency level check incorrectly passes when majority of the 
> replica set is unavailable for write
> 
>
> Key: CASSANDRA-19842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> A consistency level check is performed before bulk writing data. 
> The check yields the wrong result: it still passes when the majority of a 
> replica set is unavailable, which leads to writing data to replicas that 
> cannot satisfy the desired consistency level. 
> The following test demonstrates the bug. It marks all 3 instances in 
> the replica set as blocked (unavailable), so the validation is expected to 
> throw, but it does not. 
> {code:java}
> @Test
> void test()
> {
>     BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
>     ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
>     when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);
>     CassandraContext mockCassandraContext = mock(CassandraContext.class);
>     when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
>     Map<String, String> replicationOptions = new HashMap<>();
>     replicationOptions.put("class", "SimpleStrategy");
>     replicationOptions.put("replication_factor", "3");
>     TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
>         () -> mockSimpleTokenRangeReplicasResponse(10, 3),
>         () -> Partitioner.Murmur3Partitioner,
>         () -> new ReplicationFactor(replicationOptions),
>         ringInstance -> {
>             int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
>             return nodeId <= 2; // block nodes 0, 1, 2
>         });
>     when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);
>     JobInfo mockJobInfo = mock(JobInfo.class);
>     UUID jobId = UUID.randomUUID();
>     when(mockJobInfo.getId()).thenReturn(jobId.toString());
>     when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
>     when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
>     when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
>     when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
>     when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
>     when(mockWriterContext.job()).thenReturn(mockJobInfo);
>     BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
>     // All 3 replicas are blocked, so QUORUM cannot be met and validation should throw
>     assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
>         .isExactlyInstanceOf(RuntimeException.class)
>         .hasMessageContaining("Failed to load");
> }
> {code}






[jira] [Created] (CASSANDRA-19842) [Analytics] Consistency level check incorrectly passes when majority of the replica set is unavailable for write

2024-08-20 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19842:
-

 Summary: [Analytics] Consistency level check incorrectly passes 
when majority of the replica set is unavailable for write
 Key: CASSANDRA-19842
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19842
 Project: Cassandra
  Issue Type: Bug
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


A consistency level check is performed before bulk writing data. 
The check yields the wrong result: it still passes when the majority of a 
replica set is unavailable, which leads to writing data to replicas that 
cannot satisfy the desired consistency level. 

The following test demonstrates the bug. It marks all 3 instances in 
the replica set as blocked (unavailable), so the validation is expected to 
throw, but it does not. 

{code:java}
@Test
void test()
{
    BulkWriterContext mockWriterContext = mock(BulkWriterContext.class);
    ClusterInfo mockClusterInfo = mock(ClusterInfo.class);
    when(mockWriterContext.cluster()).thenReturn(mockClusterInfo);

    CassandraContext mockCassandraContext = mock(CassandraContext.class);
    when(mockClusterInfo.getCassandraContext()).thenReturn(mockCassandraContext);
    Map<String, String> replicationOptions = new HashMap<>();
    replicationOptions.put("class", "SimpleStrategy");
    replicationOptions.put("replication_factor", "3");
    TokenRangeMapping topology = CassandraClusterInfo.getTokenRangeReplicas(
        () -> mockSimpleTokenRangeReplicasResponse(10, 3),
        () -> Partitioner.Murmur3Partitioner,
        () -> new ReplicationFactor(replicationOptions),
        ringInstance -> {
            int nodeId = Integer.parseInt(ringInstance.ipAddress().replace("localhost", ""));
            return nodeId <= 2; // block nodes 0, 1, 2
        });
    when(mockClusterInfo.getTokenRangeMapping(anyBoolean())).thenReturn(topology);

    JobInfo mockJobInfo = mock(JobInfo.class);
    UUID jobId = UUID.randomUUID();
    when(mockJobInfo.getId()).thenReturn(jobId.toString());
    when(mockJobInfo.getRestoreJobId()).thenReturn(jobId);
    when(mockJobInfo.qualifiedTableName()).thenReturn(new QualifiedTableName("testkeyspace", "testtable"));
    when(mockJobInfo.getConsistencyLevel()).thenReturn(ConsistencyLevel.CL.QUORUM);
    when(mockJobInfo.effectiveSidecarPort()).thenReturn(9043);
    when(mockJobInfo.jobKeepAliveMinutes()).thenReturn(-1);
    when(mockWriterContext.job()).thenReturn(mockJobInfo);

    BulkWriteValidator writerValidator = new BulkWriteValidator(mockWriterContext, new ReplicaAwareFailureHandler<>(Partitioner.Murmur3Partitioner));
    // All 3 replicas are blocked, so QUORUM cannot be met and validation should throw
    assertThatThrownBy(() -> writerValidator.validateClOrFail(topology))
        .isExactlyInstanceOf(RuntimeException.class)
        .hasMessageContaining("Failed to load");
}
{code}







[jira] [Updated] (CASSANDRASC-142) Improve S3 download throttling with range-GetObject

2024-08-19 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
Summary: Improve S3 download throttling with range-GetObject  (was: 
[Sidecar] Improve S3 download throttling with range-GetObject)

> Improve S3 download throttling with range-GetObject
> ---
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-15 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
 Bug Category: Parent values: Availability(12983); Level 1 values: Process Crash(12992)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

PR: https://github.com/apache/cassandra-analytics/pull/74
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19836%2Ftrunk

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When UDT field values are set to null, the bulk writer throws an NPE, as in 
> the stack trace below. Although this trace involves the boolean type, the NPE 
> can be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}






[jira] [Updated] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-15 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19836:
--
Test and Documentation Plan: ci; integration test
 Status: Patch Available  (was: Open)

> [Analytics] Fix NPE when writing UDT values
> ---
>
> Key: CASSANDRA-19836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When UDT field values are set to null, the bulk writer throws an NPE, as in 
> the stack trace below. Although this trace involves the boolean type, the NPE 
> can be thrown for any other type whenever the value is null.
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
>   at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
>   at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
> {code}






[jira] [Created] (CASSANDRA-19836) [Analytics] Fix NPE when writing UDT values

2024-08-15 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19836:
-

 Summary: [Analytics] Fix NPE when writing UDT values
 Key: CASSANDRA-19836
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19836
 Project: Cassandra
  Issue Type: Bug
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


When UDT field values are set to null, the bulk writer throws an NPE, as in the 
stack trace below. Although this trace involves the boolean type, the NPE can 
be thrown for any other type whenever the value is null.

{code:java}
Caused by: java.lang.NullPointerException
  at org.apache.cassandra.spark.data.types.Boolean.setInnerValue(Boolean.java:91)
  at org.apache.cassandra.spark.data.complex.CqlUdt.setInnerValue(CqlUdt.java:534)
  at org.apache.cassandra.spark.data.complex.CqlUdt.toUserTypeValue(CqlUdt.java:522)
  at org.apache.cassandra.spark.data.complex.CqlUdt.convertForCqlWriter(CqlUdt.java:169)
  at org.apache.cassandra.spark.bulkwriter.RecordWriter.maybeConvertUdt(RecordWriter.java:450)
  at org.apache.cassandra.spark.bulkwriter.RecordWriter.getBindValuesForColumns(RecordWriter.java:432)
  at org.apache.cassandra.spark.bulkwriter.RecordWriter.writeRow(RecordWriter.java:415)
  at org.apache.cassandra.spark.bulkwriter.RecordWriter.write(RecordWriter.java:202)
{code}
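
A minimal sketch of the kind of null guard that would avoid an NPE like the one above, assuming the failure comes from converting a null field value. The class NullSafeUdtSketch, the method convertFields, and the per-field converter map are hypothetical names for illustration only; they are not the cassandra-analytics API, and the actual patch may take a different approach.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not the cassandra-analytics implementation.
final class NullSafeUdtSketch
{
    private NullSafeUdtSketch() {}

    /**
     * Converts each field value with the converter registered under the same field name.
     * Null values are passed through untouched, so a converter that unboxes or
     * dereferences its argument is never invoked with null.
     * Assumes every non-null field has a converter registered.
     */
    static Map<String, Object> convertFields(Map<String, Object> fields,
                                             Map<String, Function<Object, Object>> converters)
    {
        // LinkedHashMap keeps field order and permits null values
        Map<String, Object> converted = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : fields.entrySet())
        {
            Object value = field.getValue();
            // The null check comes first: only non-null values reach the converter
            Object result = value == null ? null : converters.get(field.getKey()).apply(value);
            converted.put(field.getKey(), result);
        }
        return converted;
    }
}
```

The point of the sketch is only that the null check happens before any type-specific conversion, which matches the observation that the NPE is not specific to the boolean type.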








[jira] [Updated] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
Authors: Yifan Cai
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-sidecar/pull/132
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-sidecar?branch=CASSANDRASC-142%2Ftrunk-storage-client

> [Sidecar] Improve S3 download throttling with range-GetObject
> -
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.






[jira] [Updated] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-142:
--
Change Category: Performance
 Complexity: Normal
Component/s: Rest API
 Status: Open  (was: Triage Needed)

> [Sidecar] Improve S3 download throttling with range-GetObject
> -
>
> Key: CASSANDRASC-142
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> The current S3 download throttling in Sidecar is implemented by blocking the 
> streaming consumption. The blocking happens in Netty event loop threads and 
> prolongs each connection, leading to connection resets or suboptimal 
> concurrency.
> This patch changes the throttling mechanism to be range-GetObject based. Each 
> request retrieves a data range of the object once permitted by the rate limiter.






[jira] [Created] (CASSANDRASC-142) [Sidecar] Improve S3 download throttling with range-GetObject

2024-08-14 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRASC-142:
-

 Summary: [Sidecar] Improve S3 download throttling with range-GetObject
 Key: CASSANDRASC-142
 URL: https://issues.apache.org/jira/browse/CASSANDRASC-142
 Project: Sidecar for Apache Cassandra
  Issue Type: Improvement
Reporter: Yifan Cai
Assignee: Yifan Cai


The current S3 download throttling in Sidecar is implemented by blocking the 
streaming consumption. The blocking happens in Netty event loop threads and 
prolongs each connection, leading to connection resets or suboptimal 
concurrency.
This patch changes the throttling mechanism to be range-GetObject based. Each 
request retrieves a data range of the object once permitted by the rate limiter.
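
The range-based approach described above can be sketched as follows, assuming the object size is known before the download starts. RangeGetSketch, planRanges, download, and the two callbacks are hypothetical names; the callbacks stand in for the rate limiter and the S3 client, and the loop is synchronous only for brevity. This is not the Sidecar implementation. The strings follow the standard HTTP Range header syntax, bytes=start-end, where the end offset is inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

// Hypothetical illustration; not the Sidecar implementation.
final class RangeGetSketch
{
    private RangeGetSketch() {}

    /** Splits [0, objectSize) into Range header values covering at most rangeSize bytes each. */
    static List<String> planRanges(long objectSize, long rangeSize)
    {
        if (objectSize < 0 || rangeSize <= 0)
        {
            throw new IllegalArgumentException("objectSize must be >= 0 and rangeSize must be > 0");
        }
        List<String> ranges = new ArrayList<>();
        long start = 0;
        while (start < objectSize)
        {
            long length = Math.min(rangeSize, objectSize - start);
            // The end offset of an HTTP byte range is inclusive
            ranges.add("bytes=" + start + "-" + (start + length - 1));
            start += length;
        }
        return ranges;
    }

    /**
     * Issues one request per range. A permit for the range's byte count is acquired
     * before the request is sent, so throttling happens between requests rather than
     * by stalling the consumption of an open response stream.
     */
    static void download(long objectSize, long rangeSize,
                         LongConsumer acquirePermit, Consumer<String> fetchRange)
    {
        long remaining = objectSize;
        for (String range : planRanges(objectSize, rangeSize))
        {
            long length = Math.min(rangeSize, remaining);
            acquirePermit.accept(length); // stand-in for the rate limiter
            fetchRange.accept(range);     // stand-in for the range-GetObject call
            remaining -= length;
        }
    }
}
```

Because the permit is acquired before each request is issued, no connection is held open while waiting on the limiter.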







[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/d75a6bae5abbf80810012a181644f240141014d5
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Add job_timeout_seconds writer option
> -
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Option to specify the timeout in seconds for bulk write jobs. By default, it 
> is disabled.
> When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is:
> - successful when the desired consistency level is met
> - a failure otherwise
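
The rule in the description above can be sketched as a small decision function. JobTimeoutSketch, Outcome, and evaluate are hypothetical names, and treating a non-positive value as "disabled" is an assumption: the description says only that the option is disabled by default, not how that is represented.

```java
// Hypothetical illustration; not the cassandra-analytics implementation.
final class JobTimeoutSketch
{
    enum Outcome { STILL_RUNNING, SUCCEEDED, FAILED }

    private JobTimeoutSketch() {}

    /**
     * Decides what happens to a job at a given point in time.
     * Assumption: timeoutSeconds <= 0 means the timeout is disabled.
     */
    static Outcome evaluate(long timeoutSeconds, long elapsedSeconds, boolean consistencyLevelMet)
    {
        boolean enabled = timeoutSeconds > 0;
        if (!enabled || elapsedSeconds <= timeoutSeconds)
        {
            // No timeout decision to make yet
            return Outcome.STILL_RUNNING;
        }
        // The job exceeded the timeout: the consistency level decides the outcome
        return consistencyLevelMet ? Outcome.SUCCEEDED : Outcome.FAILED;
    }
}
```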






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Add job_timeout_seconds writer option
> -
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Option to specify the timeout in seconds for bulk write jobs. By default, it 
> is disabled.
> When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is:
> - successful when the desired consistency level is met
> - a failure otherwise






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Reviewers: Dinesh Joshi, Doug Rohrer
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Add job_timeout_seconds writer option
> -
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Option to specify the timeout in seconds for bulk write jobs. By default, it 
> is disabled.
> When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is:
> - successful when the desired consistency level is met
> - a failure otherwise






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option

2024-08-14 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Description: 
Option to specify the timeout in seconds for bulk write jobs. By default, it is 
disabled.
When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is:
- successful when the desired consistency level is met
- a failure otherwise

  was:
Option to specify the ideal timeout in seconds for bulk write jobs.
It is only effective when the bulk write job uses the S3_COMPACT data 
transport mode.
When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
bulk write job actually needs to achieve the specified consistency level, it 
is ignored, and the job exits only after the desired consistency level has 
been satisfied.
For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
and completes only after 1 hour.
If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
wait time is the minimum of the remaining time to the ideal timeout and the 
estimated wait time to finish all slice imports (as estimated in 
org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
Because the ideal timeout is ignored in some circumstances in order to 
complete the bulk write job, it is named "ideal".


> [Analytics] Add job_timeout_seconds writer option
> -
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Option to specify the timeout in seconds for bulk write jobs. By default, it 
> is disabled.
> When JOB_TIMEOUT_SECONDS is specified, a job exceeding the timeout is:
> - successful when the desired consistency level is met
> - a failure otherwise






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_timeout_seconds writer option

2024-08-13 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Summary: [Analytics] Add job_timeout_seconds writer option  (was: [Analytics] Add job_ideal_timeout_seconds writer option)

> [Analytics] Add job_timeout_seconds writer option
> -
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Option to specify the ideal timeout in seconds for bulk write jobs.
> It is only effective when the bulk write job uses the S3_COMPACT data 
> transport mode.
> When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
> bulk write job actually needs to achieve the specified consistency level, it 
> is ignored, and the job exits only after the desired consistency level has 
> been satisfied.
> For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
> ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
> and completes only after 1 hour.
> If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
> at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
> wait time is the minimum of the remaining time to the ideal timeout and the 
> estimated wait time to finish all slice imports (as estimated in 
> org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
> Because the ideal timeout is ignored in some circumstances in order to 
> complete the bulk write job, it is named "ideal".






[jira] [Commented] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option

2024-08-13 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873297#comment-17873297
 ] 

Yifan Cai commented on CASSANDRA-19827:
---

I talked with [~drohrer] offline, as he is reviewing the patch. We decided to 
generalize job_ideal_timeout_seconds to just job_timeout_seconds for improved 
clarity. The timeout applies to bulk writes in general, rather than only to 
S3_COMPACT as job_ideal_timeout_seconds did. I will update the Jira title 
accordingly.

> [Analytics] Add job_ideal_timeout_seconds writer option
> ---
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Option to specify the ideal timeout in seconds for bulk write jobs.
> It is only effective when the bulk write job uses the S3_COMPACT data 
> transport mode.
> When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
> bulk write job actually needs to achieve the specified consistency level, it 
> is ignored, and the job exits only after the desired consistency level has 
> been satisfied.
> For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
> ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
> and completes only after 1 hour.
> If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
> at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
> wait time is the minimum of the remaining time to the ideal timeout and the 
> estimated wait time to finish all slice imports (as estimated in 
> org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
> Because the ideal timeout is ignored in some circumstances in order to 
> complete the bulk write job, it is named "ideal".






[jira] [Commented] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option

2024-08-12 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873029#comment-17873029
 ] 

Yifan Cai commented on CASSANDRA-19827:
---

CI is green

> [Analytics] Add job_ideal_timeout_seconds writer option
> ---
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Option to specify the ideal timeout in seconds for bulk write jobs.
> It is only effective when the bulk write job uses the S3_COMPACT data 
> transport mode.
> When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
> bulk write job actually needs to achieve the specified consistency level, it 
> is ignored, and the job exits only after the desired consistency level has 
> been satisfied.
> For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
> ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
> and completes only after 1 hour.
> If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
> at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
> wait time is the minimum of the remaining time to the ideal timeout and the 
> estimated wait time to finish all slice imports (as estimated in 
> org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
> Because the ideal timeout is ignored in some circumstances in order to 
> complete the bulk write job, it is named "ideal".






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option

2024-08-12 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Test and Documentation Plan: ci; unit
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/73
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19827%2Ftrunk

> [Analytics] Add job_ideal_timeout_seconds writer option
> ---
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Option to specify the ideal timeout in seconds for bulk write jobs.
> It is only effective when the bulk write job uses the S3_COMPACT data 
> transport mode.
> When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
> bulk write job actually needs to achieve the specified consistency level, it 
> is ignored, and the job exits only after the desired consistency level has 
> been satisfied.
> For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
> ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
> and completes only after 1 hour.
> If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
> at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
> wait time is the minimum of the remaining time to the ideal timeout and the 
> estimated wait time to finish all slice imports (as estimated in 
> org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
> Because the ideal timeout is ignored in some circumstances in order to 
> complete the bulk write job, it is named "ideal".






[jira] [Created] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option

2024-08-12 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19827:
-

 Summary: [Analytics] Add job_ideal_timeout_seconds writer option
 Key: CASSANDRA-19827
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
 Project: Cassandra
  Issue Type: Improvement
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


Option to specify the ideal timeout in seconds for bulk write jobs.
It is only effective when the bulk write job uses the S3_COMPACT data 
transport mode.
When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
bulk write job actually needs to achieve the specified consistency level, it 
is ignored, and the job exits only after the desired consistency level has 
been satisfied.
For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
and completes only after 1 hour.
If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
wait time is the minimum of the remaining time to the ideal timeout and the 
estimated wait time to finish all slice imports (as estimated in 
org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
Because the ideal timeout is ignored in some circumstances in order to 
complete the bulk write job, it is named "ideal".
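
The wait-time arithmetic in the description above can be sketched as follows. IdealTimeoutSketch and effectiveWaitSeconds are hypothetical names, and the estimated import wait is taken as a plain input here; how ImportCompletionCoordinator produces that estimate is not shown in this message.

```java
// Hypothetical illustration; not the cassandra-analytics implementation.
final class IdealTimeoutSketch
{
    private IdealTimeoutSketch() {}

    /**
     * Computes how long the job keeps waiting once the consistency level is achieved.
     *
     * @param idealTimeoutSeconds        the configured ideal timeout
     * @param elapsedSeconds             time already spent when the consistency level was achieved
     * @param estimatedImportWaitSeconds estimated time to finish all remaining slice imports
     */
    static long effectiveWaitSeconds(long idealTimeoutSeconds, long elapsedSeconds,
                                     long estimatedImportWaitSeconds)
    {
        // If the ideal timeout has already passed, it is ignored and no extra wait remains
        long remainingToIdealTimeout = Math.max(0, idealTimeoutSeconds - elapsedSeconds);
        // The effective wait is the smaller of the two durations
        return Math.min(remainingToIdealTimeout, Math.max(0, estimatedImportWaitSeconds));
    }
}
```

With the numbers from the description, a 5400-second ideal timeout and 3600 seconds spent reaching LOCAL_QUORUM leave at most 1800 seconds (0.5 hours) of additional waiting, and less if the slice imports are estimated to finish sooner.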






[jira] [Updated] (CASSANDRA-19827) [Analytics] Add job_ideal_timeout_seconds writer option

2024-08-12 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19827:
--
Change Category: Operability
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> [Analytics] Add job_ideal_timeout_seconds writer option
> ---
>
> Key: CASSANDRA-19827
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19827
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> Option to specify the ideal timeout in seconds for bulk write jobs.
> It is only effective when the bulk write job uses the S3_COMPACT data 
> transport mode.
> When JOB_IDEAL_TIMEOUT_SECONDS is specified and is less than the time the 
> bulk write job actually needs to achieve the specified consistency level, it 
> is ignored, and the job exits only after the desired consistency level has 
> been satisfied.
> For example, if a bulk write job requires 1 hour to achieve LOCAL_QUORUM, it 
> ignores any JOB_IDEAL_TIMEOUT_SECONDS that is less than 3600 seconds (1 hour) 
> and completes only after 1 hour.
> If JOB_IDEAL_TIMEOUT_SECONDS is 5400 seconds (1.5 hours), the job waits for 
> at most an additional 0.5 hours after achieving LOCAL_QUORUM. The effective 
> wait time is the minimum of the remaining time to the ideal timeout and the 
> estimated wait time to finish all slice imports (as estimated in 
> org.apache.cassandra.spark.bulkwriter.ImportCompletionCoordinator).
> Because the ideal timeout is ignored in some circumstances in order to 
> complete the bulk write job, it is named "ideal".






[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer

2024-08-09 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19821:
--
  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/dbbd211cd420eb185d0579f16f5d46abc7bafeb4
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Prevent double closing SSTable writer
> -
>
> Key: CASSANDRA-19821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19821
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to 
> produce SSTables. Its implementation permits close to be called more than 
> once, but subsequent calls to "close" cause an exception. For example:
> {code:java}
> java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169)
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208)
>   at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>   at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93)
>   at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337)
> {code}
> Cassandra Analytics should prevent double closing the underlying writer.
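
One common way to prevent a double close is to guard the underlying close with an atomic flag, sketched below. CloseOnce and its Runnable close action are hypothetical and stand in for the real writer; this is not the cassandra-analytics implementation, and the actual patch may guard the close differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not the cassandra-analytics implementation.
final class CloseOnce implements AutoCloseable
{
    private final Runnable closeAction;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** @param closeAction stand-in for closing the underlying writer */
    CloseOnce(Runnable closeAction)
    {
        this.closeAction = closeAction;
    }

    /** Runs the close action on the first call only; later calls are no-ops. */
    @Override
    public void close()
    {
        // compareAndSet lets exactly one caller through, even under concurrent calls
        if (closed.compareAndSet(false, true))
        {
            closeAction.run();
        }
    }
}
```

Making close idempotent in the wrapper means callers such as try-with-resources blocks and explicit cleanup paths can both call it safely.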






[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer

2024-08-09 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19821:
--
Status: Ready to Commit  (was: Review In Progress)

> Prevent double closing SSTable writer
> -
>
> Key: CASSANDRA-19821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19821
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to 
> produce SSTables. Its implementation permits close to be called more than 
> once, but subsequent calls to "close" cause an exception. For example:
> {code:java}
> java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169)
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208)
>   at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>   at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93)
>   at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337)
> {code}
> Cassandra Analytics should prevent double closing the underlying writer.






[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer

2024-08-08 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19821:
--
Test and Documentation Plan: CI; unit, integration
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/72
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19821%2Ftrunk

> Prevent double closing SSTable writer
> -
>
> Key: CASSANDRA-19821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19821
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to
> produce SSTables. Its implementation allows "close" to be called multiple
> times; however, calls after the first throw an exception. For example,
> {code:java}
> java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169)
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208)
>   at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>   at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93)
>   at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337)
> {code}
> Cassandra Analytics should prevent double closing the underlying writer.






[jira] [Updated] (CASSANDRA-19821) Prevent double closing SSTable writer

2024-08-08 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19821:
--
 Bug Category: Parent values: Availability(12983); Level 1 values: Process Crash(12992)
   Complexity: Normal
Discovered By: Adhoc Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Prevent double closing SSTable writer
> -
>
> Key: CASSANDRA-19821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19821
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to
> produce SSTables. Its implementation allows "close" to be called multiple
> times; however, calls after the first throw an exception. For example,
> {code:java}
> java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169)
>   at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208)
>   at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>   at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125)
>   at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93)
>   at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337)
> {code}
> Cassandra Analytics should prevent double closing the underlying writer.






[jira] [Created] (CASSANDRA-19821) Prevent double closing SSTable writer

2024-08-08 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19821:
-

 Summary: Prevent double closing SSTable writer
 Key: CASSANDRA-19821
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19821
 Project: Cassandra
  Issue Type: Bug
  Components: Analytics Library
Reporter: Yifan Cai
Assignee: Yifan Cai


Analytics uses `org.apache.cassandra.io.sstable.SSTableSimpleWriter` to produce
SSTables. Its implementation allows "close" to be called multiple times; however,
calls after the first throw an exception. For example,


{code:java}
java.lang.RuntimeException: Last written key DecoratedKey(-3078932293011064831, 22fd) >= current key DecoratedKey(-3078932293011064831, 22fd) writing into nb-1-big-Data.db
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:169)
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:208)
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
    at org.apache.cassandra.io.sstable.SSTableTxnWriter.append(SSTableTxnWriter.java:57)
    at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writePartition(SSTableSimpleWriter.java:152)
    at org.apache.cassandra.io.sstable.SSTableSimpleWriter.writeLastPartitionUpdate(SSTableSimpleWriter.java:125)
    at org.apache.cassandra.io.sstable.SSTableSimpleWriter.close(SSTableSimpleWriter.java:93)
    at org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:337)
{code}


Cassandra Analytics should prevent double closing the underlying writer.
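
One way to guard against the double close described above is a small wrapper that forwards "close" to the underlying writer at most once. This is only a sketch under stated assumptions, not the actual patch: the class name `CloseOnceWriter` and its `isClosed` method are hypothetical, and it assumes the underlying writer can be treated as a plain `java.io.Closeable`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper (not the actual patch): forwards close() to the
 * delegate at most once, so repeated calls become no-ops.
 */
final class CloseOnceWriter implements Closeable
{
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnceWriter(Closeable delegate)
    {
        this.delegate = delegate;
    }

    boolean isClosed()
    {
        return closed.get();
    }

    @Override
    public void close() throws IOException
    {
        // compareAndSet succeeds for exactly one caller, even if close()
        // is invoked from several threads
        if (closed.compareAndSet(false, true))
        {
            delegate.close();
        }
    }
}
```

With such a guard, a second `close()` returns without touching the delegate, so the last partition is not appended again as the stack trace above shows.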






[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space

2024-08-07 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19806:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/e168011c40de2ca48d138514640838067e61feea
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Merged into trunk as 
[e168011c|https://github.com/apache/cassandra-analytics/commit/e168011c40de2ca48d138514640838067e61feea]

> [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming 
> local disk space
> -
>
> Key: CASSANDRA-19806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, each bulk write executor only sends sstables after exhausting the
> input data (of the task). All produced sstables are staged locally, so when
> the executor's local disk space is limited or the input data is too large,
> there is a risk of running out of disk space.
> The patch changes the streaming strategy to stream eagerly and remove the
> local files sooner.
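
The eager strategy described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: `EagerSSTableStreamer`, `onSSTableProduced`, and the `transport` callback are hypothetical names, and the transport is assumed to finish sending a file before it returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

/** Hypothetical sketch: stream each sstable as soon as it exists, then free the disk space. */
final class EagerSSTableStreamer
{
    private final Consumer<Path> transport; // assumed to send one file synchronously
    private int streamed = 0;

    EagerSSTableStreamer(Consumer<Path> transport)
    {
        this.transport = transport;
    }

    /** Invoked per produced sstable, instead of once after the task's input is exhausted. */
    void onSSTableProduced(Path sstable)
    {
        transport.accept(sstable);
        try
        {
            // The local copy is no longer needed once it has been sent
            Files.deleteIfExists(sstable);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        streamed++;
    }

    int streamedCount()
    {
        return streamed;
    }
}
```

Under this scheme the local disk holds roughly the sstables still being written plus those in flight, rather than the full output of the task.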






[jira] [Updated] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException

2024-08-06 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-140:
--
Reviewers: Arjun Ashok, Saranya Krishnakumar, Yifan Cai  (was: Arjun Ashok, 
Saranya Krishnakumar, Yifan Cai, Yifan Cai)

> Updating traffic shaping options throws IllegalStateException
> -
>
> Key: CASSANDRASC-140
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-140
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>  Components: Rest API
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
>
> When updating the traffic shaping options in Sidecar in 
> {{org.apache.cassandra.sidecar.server.Server#updateTrafficShapingOptions}}, 
> we are encountering a bug in vert.x. The problem happens in 
> {{io.vertx.core.net.impl.TCPServerBase#updateTrafficShapingOptions}} where 
> the {{trafficShapingHandler}} is {{null}} for {{childHandler}}s. When a 
> {{null}} {{trafficShapingHandler}} is encountered, the following exception is 
> thrown:
> {code:java}
> throw new IllegalStateException("Unable to update traffic shaping options because the server was not configured " +
>                                 "to use traffic shaping during startup");
> {code}
> I propose a stopgap measure to fix the issue in Sidecar while we wait for a
> new vert.x release that includes a fix for this issue. Without a fix, we risk
> leaving Sidecar in an unknown state after updating the traffic shaping
> options, because applying the options can succeed or fail before the
> exception is encountered. This can potentially leave a cluster of Sidecar
> servers in an inconsistent state. The only way to return to a well-known
> state is to restart the Sidecar process across the cluster, with the updated
> traffic shaping options applied in the yaml before starting the process.






[jira] [Commented] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException

2024-08-06 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871468#comment-17871468
 ] 

Yifan Cai commented on CASSANDRASC-140:
---

+1 on the patch.

> Updating traffic shaping options throws IllegalStateException
> -
>
> Key: CASSANDRASC-140
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-140
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>  Components: Rest API
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
>
> When updating the traffic shaping options in Sidecar in 
> {{org.apache.cassandra.sidecar.server.Server#updateTrafficShapingOptions}}, 
> we are encountering a bug in vert.x. The problem happens in 
> {{io.vertx.core.net.impl.TCPServerBase#updateTrafficShapingOptions}} where 
> the {{trafficShapingHandler}} is {{null}} for {{childHandler}}s. When a 
> {{null}} {{trafficShapingHandler}} is encountered, the following exception is 
> thrown:
> {code:java}
> throw new IllegalStateException("Unable to update traffic shaping options because the server was not configured " +
>                                 "to use traffic shaping during startup");
> {code}
> I propose a stopgap measure to fix the issue in Sidecar while we wait for a
> new vert.x release that includes a fix for this issue. Without a fix, we risk
> leaving Sidecar in an unknown state after updating the traffic shaping
> options, because applying the options can succeed or fail before the
> exception is encountered. This can potentially leave a cluster of Sidecar
> servers in an inconsistent state. The only way to return to a well-known
> state is to restart the Sidecar process across the cluster, with the updated
> traffic shaping options applied in the yaml before starting the process.






[jira] [Updated] (CASSANDRASC-140) Updating traffic shaping options throws IllegalStateException

2024-08-06 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-140:
--
Reviewers: Arjun Ashok, Saranya Krishnakumar, Yifan Cai, Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> Updating traffic shaping options throws IllegalStateException
> -
>
> Key: CASSANDRASC-140
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-140
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>  Components: Rest API
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Normal
>
> When updating the traffic shaping options in Sidecar in 
> {{org.apache.cassandra.sidecar.server.Server#updateTrafficShapingOptions}}, 
> we are encountering a bug in vert.x. The problem happens in 
> {{io.vertx.core.net.impl.TCPServerBase#updateTrafficShapingOptions}} where 
> the {{trafficShapingHandler}} is {{null}} for {{childHandler}}s. When a 
> {{null}} {{trafficShapingHandler}} is encountered, the following exception is 
> thrown:
> {code:java}
> throw new IllegalStateException("Unable to update traffic shaping options because the server was not configured " +
>                                 "to use traffic shaping during startup");
> {code}
> I propose a stopgap measure to fix the issue in Sidecar while we wait for a
> new vert.x release that includes a fix for this issue. Without a fix, we risk
> leaving Sidecar in an unknown state after updating the traffic shaping
> options, because applying the options can succeed or fail before the
> exception is encountered. This can potentially leave a cluster of Sidecar
> servers in an inconsistent state. The only way to return to a well-known
> state is to restart the Sidecar process across the cluster, with the updated
> traffic shaping options applied in the yaml before starting the process.






[jira] [Updated] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of

2024-08-02 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19807:
--
  Fix Version/s: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/3023a204c8ef16f886bd3dc219f7534b7edbaf2a
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Merged into trunk as 
[3023a204|https://github.com/apache/cassandra-analytics/commit/3023a204c8ef16f886bd3dc219f7534b7edbaf2a]

> [Analytics] Improve the core bulk reader test system to match actual and 
> expected rows by concatenating the partition keys with the serialized hex 
> string instead of utf-8 string
> -
>
> Key: CASSANDRA-19807
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19807
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
> Fix For: NA
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current test system for the bulk reader matches actual and expected rows
> by building a utf-8 string of the concatenated partition key(s). It would be
> better to match on the hex string of the serialized bytes, which avoids the
> current custom string builder implementation.
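
Matching on the hex string of the serialized bytes could look roughly like the sketch below. It is illustrative only and not taken from the patch: the class `PartitionKeyHex`, its method names, and the `:` separator between key components are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical helper: builds a comparable row key from serialized partition key bytes. */
final class PartitionKeyHex
{
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Hex-encodes the remaining bytes without moving the caller's buffer position. */
    static String toHex(ByteBuffer bytes)
    {
        ByteBuffer view = bytes.duplicate();
        StringBuilder sb = new StringBuilder(view.remaining() * 2);
        while (view.hasRemaining())
        {
            int b = view.get() & 0xFF;
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    /** Joins the hex of each component; ':' never occurs in hex, so the result is unambiguous. */
    static String rowKey(List<ByteBuffer> partitionKeyComponents)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partitionKeyComponents.size(); i++)
        {
            if (i > 0)
            {
                sb.append(':');
            }
            sb.append(toHex(partitionKeyComponents.get(i)));
        }
        return sb.toString();
    }
}
```

Because the key is derived from the serialized bytes, the same helper works for every column type and needs no per-type string formatting.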






[jira] [Commented] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead o

2024-08-02 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870667#comment-17870667
 ] 

Yifan Cai commented on CASSANDRA-19807:
---

+1

> [Analytics] Improve the core bulk reader test system to match actual and 
> expected rows by concatenating the partition keys with the serialized hex 
> string instead of utf-8 string
> -
>
> Key: CASSANDRA-19807
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19807
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current test system for the bulk reader matches actual and expected rows
> by building a utf-8 string of the concatenated partition key(s). It would be
> better to match on the hex string of the serialized bytes, which avoids the
> current custom string builder implementation.






[jira] [Updated] (CASSANDRA-19807) [Analytics] Improve the core bulk reader test system to match actual and expected rows by concatenating the partition keys with the serialized hex string instead of

2024-08-02 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19807:
--
Status: Ready to Commit  (was: Review In Progress)

> [Analytics] Improve the core bulk reader test system to match actual and 
> expected rows by concatenating the partition keys with the serialized hex 
> string instead of utf-8 string
> -
>
> Key: CASSANDRA-19807
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19807
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current test system for the bulk reader matches actual and expected rows
> by building a utf-8 string of the concatenated partition key(s). It would be
> better to match on the hex string of the serialized bytes, which avoids the
> current custom string builder implementation.






[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space

2024-07-29 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19806:
--
Test and Documentation Plan: CI
 Status: Patch Available  (was: Open)

> [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming 
> local disk space
> -
>
> Key: CASSANDRA-19806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, each bulk write executor only sends sstables after exhausting the
> input data (of the task). All produced sstables are staged locally, so when
> the executor's local disk space is limited or the input data is too large,
> there is a risk of running out of disk space.
> The patch changes the streaming strategy to stream eagerly and remove the
> local files sooner.






[jira] [Updated] (CASSANDRA-19806) [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming local disk space

2024-07-29 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19806:
--
Change Category: Semantic
 Complexity: Normal
 Status: Open  (was: Triage Needed)

PR: https://github.com/apache/cassandra-analytics/pull/69
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19806%2Ftrunk

> [Analytics] Stream sstable eagerly when bulk writing to allow reclaiming 
> local disk space
> -
>
> Key: CASSANDRA-19806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, each bulk write executor only sends sstables after exhausting the
> input data (of the task). All produced sstables are staged locally, so when
> the executor's local disk space is limited or the input data is too large,
> there is a risk of running out of disk space.
> The patch changes the streaming strategy to stream eagerly and remove the
> local files sooner.






[jira] [Commented] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868532#comment-17868532
 ] 

Yifan Cai commented on CASSANDRA-19800:
---

Attached the CI result. There were 7 failed tests. None of them looks related
to the patch.

> Enhance CQLSSTableWriter to notify clients on sstable production
> 
>
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Notifying when SSTables are produced is useful for CQLSSTableWriter clients
> to have better control over processing the SSTables. For example, Cassandra
> Analytics can leverage the notification to determine when to import the
> sstables.






[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19800:
--
Attachment: result_details.tar.gz

> Enhance CQLSSTableWriter to notify clients on sstable production
> 
>
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Notifying when SSTables are produced is useful for CQLSSTableWriter clients
> to have better control over processing the SSTables. For example, Cassandra
> Analytics can leverage the notification to determine when to import the
> sstables.






[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19800:
--
Attachment: ci_summary.html

> Enhance CQLSSTableWriter to notify clients on sstable production
> 
>
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Attachments: ci_summary.html
>
>
> Notifying when SSTables are produced is useful for CQLSSTableWriter clients
> to have better control over processing the SSTables. For example, Cassandra
> Analytics can leverage the notification to determine when to import the
> sstables.






[jira] [Commented] (CASSANDRA-19793) [Analytics] Split the Cassandra type logic out from CassandraBridge so it can be utilized without the Spark dependency.

2024-07-24 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868503#comment-17868503
 ] 

Yifan Cai commented on CASSANDRA-19793:
---

+1 on the patch

> [Analytics] Split the Cassandra type logic out from CassandraBridge so it can 
> be utilized without the Spark dependency.
> ---
>
> Key: CASSANDRA-19793
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19793
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The CassandraBridge is a monolithic class that bridges to Cassandra, but for
> other use cases it is beneficial to access the Cassandra types independently
> to deserialize Cassandra data. By splitting out the Cassandra types into a
> separate object, we can use them to deserialize raw Cassandra ByteBuffers
> without the Spark dependency.






[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19800:
--
Test and Documentation Plan: unit test
 Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra/pull/3439

> Enhance CQLSSTableWriter to notify clients on sstable production
> 
>
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> Notifying when SSTables are produced is useful for CQLSSTableWriter clients
> to have better control over processing the SSTables. For example, Cassandra
> Analytics can leverage the notification to determine when to import the
> sstables.






[jira] [Updated] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19800:
--
Change Category: Semantic
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Enhance CQLSSTableWriter to notify clients on sstable production
> 
>
> Key: CASSANDRA-19800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> Notifying when SSTables are produced is useful for CQLSSTableWriter clients
> to have better control over processing the SSTables. For example, Cassandra
> Analytics can leverage the notification to determine when to import the
> sstables.






[jira] [Created] (CASSANDRA-19800) Enhance CQLSSTableWriter to notify clients on sstable production

2024-07-24 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-19800:
-

 Summary: Enhance CQLSSTableWriter to notify clients on sstable 
production
 Key: CASSANDRA-19800
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19800
 Project: Cassandra
  Issue Type: Improvement
  Components: Tool/sstable
Reporter: Yifan Cai
Assignee: Yifan Cai


Notifying when SSTables are produced is useful for CQLSSTableWriter clients to
have better control over processing the SSTables. For example, Cassandra
Analytics can leverage the notification to determine when to import the
sstables.
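
The kind of notification described above can be illustrated with a toy writer that invokes a callback each time it finishes a file. This is a sketch under stated assumptions and does not reflect the real CQLSSTableWriter API: `NotifyingWriterSketch`, `rowsPerSSTable`, and the `onProduced` callback are hypothetical, and file names are simulated rather than written to disk.

```java
import java.util.function.Consumer;

/** Hypothetical toy writer: reports each produced sstable to a client-supplied callback. */
final class NotifyingWriterSketch
{
    private final int rowsPerSSTable;
    private final Consumer<String> onProduced;
    private int buffered = 0;
    private int generation = 0;

    NotifyingWriterSketch(int rowsPerSSTable, Consumer<String> onProduced)
    {
        this.rowsPerSSTable = rowsPerSSTable;
        this.onProduced = onProduced;
    }

    void addRow()
    {
        buffered++;
        if (buffered >= rowsPerSSTable)
        {
            flush();
        }
    }

    /** Flushes any remaining rows so the client is notified about the final sstable too. */
    void close()
    {
        if (buffered > 0)
        {
            flush();
        }
    }

    private void flush()
    {
        generation++;
        buffered = 0;
        // The client learns about the file as soon as it is complete
        onProduced.accept("nb-" + generation + "-big-Data.db");
    }
}
```

A client such as Cassandra Analytics could use a callback like this to start importing a file as soon as it is reported, instead of waiting for the writer to be closed.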






[jira] [Updated] (CASSANDRA-19793) [Analytics] Split the Cassandra type logic out from CassandraBridge so it can be utilized without the Spark dependency.

2024-07-23 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19793:
--
Reviewers: Yifan Cai
   Status: Review In Progress  (was: Patch Available)

> [Analytics] Split the Cassandra type logic out from CassandraBridge so it can 
> be utilized without the Spark dependency.
> ---
>
> Key: CASSANDRA-19793
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19793
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
>
> The CassandraBridge is a monolithic class that bridges to Cassandra, but for
> other use cases it is beneficial to access the Cassandra types independently
> to deserialize Cassandra data. By splitting out the Cassandra types into a
> separate object, we can use them to deserialize raw Cassandra ByteBuffers
> without the Spark dependency.





