[ https://issues.apache.org/jira/browse/CASSGO-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raj Ummadisetty updated CASSGO-104:
-----------------------------------
Description:
h3. Problem:
The TokenAwareHostPolicy has two critical bugs affecting multi-keyspace
workloads:
*Issue 1:* Missing Replica Maps for Non-Default Keyspaces
When a session is created with a default keyspace (e.g., {color:#0747a6}ks1{color})
but queries are executed against a different keyspace (e.g., {color:#00875a}SELECT
* FROM ks2.table{color}), the TokenAwareHostPolicy fails to perform token-aware
routing for the non-default keyspace. This happens because the replica map
(meta.replicas) is populated only for the session's default keyspace.
This is a follow-up to [GitHub issue
#1621|https://github.com/apache/cassandra-gocql-driver/issues/1621]. While [PR
#1714|https://github.com/apache/cassandra-gocql-driver/pull/1714] fixed keyspace
extraction from prepared statements, it did not address the underlying problem
of populating replica information for non-default keyspaces.
*Issue 2:* Stale Replica Maps After Topology Changes
When a non-default keyspace is added via a schema change event
(KeyspaceChanged), its replica map is populated. However, when topology changes
occur
([AddHost/RemoveHost|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L490-L547]),
only the session keyspace's replica map is updated, so non-default keyspaces
retain *stale* replica maps with outdated topology information.
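To make the staleness concrete, here is a minimal Go sketch under simplifying assumptions: a toy token ring where each token has one owning host, SimpleStrategy-style placement (walk the ring clockwise and take RF distinct hosts), and a per-keyspace cache of the replicas for a single token. The types and helpers (tokenOwner, replicasFor, removeHost) are hypothetical illustrations, not the driver's internals. After h3 leaves the ring, only the session keyspace is recomputed, so ks2 still lists the removed host.

```go
package main

import (
	"fmt"
	"sort"
)

// tokenOwner is a hypothetical, simplified ring entry (not a gocql type):
// one token owned by one host. The ring slice is kept sorted by token.
type tokenOwner struct {
	token int64
	host  string
}

// replicasFor walks the ring clockwise from the first token >= t and
// collects rf distinct hosts (SimpleStrategy-like placement, an assumption).
func replicasFor(ring []tokenOwner, t int64, rf int) []string {
	i := sort.Search(len(ring), func(i int) bool { return ring[i].token >= t })
	seen := map[string]bool{}
	var out []string
	for n := 0; n < len(ring) && len(out) < rf; n++ {
		o := ring[(i+n)%len(ring)] // modulo wraps past the last token
		if !seen[o.host] {
			seen[o.host] = true
			out = append(out, o.host)
		}
	}
	return out
}

// removeHost returns a new ring without the given host's tokens.
func removeHost(ring []tokenOwner, host string) []tokenOwner {
	var out []tokenOwner
	for _, o := range ring {
		if o.host != host {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	ring := []tokenOwner{{-100, "h1"}, {0, "h2"}, {100, "h3"}}
	// Replicas for token 50 as cached per keyspace (RF = 2 for both).
	cached := map[string][]string{
		"ks1": replicasFor(ring, 50, 2), // session keyspace
		"ks2": replicasFor(ring, 50, 2), // non-default keyspace
	}
	// Topology change: h3 leaves, but only the session keyspace is refreshed.
	ring = removeHost(ring, "h3")
	cached["ks1"] = replicasFor(ring, 50, 2)
	fmt.Println(cached["ks1"], cached["ks2"]) // [h1 h2] [h3 h1]
}
```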
h3. Impact:
* Ineffective replica shuffling: even when ShuffleReplicas() is enabled, all queries to non-default keyspaces go to the primary replica.
* Uneven load distribution across the cluster.
* Queries to non-default keyspaces are routed to wrong or removed nodes.
* Possible query failures (NoHostAvailableException).
h3. Steps to Reproduce:
{code:go}
package main

import "github.com/apache/cassandra-gocql-driver/v2"

func main() {
	// Create a session with default keyspace ks1
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "ks1"
	cluster.PoolConfig.HostSelectionPolicy =
		gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy(), gocql.ShuffleReplicas())
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()
	// Query keyspace ks2; someID is a placeholder partition key value
	someID := 1
	stmt := "SELECT * FROM ks2.table WHERE id = ?"
	// This query will always be routed to the primary replica
	if err := session.Query(stmt, someID).Exec(); err != nil {
		panic(err)
	}
}
{code}
h3. Root Cause:
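In policies.go, the Pick() method looks up replicas:
[ht := meta.replicas[keyspace].replicasFor(token)|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L615]
However, meta.replicas[keyspace] is nil for any keyspace except the session's
default keyspace. In that case replicasFor() returns nil and Pick() falls back
to the single host that owns the token on the ring (the primary replica), so
ShuffleReplicas() has no effect.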
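The nil lookup and its fallback can be modeled in isolation. This is a hedged sketch, assuming a simplified replicaTable (a token-sorted slice of hostTokens) and a primary() callback standing in for the token ring's owner lookup; hostTokens, replicaTable, pick and primary are hypothetical names, not the exact driver code. Indexing a Go map with a missing key yields the zero value (a nil slice), and a method with a slice receiver can be called on nil, so nothing panics: the query silently degrades to a single host.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// hostTokens pairs a ring token with the replica set that owns it.
// Hypothetical, simplified stand-in for the driver's internal types.
type hostTokens struct {
	token int64
	hosts []string
}

// replicaTable is sorted by token. Calling replicasFor on a nil table is
// legal in Go and simply returns nil, which is what a missing map entry gives.
type replicaTable []hostTokens

func (rt replicaTable) replicasFor(t int64) *hostTokens {
	if len(rt) == 0 {
		return nil
	}
	i := sort.Search(len(rt), func(i int) bool { return rt[i].token >= t })
	if i == len(rt) {
		i = 0 // wrap around the ring
	}
	return &rt[i]
}

// pick mirrors the shape of the lookup described above: a per-keyspace
// table hit yields the full (optionally shuffled) replica set, while a miss
// falls back to the single primary owner of the token.
func pick(replicas map[string]replicaTable, primary func(int64) string,
	keyspace string, t int64, shuffle bool) []string {
	ht := replicas[keyspace].replicasFor(t) // nil table for unknown keyspaces
	if ht == nil {
		return []string{primary(t)} // shuffling never applies here
	}
	out := append([]string(nil), ht.hosts...)
	if shuffle {
		rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	}
	return out
}

func main() {
	table := replicaTable{{0, []string{"h1", "h2", "h3"}}, {100, []string{"h2", "h3", "h1"}}}
	replicas := map[string]replicaTable{"ks1": table} // only the session keyspace
	primary := func(t int64) string { return table.replicasFor(t).hosts[0] }

	fmt.Println(len(pick(replicas, primary, "ks1", 50, true))) // 3
	fmt.Println(pick(replicas, primary, "ks2", 50, true))      // [h2]
}
```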
h3. Proposed Solution:
* When TokenAwareHostPolicy is initialized, hydrate the replica map for all keyspaces.
* On topology changes, update the replica maps for all keyspaces.
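A minimal sketch of the proposed approach, assuming the policy knows every keyspace's replication factor from schema metadata and uses SimpleStrategy-style placement: one function rebuilds the tables for all known keyspaces, and it runs both at initialization and on every topology change. The policy type, rebuildAllLocked, removeHost and the rf map are hypothetical; a real policy would also need to handle NetworkTopologyStrategy and AddHost.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenOwner and hostTokens are hypothetical, simplified types.
type tokenOwner struct {
	token int64
	host  string
}

type hostTokens struct {
	token int64
	hosts []string
}

// replicasOnRing computes, for each ring token, the rf distinct hosts found
// walking clockwise (SimpleStrategy-like placement; an assumption).
func replicasOnRing(ring []tokenOwner, rf int) []hostTokens {
	out := make([]hostTokens, 0, len(ring))
	for i, o := range ring {
		seen := map[string]bool{}
		var hosts []string
		for n := 0; n < len(ring) && len(hosts) < rf; n++ {
			h := ring[(i+n)%len(ring)].host
			if !seen[h] {
				seen[h] = true
				hosts = append(hosts, h)
			}
		}
		out = append(out, hostTokens{o.token, hosts})
	}
	return out
}

// policy is a hypothetical stand-in for the token-aware policy's metadata.
type policy struct {
	mu       sync.RWMutex
	ring     []tokenOwner // sorted by token
	rf       map[string]int // replication factor per known keyspace
	replicas map[string][]hostTokens
}

// rebuildAllLocked recomputes the table for every known keyspace, not just
// the session keyspace, and swaps in a fresh map. Caller must hold p.mu.
func (p *policy) rebuildAllLocked() {
	next := make(map[string][]hostTokens, len(p.rf))
	for ks, rf := range p.rf {
		next[ks] = replicasOnRing(p.ring, rf)
	}
	p.replicas = next
}

// removeHost applies a topology change and refreshes all keyspaces.
func (p *policy) removeHost(host string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	var ring []tokenOwner
	for _, o := range p.ring {
		if o.host != host {
			ring = append(ring, o)
		}
	}
	p.ring = ring
	p.rebuildAllLocked()
}

func main() {
	p := &policy{
		ring: []tokenOwner{{-100, "h1"}, {0, "h2"}, {100, "h3"}},
		rf:   map[string]int{"ks1": 2, "ks2": 2},
	}
	p.mu.Lock()
	p.rebuildAllLocked() // eager hydration at initialization
	p.mu.Unlock()

	p.removeHost("h3")
	p.mu.RLock()
	fmt.Println(p.replicas["ks2"]) // h3 no longer appears for ks2
	p.mu.RUnlock()
}
```

Building a fresh map and swapping it in under the write lock means readers holding the read lock never observe a half-updated table; the trade-off is recomputing every keyspace on each topology event, proportional to the number of keyspaces times the number of ring tokens.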
h3. Alternatives:
Implement lazy loading of replica maps in the Pick() method:
1. When replicas are not found for a keyspace, call a new
ensureReplicasForKeyspace() method
2. This method uses double-checked locking to populate the replica map on-demand
3. Subsequent queries to the same keyspace use the cached replica information
was:
h3. Problem:
When a session is created with a default keyspace (e.g., {color:#0747a6}ks1{color})
but queries are executed against a different keyspace (e.g., {color:#00875a}SELECT
* FROM ks2.table{color}), the TokenAwareHostPolicy fails to perform token-aware
routing for the non-default keyspace. This happens because the replica map
(meta.replicas) is populated only for the session's default keyspace.
This is a follow-up to [GitHub issue
#1621|https://github.com/apache/cassandra-gocql-driver/issues/1621]. While [PR
#1714|https://github.com/apache/cassandra-gocql-driver/pull/1714] fixed keyspace
extraction from prepared statements, it did not address the underlying problem
of populating replica information for non-default keyspaces.
h3. Impact:
* Ineffective replica shuffling: Even when ShuffleReplicas() is enabled, all
queries to non-default keyspaces go to the primary replica
* Creates uneven load distribution across the cluster
h3. Steps to Reproduce:
{code:go}
package main

import "github.com/apache/cassandra-gocql-driver/v2"

func main() {
	// Create a session with default keyspace ks1
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "ks1"
	cluster.PoolConfig.HostSelectionPolicy =
		gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy(), gocql.ShuffleReplicas())
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()
	// Query keyspace ks2; someID is a placeholder partition key value
	someID := 1
	stmt := "SELECT * FROM ks2.table WHERE id = ?"
	// This query will always be routed to the primary replica
	if err := session.Query(stmt, someID).Exec(); err != nil {
		panic(err)
	}
}
{code}
h3. Root Cause:
In policies.go, the Pick() method looks up replicas:
[ht := meta.replicas[keyspace].replicasFor(token)|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L615]
However, meta.replicas[keyspace] is nil for any keyspace except the session's
default keyspace.
h3. Proposed Solution:
Implement lazy loading of replica maps in the Pick() method:
1. When replicas are not found for a keyspace, call a new
ensureReplicasForKeyspace() method
2. This method uses double-checked locking to populate the replica map on-demand
3. Subsequent queries to the same keyspace use the cached replica information
> TokenAwareHostPolicy should populate and maintain replica maps for all keyspaces
> ---------------------------------------------------------------------------------
>
> Key: CASSGO-104
> URL: https://issues.apache.org/jira/browse/CASSGO-104
> Project: Apache Cassandra Go driver
> Issue Type: Bug
> Reporter: Raj Ummadisetty
> Assignee: Raj Ummadisetty
> Priority: Normal
> Time Spent: 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)