[ https://issues.apache.org/jira/browse/BEAM-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550154#comment-17550154 ]
Danny McCormick commented on BEAM-14558:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/21715
> Data missing when using CassandraIO.Read
> ----------------------------------------
>
> Key: BEAM-14558
> URL: https://issues.apache.org/jira/browse/BEAM-14558
> Project: Beam
> Issue Type: Bug
> Components: io-java-cassandra
> Affects Versions: 2.34.0, 2.35.0, 2.36.0, 2.37.0, 2.38.0, 2.39.0
> Reporter: Christophe ROQUETTE
> Priority: P1
>
> h2. Bug
> Data at the beginning or end of the token ring is never retrieved, due to a
> bad TokenRange request.
> This bug was introduced by BEAM-9008, in [this
> commit|https://github.com/apache/beam/commit/e12fc33e55e23db9f2aee330039d16dace34f9aa].
> A basic reproduction case & workarounds are available here:
> [Github/beam-cassandraio-bug|https://github.com/KriKroff/beam-cassandraio-bug]
> h2. Description
> When using {{CassandraIO}}, a list of token ranges is requested from the C*
> nodes in order to create splits in those ranges.
> Each split is represented as a RingRange, which results in a request to C*
> of the form
> {{TOKEN(partition_key) >= range_start AND TOKEN(partition_key) < range_end}}
> The token ring goes from Long.MIN_VALUE to Long.MAX_VALUE (so -2xxx to 2xxx),
> so a range may contain the "join point" and be represented by [2xxx, -2xxx].
> In this case (a.k.a. TokenRange isWrapping), the old implementation used to
> send 2 different requests:
> * {{TOKEN(partition_key) >= range_start}} (to get results up to the end of
> the ring, i.e. Long.MAX_VALUE)
> * {{TOKEN(partition_key) < range_end}} (to get results from the beginning
> of the ring, i.e. Long.MIN_VALUE)
> This behavior is no longer implemented: all token ranges are queried the
> same way, even in the wrapping case.
> This results in a request like:
> {{TOKEN(partition_key) >= 2xxx AND TOKEN(partition_key) < -2xxx}}
> No token can satisfy both conditions, so the request returns 0 results and
> the data in the wrapping range is never retrieved.
>
> h2. Workarounds
> * Downgrade to 2.33.0
> * Use custom TokenRanges & readAll implementation
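
The wrapping-range handling described in the issue can be sketched roughly as follows. This is only an illustration of the idea, not the CassandraIO code: the class WrappingRangeQueries and its methods are hypothetical names, the inclusive-start/exclusive-end bounds follow the request form quoted in the issue, and treating start == end as a wrapping range is an assumption made here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Beam or Cassandra driver APIs. */
class WrappingRangeQueries {

  /**
   * A range wraps when it crosses the "join point" of the ring, i.e. its start
   * is not below its end. Assumption: start == end is treated as wrapping too.
   */
  static boolean isWrapping(long start, long end) {
    return start >= end;
  }

  /** Builds one WHERE clause for a normal range and two for a wrapping range. */
  static List<String> whereClauses(String partitionKey, long start, long end) {
    String token = "TOKEN(" + partitionKey + ")";
    List<String> clauses = new ArrayList<>();
    if (isWrapping(start, end)) {
      // From range_start up to the end of the ring (Long.MAX_VALUE).
      clauses.add(token + " >= " + start);
      // From the beginning of the ring (Long.MIN_VALUE) up to range_end.
      clauses.add(token + " < " + end);
    } else {
      clauses.add(token + " >= " + start + " AND " + token + " < " + end);
    }
    return clauses;
  }

  /** The single combined predicate from the bug report, evaluated for one token. */
  static boolean combinedPredicate(long start, long end, long t) {
    return t >= start && t < end;
  }

  /** Membership test matching the clauses built by whereClauses. */
  static boolean contains(long start, long end, long t) {
    return isWrapping(start, end) ? (t >= start || t < end) : combinedPredicate(start, end, t);
  }

  public static void main(String[] args) {
    long start = 9_000_000_000_000_000_000L;
    long end = -9_000_000_000_000_000_000L;
    for (String clause : whereClauses("partition_key", start, end)) {
      System.out.println(clause);
    }
  }
}
```

For a wrapping range, combinedPredicate is false for every token, which matches the 0 results reported above, while contains accepts tokens on both sides of the join point.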
--
This message was sent by Atlassian Jira
(v8.20.7#820007)