[ https://issues.apache.org/jira/browse/CASSANDRA-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-19325:
----------------------------------
    Bug Category: Parent values: Code(13163)  (was: Parent values: Correctness(12982)Level 1 values: Recoverable Corruption / Loss(12986))
     Description:
The ranges used in the analytics library do not use a notation consistent with Cassandra. As part of the Cassandra ecosystem, the analytics library should use the open-closed range notation consistently to avoid potential implementation bugs.

Besides that, during the write process, the split sub-ranges are unordered. This does not seem to affect correctness, but it can be confusing.

  was:
The range splitting implementation can produce the following incorrect results.
- Given a tiny range, it can produce duplicated ranges, leading to Spark executors working on the same data set.
- The produced ranges are closed on both ends, so the same token is shared by two ranges, leading to data duplication.

Besides the splitting error, during the write process, the split sub-ranges are unordered. This does not seem to affect correctness, but it can be confusing.

         Summary: [Analytics] Fix range split and use open-closed range notation consistently  (was: [Analytics] Fix range splitting that can produce overlapping ranges)

> [Analytics] Fix range split and use open-closed range notation consistently
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19325
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19325
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The ranges used in the analytics library do not use a notation consistent
> with Cassandra. As part of the Cassandra ecosystem, the analytics library
> should use the open-closed range notation consistently to avoid potential
> implementation bugs.
> Besides that, during the write process, the split sub-ranges are unordered.
> This does not seem to affect correctness, but it can be confusing.
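
For readers unfamiliar with the notation, the following is a minimal sketch of what open-closed splitting means. It is not the analytics library's code: the function names `split_open_closed` and `contains` are hypothetical, tokens are modeled as plain integers, and wrap-around ranges are not handled. It illustrates the two properties the description asks for: each token belongs to exactly one sub-range, and a tiny range never yields duplicated sub-ranges.

```python
from typing import List, Tuple

# A range is modeled as (start, end], i.e. start is exclusive and end is inclusive.
Range = Tuple[int, int]


def contains(r: Range, token: int) -> bool:
    """Return True if token lies in the open-closed range (start, end]."""
    start, end = r
    return start < token <= end


def split_open_closed(start: int, end: int, n_splits: int) -> List[Range]:
    """Split (start, end] into at most n_splits ordered, non-overlapping sub-ranges.

    Hypothetical helper for illustration only; assumes start < end (no wrap-around).
    """
    if start >= end:
        raise ValueError("expected start < end; wrap-around is not handled in this sketch")
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")

    size = end - start            # number of tokens in (start, end]
    n = min(n_splits, size)       # a tiny range cannot be split into more parts than tokens

    # Because n <= size, consecutive boundaries differ by at least 1,
    # so no sub-range is empty and none is duplicated.
    bounds = [start + (size * i) // n for i in range(n + 1)]

    # Adjacent sub-ranges share a boundary value, but it is inclusive only
    # in the left one, so no token is covered twice.
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


if __name__ == "__main__":
    for sub in split_open_closed(0, 10, 3):
        print(sub)
```

With ranges closed on both ends, the boundary token between two neighbouring sub-ranges would be read by both; with the open-closed form it falls only in the sub-range that ends at it. Building the sub-ranges from a sorted list of boundaries also keeps the output ordered.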