[ 
https://issues.apache.org/jira/browse/BEAM-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437588#comment-16437588
 ] 

Alexander Dejanovski commented on BEAM-3485:
--------------------------------------------

# From experience, most clusters out there run with 16 to 256 vnodes per 
node. Multiplied by the number of nodes, that is already going to generate a 
lot of splits. Still, it would be good to be able to enforce a minimum number 
of splits if needed, so I'd be in favor of adding it as an optional input. If 
the computed number of splits is lower (or if Beam fails to compute it), then 
we should fall back to the user input.
Tell me if you agree and I'll add it.
 # It is for Murmur3, but it would be good to also support the 
RandomPartitioner, which uses tokens between 0 and 2^127-1, a range that does 
not fit in a Long. 
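
To make both points concrete, here is a minimal sketch, not the actual CassandraIO code. The class and method names (TokenSplits, effectiveSplitCount, splitRange) are hypothetical, and it assumes a computed split count of 0 or less stands for "Beam failed to compute it". Using BigInteger for the token arithmetic covers both the Murmur3 range (-2^63 to 2^63-1) and the RandomPartitioner range (0 to 2^127-1).

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Beam.
final class TokenSplits {
    // Murmur3Partitioner tokens span the full signed 64-bit range.
    static final BigInteger MURMUR3_MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MURMUR3_MAX = BigInteger.valueOf(Long.MAX_VALUE);
    // RandomPartitioner tokens span 0 .. 2^127 - 1, which overflows a long.
    static final BigInteger RANDOM_MIN = BigInteger.ZERO;
    static final BigInteger RANDOM_MAX =
        BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);

    // Use the user-supplied minimum when the computed count is lower.
    // Assumption: computedSplits <= 0 means the computation failed.
    static int effectiveSplitCount(int computedSplits, int minSplits) {
        return Math.max(1, Math.max(computedSplits, minSplits));
    }

    // Cut the inclusive range [min, max] into `splits` contiguous,
    // inclusive sub-ranges of (almost) equal size. Requires splits >= 1.
    static List<BigInteger[]> splitRange(BigInteger min, BigInteger max, int splits) {
        BigInteger total = max.subtract(min).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(splits);
        List<BigInteger[]> ranges = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            BigInteger start = min.add(total.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger end = (i == splits - 1)
                ? max
                : min.add(total.multiply(BigInteger.valueOf(i + 1)).divide(n))
                     .subtract(BigInteger.ONE);
            ranges.add(new BigInteger[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example figures: 6 nodes with 256 vnodes each, user minimum of 8.
        int splits = effectiveSplitCount(6 * 256, 8);
        List<BigInteger[]> ranges = splitRange(RANDOM_MIN, RANDOM_MAX, splits);
        System.out.println(splits + " splits, first range ends at " + ranges.get(0)[1]);
    }
}
```

With this shape the user input only matters when the vnode-based estimate is too low or missing, and the same splitting code serves either partitioner by swapping the min/max constants.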

> CassandraIO.read() splitting produces invalid queries
> -----------------------------------------------------
>
>                 Key: BEAM-3485
>                 URL: https://issues.apache.org/jira/browse/BEAM-3485
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-cassandra
>            Reporter: Eugene Kirpichov
>            Assignee: Alexey Romanenko
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425, and the 
> splitting code path effectively was never invoked, and was broken from the 
> first PR - so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an IT against a 
> real Cassandra instance.
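
For reference, the token(pk) fix mentioned in the issue description can be illustrated with a small sketch. This is not the CassandraIO code: the class name, method name, and the keyspace/table/column values are hypothetical. It only shows the partition key column name being substituted into the CQL token() call, rather than a literal "$pk" placeholder being left in the statement.

```java
// Hypothetical helper for illustration only; not part of Beam.
final class TokenRangeQuery {
    // Builds a CQL statement restricted to one token range. The partition
    // key column name is substituted into token(...); the range bounds are
    // left as bind markers to be supplied per split.
    static String build(String keyspace, String table, String partitionKey) {
        return String.format(
            "SELECT * FROM %s.%s WHERE token(%s) >= ? AND token(%s) < ?",
            keyspace, table, partitionKey, partitionKey);
    }

    public static void main(String[] args) {
        // Example values are assumptions, not taken from the issue.
        System.out.println(build("ks", "users", "id"));
    }
}
```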



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
