[ 
https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-19633:
-------------------------------------------

    Assignee: Marcus Eriksson

> Replaced node is stuck in a loop calculating ranges
> ---------------------------------------------------
>
>                 Key: CASSANDRA-19633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19633
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jai Bheemsen Rao Dhanwada
>            Assignee: Marcus Eriksson
>            Priority: Normal
>              Labels: Bootstrap
>         Attachments: result1.html
>
>
> Hello,
>  
> I am running into an issue where a node that is replacing a dead 
> (non-seed) node appears stuck calculating ranges. It eventually 
> succeeds, but the time taken to calculate the ranges varies widely: 
> I sometimes see it take 24 hours to calculate ranges for each 
> keyspace. I have attached a flame graph of the Cassandra process during 
> this time, which points to the code below. 
> {code:java}
> Multimap<InetAddressAndPort, Range<Token>> getRangeFetchMapForNonTrivialRanges()
> {
>     // Get the graph with edges between ranges and their source endpoints
>     MutableCapacityGraph<Vertex, Integer> graph = getGraph();
>     // Add source and destination vertex and edges
>     addSourceAndDestination(graph, getDestinationLinkCapacity(graph));
>     int flow = 0;
>     MaximumFlowAlgorithmResult<Integer, CapacityEdge<Vertex, Integer>> result = null;
>     // We might not be working on all ranges
>     while (flow < getTotalRangeVertices(graph))
>     {
>         if (flow > 0)
>         {
>             // We could not find a path with previous graph. Bump the capacity
>             // b/w endpoint vertices and destination by 1
>             incrementCapacity(graph, 1);
>         }
>         MaximumFlowAlgorithm fordFulkerson =
>             FordFulkersonAlgorithm.getInstance(DFSPathFinder.getInstance());
>         result = fordFulkerson.calc(graph, sourceVertex, destinationVertex,
>                                     IntegerNumberSystem.getInstance());
>         int newFlow = result.calcTotalFlow();
>         assert newFlow > flow; // We are not making progress which should not happen
>         flow = newFlow;
>     }
>     return getRangeFetchMapFromGraphResult(graph, result);
> }
> {code}
> Digging through the logs, I see the log line below for the keyspace 
> `system_auth`:
> {code:java}
> INFO [main] 2024-05-10 17:35:02,489 RangeStreamer.java:330 - Bootstrap: range 
> Full(/10.135.56.214:7000,(5080189126057290696,5081324396311791613]) exists on 
> Full(/10.135.56.157:7000,(5080189126057290696,5081324396311791613]) for 
> keyspace system_auth{code}
> The corresponding code:
> {code:java}
> for (Map.Entry<Replica, Replica> entry : fetchMap.flattenEntries())
> logger.info("{}: range {} exists on {} for keyspace {}", description, 
> entry.getKey(), entry.getValue(), keyspaceName);{code}
> But I do not see the line below for the same keyspace:
> {code:java}
> RangeStreamer.java:606 - Output from RangeFetchMapCalculator for 
> keyspace{code}
> This means the code is stuck in `getRangeFetchMap()`:
> {code:java}
> Multimap<InetAddressAndPort, Range<Token>> rangeFetchMapMap = 
> calculator.getRangeFetchMap();
> logger.info("Output from RangeFetchMapCalculator for keyspace {}", 
> keyspace);{code}
> Here is the cluster topology:
>  * Cassandra version: 4.0.12
>  * # of nodes: 190
>  * Tokens (vnodes): 128
> My initial hypothesis was that the graph calculation takes longer because 
> of the combination of nodes, tokens, and tables, but in the same cluster one 
> of the nodes joined without any issues. 
> I am wondering whether I am hitting a bug that lets it work sometimes but 
> sends it into an infinite loop at other times.
> Please let me know if you need any other details. I would appreciate any 
> pointers to debug this further.
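
The loop quoted in the report re-runs a maximum-flow computation each time it raises the capacity between endpoint vertices and the destination. The sketch below illustrates that general pattern on a tiny bipartite graph. It is a simplified stand-in, not the Cassandra implementation and not the psjava library used by the quoted method: the class `RangeFetchFlowSketch`, the method `assign`, and the integer encoding of ranges and endpoints are all hypothetical names chosen for illustration, and the starting capacity of 1 is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration only: assigns each range to one candidate endpoint,
// raising the per-endpoint capacity by 1 until every range can be placed.
class RangeFetchFlowSketch {

    // candidates[r] lists the endpoint indices that could serve range r.
    // Returns owner[r] = endpoint chosen for range r.
    static int[] assign(int[][] candidates, int endpointCount) {
        int ranges = candidates.length;
        for (int cap = 1; cap <= ranges; cap++) {
            // The flow is recomputed from scratch for every capacity value.
            int[] owner = new int[ranges];
            Arrays.fill(owner, -1);
            int[] load = new int[endpointCount];
            int flow = 0;
            for (int r = 0; r < ranges; r++) {
                if (augment(r, candidates, owner, load, cap, new boolean[endpointCount])) {
                    flow++;
                }
            }
            if (flow == ranges) {
                return owner;
            }
        }
        throw new IllegalStateException("some range has no candidate endpoint");
    }

    // Depth-first search for an augmenting path starting at range r.
    // A failed search leaves owner and load unchanged.
    private static boolean augment(int r, int[][] candidates, int[] owner,
                                   int[] load, int cap, boolean[] visited) {
        for (int e : candidates[r]) {
            if (visited[e]) {
                continue;
            }
            visited[e] = true;
            if (load[e] < cap) {
                owner[r] = e;
                load[e]++;
                return true;
            }
            // Endpoint e is full: try to move one of its ranges elsewhere.
            for (int other = 0; other < owner.length; other++) {
                if (owner[other] == e
                        && augment(other, candidates, owner, load, cap, visited)) {
                    load[e]--;      // 'other' left e
                    owner[r] = e;   // r takes the freed slot
                    load[e]++;
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Ranges 0 and 1 exist only on endpoint 0; range 2 exists on endpoints 0 and 1.
        int[][] candidates = { {0}, {0}, {0, 1} };
        System.out.println(Arrays.toString(assign(candidates, 2)));
    }
}
```

Because every capacity bump restarts the search, the total work in a sketch like this grows with both the number of bumps and the size of the graph. Whether that accounts for the run times described in the report is not established here.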


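The log-based diagnosis in the report (a "range ... exists on ..." line is present for a keyspace, but the "Output from RangeFetchMapCalculator" line is not) can be automated along these lines. This is a minimal sketch: the class `StuckKeyspaceFinder` and the method `pending` are hypothetical, the two patterns are derived only from the message formats quoted in the report, and each log entry is assumed to sit on a single line in the actual log file.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports keyspaces whose range calculation was logged
// as started but not yet logged as finished.
class StuckKeyspaceFinder {

    // Matches "<description>: range <replica> exists on <replica> for keyspace <name>".
    private static final Pattern STARTED =
        Pattern.compile("range .* exists on .* for keyspace (\\S+)");

    // Matches "Output from RangeFetchMapCalculator for keyspace <name>".
    private static final Pattern FINISHED =
        Pattern.compile("Output from RangeFetchMapCalculator for keyspace (\\S+)");

    static Set<String> pending(List<String> logLines) {
        Set<String> started = new LinkedHashSet<>();
        Set<String> finished = new HashSet<>();
        for (String line : logLines) {
            Matcher done = FINISHED.matcher(line);
            if (done.find()) {
                finished.add(done.group(1));
                continue;
            }
            Matcher begun = STARTED.matcher(line);
            if (begun.find()) {
                started.add(begun.group(1));
            }
        }
        // Keep only keyspaces with no completion line.
        started.removeAll(finished);
        return started;
    }
}
```

Passing the lines of the node's log file to `pending` returns the keyspaces that are still inside, or never left, the calculation step; for the excerpt in the report that would be `system_auth`.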

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

