[
https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911630#comment-16911630
]
Maxim Muzafarov commented on IGNITE-12080:
------------------------------------------
Folks,
I've looked through the changes and have a few questions regarding the
implementation.
* Why is a static utils class used for collecting rebalance info? Why not,
for instance, the DiagnosticProcessor (introduced recently)?
* Once IGNITE-3195 is merged, there will be no reason to collect statistics
about the `rebalance topic`; it will be replaced with thread pools.
* Do you have any benchmarks with the `printRebalanceStatistics` property
enabled? Since the rebalance procedure can run for 4-8 hours, it is necessary to
check and analyze JVM metrics (GC, used heap, etc.). We can have thousands of
Supply-Demand messages, and for each one we hold a
`RebalanceMessageStatistics` in the heap until the rebalance procedure finishes.
* The printed statistics are not in a human-readable format. Is that
user-friendly? Moreover, it is up to the implementation to print statistics the
right way in logs. I think we don't need any abbreviations (e.g.
`writeAliasesRebalanceStatistics`) that have to be decoded when reading logs.
* Do we have a TC run on all suites with the `printRebalanceStatistics` property
enabled? It seems to me we can get a `NullPointerException` in some
cases.
* Why is `RebalanceMessageStatistics` needed? I don't think that holding
`sndMsgTime` for each message will be useful for rebalance statistics at all.
The same goes for `rcvMsgTime`.
* I think `ReceivePartitionStatistics`.`msgSize` will be the same in 98% of
cases. Do we need it?
* Do we need `PartitionStatistics` at all? Can the same value be obtained from
the `onRebalanceKeyReceived` metrics and the end of the rebalance procedure?
Please do not merge the PR until all the issues are resolved.
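
To illustrate the heap and readability points above, here is a minimal sketch, assuming per-supplier totals are enough for the final report. `SupplierRebalanceTotals`, `Totals`, `onSupplyMessage` and `summary` are hypothetical names made up for this example; they are not classes from the PR or from Ignite. The idea is to fold each supply message into running counters rather than keeping one statistics object per message, and to print spelled-out field names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical aggregate (illustration only, not part of the PR or of Ignite). */
final class SupplierRebalanceTotals {
    /** Running totals for a single supplier node. */
    static final class Totals {
        final LongAdder msgs = new LongAdder();
        final LongAdder entries = new LongAdder();
        final LongAdder bytes = new LongAdder();
    }

    /** One small, fixed-size record per supplier, regardless of message count. */
    private final Map<UUID, Totals> bySupplier = new ConcurrentHashMap<>();

    /** Folds one supply message into the totals; nothing per-message is retained. */
    void onSupplyMessage(UUID supplierId, long entryCnt, long sizeBytes) {
        Totals t = bySupplier.computeIfAbsent(supplierId, id -> new Totals());

        t.msgs.increment();
        t.entries.add(entryCnt);
        t.bytes.add(sizeBytes);
    }

    /** Renders one line per supplier using full field names instead of aliases. */
    String summary() {
        StringBuilder sb = new StringBuilder();

        bySupplier.forEach((id, t) -> sb
            .append("supplier=").append(id)
            .append(", messages=").append(t.msgs.sum())
            .append(", entries=").append(t.entries.sum())
            .append(", bytes=").append(t.bytes.sum())
            .append(System.lineSeparator()));

        return sb.toString();
    }
}
```

With this shape, heap usage grows with the number of supplier nodes rather than with the number of Supply-Demand messages. Per-message timestamps such as `sndMsgTime` are lost, which is acceptable only if they are not needed in the final report.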
> Add extended logging for rebalance
> ----------------------------------
>
> Key: IGNITE-12080
> URL: https://issues.apache.org/jira/browse/IGNITE-12080
> Project: Ignite
> Issue Type: Improvement
> Reporter: Kirill Tkalenko
> Assignee: Kirill Tkalenko
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> We should log all information about a finished rebalance on the demander node.
> I'd like to have in the log:
> h3. Total information:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions,
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied
> partitions, number of supplied entries, number of bytes, duration of getting
> and processing partitions from supplier)
> h3. Information per cache group:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions,
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied
> partitions, list of partition ids with PRIMARY/BACKUP flag, number of supplied
> entries, number of bytes, duration of getting and processing partitions from
> supplier)
> # Information about each partition distribution (list of nodeIds with
> primary/backup flag and marked supplier nodeId)
> h3. Information per supplier node:
> # How many partitions were requested:
> #* Total number
> #* Primary/backup distribution (number of primary partitions, number of
> backup partitions)
> #* Total number of entries
> #* Total size of partitions in bytes
> # How many partitions were requested per cache group:
> #* Number of requested partitions
> #* Number of entries in partitions
> #* Total size of partitions in bytes
> #* List of requested partitions with size in bytes, entry count, primary or
> backup partition flag