[ 
https://issues.apache.org/jira/browse/IGNITE-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641806#comment-17641806
 ] 

Andrey Mashenkov commented on IGNITE-18171:
-------------------------------------------

[~alapin] ,
I've attached a pull request with ItClusterStartupTest, which checks some 
meaningful scenarios.
The tests pass on the PR, but there are some issues.
 # The node start future can't finish if there is no MetaStorage group quorum. 
Even a ClusterManagementGroup node is affected. Is it a bug?
 # The node start future fails after a 30 sec timeout if neither the CMG nor 
the MetaStorage group has a quorum. 
Looks like a bug. Increase the NODE_JOIN_WAIT_TIMEOUT field value to reproduce, 
e.g. no test passes with a 15+ sec value.
What should the correct behaviour be?
 # A table can't be spread over a predefined set of nodes. This is not possible 
without distributed zones support. So, it is ok for now.
 # Calling tx.rollback() on a committed transaction instance fails with an 
exception. I thought this was a valid pattern.

{code:java}
Transaction tx = node.transactions().begin();
try {
    // operations

    tx.commit();
} finally {
    // FAILS here with TransactionException
    // "Fail to finish the transaction inconsistent state"
    tx.rollback();
}{code}
I've commented out the call for now.
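
Until the expected behaviour is decided, one possible workaround is to roll back only when the commit did not happen. Below is a minimal sketch of that idea. The Tx interface and the RecordingTx class are hypothetical stand-ins written for illustration, not the Ignite 3 API; RecordingTx only mimics the behaviour reported above by rejecting a rollback after a commit.

```java
import java.util.ArrayList;
import java.util.List;

public class TxPatternSketch {
    /** Hypothetical stand-in for a transaction handle; NOT the Ignite 3 API. */
    interface Tx {
        void commit();

        void rollback();
    }

    /** Records calls and rejects a rollback after a commit, mimicking the reported behaviour. */
    static final class RecordingTx implements Tx {
        final List<String> calls = new ArrayList<>();
        private boolean finished;

        @Override
        public void commit() {
            calls.add("commit");
            finished = true;
        }

        @Override
        public void rollback() {
            if (finished) {
                throw new IllegalStateException("transaction already finished");
            }
            calls.add("rollback");
            finished = true;
        }
    }

    /** Runs the body, commits on success, and rolls back only if the body or the commit threw. */
    static void inTx(Tx tx, Runnable body) {
        boolean committed = false;
        try {
            body.run();
            tx.commit();
            committed = true;
        } finally {
            if (!committed) {
                tx.rollback();
            }
        }
    }
}
```

With this shape, the finally block never calls rollback() on a committed transaction, so the pattern works whether or not rollback-after-commit is eventually allowed.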

> Describe nodes start/stop scenarios
> -----------------------------------
>
>                 Key: IGNITE-18171
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18171
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Andrey Mashenkov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3
>
> h2. Definitions.
> We can distinguish the following cluster node groups. Each node may be part 
> of one or more groups.
>  * Cluster Management Group (CMG), which controls the process of new nodes 
> joining.
>  * MetaStorage group (MSG), which hosts the meta storage.
>  * Data node group (DNG), which just hosts table partitions.
> The components (CMG, meta storage, table components) depend on each 
> other, but may reside on different (even disjoint) node subsets. So, some 
> components may become temporarily unavailable, and dependent components must 
> be aware of such issues and handle them (wait, retry, throw an exception, or 
> whatever) in an expected way, which also has to be documented.
> [See IEP for 
> details|https://cwiki.apache.org/confluence/display/IGNITE/IEP-77%3A+Node+Join+Protocol+and+Initialization+for+Ignite+3]
> h2. Motivation.
> As of now, the correct way to start the grid (after it was stopped) is: start 
> CMG nodes, then Meta Storage nodes, then Data nodes. A correct stop goes in 
> the reverse order. Other scenarios are not tested and may lead to unexpected 
> behaviour.
> Let's describe all possible scenarios and the expected behaviour for each of 
> them, and extend test coverage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
