[jira] [Created] (IGNITE-14452) Add checking of the iptables settings applied.

2021-03-31 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14452:
-

 Summary: Add checking of the iptables settings applied.
 Key: IGNITE-14452
 URL: https://issues.apache.org/jira/browse/IGNITE-14452
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Sometimes iptables settings are missing for unknown reasons. Let's monitor this 
issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14437) Adjust test params: exclude input net failures with disabled connRecovery

2021-03-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14437:
-

 Summary: Adjust test params: exclude input net failures with 
disabled connRecovery
 Key: IGNITE-14437
 URL: https://issues.apache.org/jira/browse/IGNITE-14437
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14378) Remove delay from node ping.

2021-03-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14378:
-

 Summary: Remove delay from node ping.
 Key: IGNITE-14378
 URL: https://issues.apache.org/jira/browse/IGNITE-14378
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Remove U.sleep(200) from the node ping.





[jira] [Created] (IGNITE-14377) Enhance log of node ping failure.

2021-03-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14377:
-

 Summary: Enhance log of node ping failure.
 Key: IGNITE-14377
 URL: https://issues.apache.org/jira/browse/IGNITE-14377
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The log of an unsuccessful ping during joining is insufficient: no failure reason 
is logged.





[jira] [Created] (IGNITE-14096) Try to bring randomization in node waiting with TcpDiscoverySpi.reconnectDelay.

2021-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14096:
-

 Summary: Try to bring randomization in node waiting with 
TcpDiscoverySpi.reconnectDelay.
 Key: IGNITE-14096
 URL: https://issues.apache.org/jira/browse/IGNITE-14096
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To speed up cluster start slightly, try to bring randomization into node waiting 
with TcpDiscoverySpi.reconnectDelay. Check with the ducktape integration tests.
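
One possible shape of such randomization, as a minimal sketch only. The class `ReconnectJitter` and the choice of the range [base / 2, base] are assumptions made for illustration; they are not part of the Ignite API and not the change proposed in the ticket.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper, not part of Ignite: spreads waiting nodes over a time window. */
final class ReconnectJitter {
    /**
     * Returns a delay picked uniformly from [baseDelayMs / 2, baseDelayMs].
     * The lower bound of one half is an arbitrary assumption.
     */
    static long randomizedDelay(long baseDelayMs) {
        if (baseDelayMs <= 0)
            return 0;

        long min = baseDelayMs / 2;

        // nextLong(origin, bound) excludes the bound, hence the + 1.
        return ThreadLocalRandom.current().nextLong(min, baseDelayMs + 1);
    }

    public static void main(String[] args) {
        // With a base delay of 2000 ms each node would wait between 1000 and 2000 ms.
        for (int i = 0; i < 5; i++)
            System.out.println(randomizedDelay(2000));
    }
}
```

With such a helper, nodes that start at the same moment would not all retry at the same instant. Whether this actually shortens cluster start still has to be checked with the ducktape tests.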





[jira] [Created] (IGNITE-14095) Try to speed up cluster start in the ducktests by decreasing 'spi.reconnectDelay'

2021-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14095:
-

 Summary: Try to speed up cluster start in the ducktests by decreasing 
'spi.reconnectDelay'
 Key: IGNITE-14095
 URL: https://issues.apache.org/jira/browse/IGNITE-14095
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14068) Infinite node persistence in the ring while outgoing connections are lost

2021-01-26 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14068:
-

 Summary: Infinite node persistence in the ring while outgoing 
connections are lost
 Key: IGNITE-14068
 URL: https://issues.apache.org/jira/browse/IGNITE-14068
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14054) Improve discovery ducktest: add partial network drop.

2021-01-25 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14054:
-

 Summary: Improve discovery ducktest: add partial network drop.
 Key: IGNITE-14054
 URL: https://issues.apache.org/jira/browse/IGNITE-14054
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14053) Remove status check message altogether.

2021-01-25 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14053:
-

 Summary: Remove status check message altogether.
 Key: IGNITE-14053
 URL: https://issues.apache.org/jira/browse/IGNITE-14053
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14038) Separate JVM settings in the ducktests.

2021-01-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14038:
-

 Summary: Separate JVM settings in the ducktests.
 Key: IGNITE-14038
 URL: https://issues.apache.org/jira/browse/IGNITE-14038
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-14037) Separate JVM settings in the ducktests.

2021-01-22 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-14037:
-

 Summary: Separate JVM settings in the ducktests.
 Key: IGNITE-14037
 URL: https://issues.apache.org/jira/browse/IGNITE-14037
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13980) Remove duplicated ping: status check message.

2021-01-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13980:
-

 Summary: Remove duplicated ping: status check message.
 Key: IGNITE-13980
 URL: https://issues.apache.org/jira/browse/IGNITE-13980
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13835) Improve discovery ducktape test to research small timeouts and behavior on large cluster.

2020-12-10 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13835:
-

 Summary: Improve discovery ducktape test to research small 
timeouts and behavior on large cluster.
 Key: IGNITE-13835
 URL: https://issues.apache.org/jira/browse/IGNITE-13835
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Improve discovery ducktape test to research the cluster behavior with bigger 
node number and smaller timeouts. 





[jira] [Created] (IGNITE-13705) Fix failure of the middle node when its next and previous nodes fail.

2020-11-13 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13705:
-

 Summary: Fix failure of the middle node when its next and previous nodes fail.
 Key: IGNITE-13705
 URL: https://issues.apache.org/jira/browse/IGNITE-13705
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The discovery ducktape test has detected failure of a third node in the middle of 
2 simultaneously failed nodes. First research shows trouble in backward 
connection checking: the next node has checked itself:

[2020-11-13 14:50:44,463][INFO ][tcp-disco-sock-reader-[47cc6f70 
10.53.125.224:35381]-#7-#79][org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi1]
 Connection check done 
[liveAddr=tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500, 
previousNode=TcpDiscoveryNode [id=8331a61c-ea93-4bf5-bc8c-b24c032068d0, 
consistentId=tkles-pprb00188.vm.esrt.cloud.sbrf.ru, addrs=ArrayList 
[10.53.125.160], sockAddrs=HashSet 
[tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500], discPort=47500, 
order=1, intOrder=1, lastExchangeTime=1605268203598, loc=false, 
ver=2.10.0#20201113-sha1:, isClient=false], 
addressesToCheck=[tkles-pprb00188.vm.esrt.cloud.sbrf.ru/10.53.125.160:47500], 
connectingNodeId=47cc6f70-9fe4-437d-b183-826f2687aac8]







[jira] [Created] (IGNITE-13704) Try failureDetectionTimeout==500 in ducktape integration test.

2020-11-13 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13704:
-

 Summary: Try failureDetectionTimeout==500 in ducktape integration 
test.
 Key: IGNITE-13704
 URL: https://issues.apache.org/jira/browse/IGNITE-13704
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Try failureDetectionTimeout==500 in ducktape integration test.





[jira] [Created] (IGNITE-13702) Fix description of soLinger for DiscoveryTcpSpi.

2020-11-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13702:
-

 Summary: Fix description of soLinger for DiscoveryTcpSpi.
 Key: IGNITE-13702
 URL: https://issues.apache.org/jira/browse/IGNITE-13702
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.10
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin
 Fix For: 2.10


Fix description of soLinger for DiscoveryTcpSpi.





[jira] [Created] (IGNITE-13695) Move javadoc on the effect of several addresses on failure detection.

2020-11-11 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13695:
-

 Summary: Move javadoc on the effect of several addresses on failure 
detection.
 Key: IGNITE-13695
 URL: https://issues.apache.org/jira/browse/IGNITE-13695
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


The current javadoc on how several node addresses affect failure detection is 
located under `TcpDiscoverySpi.setIpFinder()`. The correct place is 
`TcpDiscoverySpi.setLocalAddress()`.
Perhaps the test might be slightly changed.





[jira] [Created] (IGNITE-13666) Disable socket linger in discovery ducktape test.

2020-11-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13666:
-

 Summary: Disable socket linger in discovery ducktape test.
 Key: IGNITE-13666
 URL: https://issues.apache.org/jira/browse/IGNITE-13666
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


soLinger might be disabled to speed up the discovery tests. Additionally, we 
could reduce failureDetectionTimeout.





[jira] [Created] (IGNITE-13663) Represent in the documentation the effect of several node addresses on failure detection v2.

2020-11-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13663:
-

 Summary: Represent in the documentation the effect of several node 
addresses on failure detection v2.
 Key: IGNITE-13663
 URL: https://issues.apache.org/jira/browse/IGNITE-13663
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.9
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin
 Fix For: 2.10








[jira] [Created] (IGNITE-13662) Describe soLinger setting in TCP Discovery and SSL issues.

2020-11-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13662:
-

 Summary: Describe soLinger setting in TCP Discovery and SSL issues.
 Key: IGNITE-13662
 URL: https://issues.apache.org/jira/browse/IGNITE-13662
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Describe soLinger setting in TCP Discovery and SSL issues.





[jira] [Created] (IGNITE-13646) Discovery ducktape test might have setting for socket linger.

2020-10-30 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13646:
-

 Summary: Discovery ducktape test might have setting for socket 
linger.
 Key: IGNITE-13646
 URL: https://issues.apache.org/jira/browse/IGNITE-13646
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Since IGNITE-13643, discovery ducktape test might have additional setting for 
socket linger. This could unveil new issues with the linger and start fixing or 
redeeming tcp discovery settings.





[jira] [Created] (IGNITE-13645) Discovery ducktape test should detect failed nodes by asking the cluster.

2020-10-30 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13645:
-

 Summary: Discovery ducktape test should detect failed nodes by 
asking the cluster.
 Key: IGNITE-13645
 URL: https://issues.apache.org/jira/browse/IGNITE-13645
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Discovery ducktape test should measure detection time of failed nodes by asking 
all remaining nodes of the cluster. Currently, we measure by asking only one watching 
node.





[jira] [Created] (IGNITE-13644) Close socket bravely.

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13644:
-

 Summary: Close socket bravely.
 Key: IGNITE-13644
 URL: https://issues.apache.org/jira/browse/IGNITE-13644
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should not wait for socket closing once we have finished the logical connection 
and data exchange. This can violate configured timeouts and detection 
guarantees.





[jira] [Created] (IGNITE-13643) Fix long closing of the socket in ServerImpl (TcpDiscoverySpi)

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13643:
-

 Summary: Fix long closing of the socket in ServerImpl 
(TcpDiscoverySpi)
 Key: IGNITE-13643
 URL: https://issues.apache.org/jira/browse/IGNITE-13643
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Current IgniteUtils.closeQuiet(@Nullable Socket sock) takes about 5 sec to close 
a socket. Probably this is the default soTimeout. This violates node failure detection. 
Although we set failureDetectionTimeout == 1000, node failure is detected within 
6.5 secs on average. Logging shows a delay on socket closing in 
IgniteUtils.closeQuiet(@Nullable Socket sock).

Suggestion: use forced closing, set soLinger=0, do not wait for the rest of the 
socket IO. We close the socket in TcpDiscoverySpi when we have already waited for target 
timeouts and consider the connection lost or invalid. We do not need to wait for 
any traffic on the socket any more.
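
A minimal sketch of what forced closing could look like with the plain JDK socket API. The class `ForcedClose` and the method `closeNow` are hypothetical names used only for illustration; this is not the actual IgniteUtils code. `Socket.setSoLinger(true, 0)` enables SO_LINGER with a zero timeout, so `close()` does not wait for unsent data.

```java
import java.io.IOException;
import java.net.Socket;

/** Hypothetical helper, not part of Ignite. */
final class ForcedClose {
    /** Closes the socket without lingering; all errors are ignored. */
    static void closeNow(Socket sock) {
        if (sock == null)
            return;

        try {
            // SO_LINGER enabled with zero timeout: close() returns immediately
            // and pending output is discarded.
            sock.setSoLinger(true, 0);
        }
        catch (IOException ignored) {
            // The socket may already be closed or broken; still try to close it.
        }

        try {
            sock.close();
        }
        catch (IOException ignored) {
            // Nothing to do: the connection is considered lost anyway.
        }
    }
}
```

Note that with a zero linger timeout the peer typically sees a connection reset rather than a graceful shutdown. That looks acceptable here only because the connection is already considered lost or invalid.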





[jira] [Created] (IGNITE-13641) More logs for debugging DiscoveryTcpSpi

2020-10-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13641:
-

 Summary: More logs for debugging DiscoveryTcpSpi
 Key: IGNITE-13641
 URL: https://issues.apache.org/jira/browse/IGNITE-13641
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Logs in DiscoveryTcp (ServerImpl) are insufficient. We do not see the actual timeouts 
passed to sockets. It's difficult to understand why the timeouts and waits that 
happened are what they are.





[jira] [Created] (IGNITE-13638) Bring log config to ducktape tests

2020-10-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13638:
-

 Summary: Bring log config to ducktape tests
 Key: IGNITE-13638
 URL: https://issues.apache.org/jira/browse/IGNITE-13638
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13625) Make network timeout rely on failureDetectionTimeout in TcpDiscovery

2020-10-26 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13625:
-

 Summary: Make network timeout rely on failureDetectionTimeout in 
TcpDiscovery
 Key: IGNITE-13625
 URL: https://issues.apache.org/jira/browse/IGNITE-13625
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13620) Bind ignite node to 1 address in the ducktests

2020-10-23 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13620:
-

 Summary: Bind ignite node to 1 address in the ducktests
 Key: IGNITE-13620
 URL: https://issues.apache.org/jira/browse/IGNITE-13620
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13603) TcpDiscoverySpi seems not to drop network recovery state and its timer.

2020-10-21 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13603:
-

 Summary: TcpDiscoverySpi seems not to drop network recovery state 
and its timer.
 Key: IGNITE-13603
 URL: https://issues.apache.org/jira/browse/IGNITE-13603
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


ServerImpl keeps sndState (CrossRingMessageSendState) in its message send 
cycle. Once created with a failure recovery timer, it is not cleared or 
refreshed any more. This may cause an instant timeout on the next send failure.





[jira] [Created] (IGNITE-13602) Create discovery node failure test based on network malfunction emulation.

2020-10-21 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13602:
-

 Summary: Create discovery node failure test based on network 
malfunction emulation.
 Key: IGNITE-13602
 URL: https://issues.apache.org/jira/browse/IGNITE-13602
 Project: Ignite
  Issue Type: Task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13282) Fix TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized()

2020-07-21 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13282:
-

 Summary: Fix 
TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized()
 Key: IGNITE-13282
 URL: https://issues.apache.org/jira/browse/IGNITE-13282
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13208) Refactoring of IgniteSpiOperationTimeoutHelper

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13208:
-

 Summary: Refactoring of IgniteSpiOperationTimeoutHelper
 Key: IGNITE-13208
 URL: https://issues.apache.org/jira/browse/IGNITE-13208
 Project: Ignite
  Issue Type: Task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


IgniteSpiOperationTimeoutHelper has many timeout fields. It looks like it could be 
simplified.





[jira] [Created] (IGNITE-13206) Represent in the doc the effect of several node addresses on failure detection.

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13206:
-

 Summary: Represent in the doc the effect of several node addresses 
on failure detection.
 Key: IGNITE-13206
 URL: https://issues.apache.org/jira/browse/IGNITE-13206
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13205) Represent in logs, javadoc the effect of several node addresses on failure detection.

2020-07-02 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13205:
-

 Summary: Represent in logs, javadoc the effect of several node 
addresses on failure detection.
 Key: IGNITE-13205
 URL: https://issues.apache.org/jira/browse/IGNITE-13205
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Current TcpDiscoverySpi can prolong failure detection of a node which has several 
IP addresses. This happens because most of the timeouts, like 
failureDetectionTimeout, sockTimeout, ackTimeout, work per address, and the 
node addresses are tried serially. This effect on failure detection 
should be noted in logs and javadocs.
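
A rough illustration of the effect, as a simplified model only. It assumes every address consumes the full per-address timeout before the next one is tried; the class `MultiAddressDetection` is a hypothetical name and real timings depend on the actual ServerImpl logic.

```java
/** Hypothetical model, not part of Ignite. */
final class MultiAddressDetection {
    /**
     * Upper bound of the time spent on one node when its addresses are tried
     * one after another and each attempt may take the whole timeout.
     */
    static long worstCaseMs(int addrCnt, long perAddrTimeoutMs) {
        return Math.multiplyExact((long) addrCnt, perAddrTimeoutMs);
    }

    public static void main(String[] args) {
        // A node with 3 addresses and a 1000 ms per-address timeout.
        System.out.println(worstCaseMs(3, 1000)); // prints 3000
    }
}
```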





[jira] [Created] (IGNITE-13194) Fix testNodeWithIncompatibleMetadataIsProhibitedToJoinTheCluster()

2020-06-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13194:
-

 Summary: Fix 
testNodeWithIncompatibleMetadataIsProhibitedToJoinTheCluster()
 Key: IGNITE-13194
 URL: https://issues.apache.org/jira/browse/IGNITE-13194
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-13134) Fix connection recovery timeout.

2020-06-08 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13134:
-

 Summary: Fix connection recovery timeout.
 Key: IGNITE-13134
 URL: https://issues.apache.org/jira/browse/IGNITE-13134
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


If a node experiences connection issues, it must establish a new connection or fail 
within failureDetectionTimeout + connectionRecoveryTimeout.





[jira] [Created] (IGNITE-13111) Simplify backward checking of node connection.

2020-06-03 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13111:
-

 Summary: Simplify backward checking of node connection.
 Key: IGNITE-13111
 URL: https://issues.apache.org/jira/browse/IGNITE-13111
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should fix several drawbacks in the backward checking of a failed node. They 
prolong node failure detection up to: 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms. 

See:
* '_NodeFailureResearch.patch_'. It creates the test 'FailureDetectionResearch', 
which emulates long answers on a failed node and measures failure detection 
delays.
* '_FailureDetectionResearch.txt_' - results of the test.
* '_FailureDetectionResearch_fixed.txt_' - results of the test after this fix.
* '_WostCaseStepByStep.txt_' - description how the worst case happens.
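
To make the bound above concrete, here is a small worked calculation. It is only a sketch: it assumes ServerImpl.CON_CHECK_INTERVAL is 500 ms (the value quoted in IGNITE-13012), and the class `WorstCaseDetection` is a hypothetical name.

```java
/** Hypothetical calculation helper, not part of Ignite. */
final class WorstCaseDetection {
    /** Assumed value of ServerImpl.CON_CHECK_INTERVAL, in milliseconds. */
    static final long CON_CHECK_INTERVAL_MS = 500;

    /** The fixed 300 ms term of the bound. */
    static final long FIXED_DELAYS_MS = 300;

    /** CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms. */
    static long worstCaseMs(long failureDetectionTimeoutMs) {
        return CON_CHECK_INTERVAL_MS + 2 * failureDetectionTimeoutMs + FIXED_DELAYS_MS;
    }

    public static void main(String[] args) {
        // failureDetectionTimeout = 1000 ms gives 500 + 2000 + 300.
        System.out.println(worstCaseMs(1000)); // prints 2800
    }
}
```

So with failureDetectionTimeout set to 1000 ms, detection may take up to 2.8 times longer than the configured value.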


*Suggestion:*

1) We can simplify backward connection checking as we implement IGNITE-13012. 
Once we get robust, predictable connection ping, we don't need to check 
previous node because we can see whether it sent ping to current node within 
failure detection timeout. If not, previous node can be considered lost.

Instead of:
{code:java}
// Node cannot connect to its next (for local node it's previous).
// Need to check connectivity to it.
long rcvdTime = lastRingMsgReceivedTime;
long now = U.currentTimeMillis();

// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + effectiveExchangeTimeout() >= now;
TcpDiscoveryNode previous = null;

if (ok) {
    // Check case when previous node suddenly died. This will speed up
    // node failing.

    // ... Checking connection to previous node ...
}
{code}

2) Then, it seems we can remove:
{code:java}
ServerImpl.SocketReader.isConnectionRefused(SocketAddress addr);
{code}





[jira] [Created] (IGNITE-13090) Add parameter of connection check period to TcpDiscoverySpi

2020-05-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13090:
-

 Summary: Add parameter of connection check period to 
TcpDiscoverySpi
 Key: IGNITE-13090
 URL: https://issues.apache.org/jira/browse/IGNITE-13090
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should add a parameter for the connection check period to TcpDiscoverySpi. If it 
isn't automatically set by IgniteConfiguration.setFailureDetectionTimeout(), 
the user should be able to adjust it. Similar params:


{code:java}
TcpDiscoverySpi.setReconnectCount()
TcpDiscoverySpi.setAckTimeout()
TcpDiscoverySpi.setSocketTimeout()
{code}







[jira] [Created] (IGNITE-13040) Remove unused parameter from TcpDiscoverySpi.writeToSocket()

2020-05-20 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13040:
-

 Summary: Remove unused parameter from 
TcpDiscoverySpi.writeToSocket()
 Key: IGNITE-13040
 URL: https://issues.apache.org/jira/browse/IGNITE-13040
 Project: Ignite
  Issue Type: Task
 Environment: Unused parameter {code:java}TcpDiscoveryAbstractMessage msg{code} should be removed from
{code:java}
TcpDiscoverySpi.writeToSocket(Socket sock, TcpDiscoveryAbstractMessage msg, byte[] data, long timeout){code}

This method seems to send raw data, not a message.
 
Reporter: Vladimir Steshin








[jira] [Created] (IGNITE-13018) Get rid of duplicated checking of failed node.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13018:
-

 Summary: Get rid of duplicated checking of failed node.
 Key: IGNITE-13018
 URL: https://issues.apache.org/jira/browse/IGNITE-13018
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Failed node checking should be simplified to one step: ping node (send a 
message) from previous one in the ring and wait for response within 
IgniteConfiguration.failureDetectionTimeout. If node doesn't respond, we should 
consider it failed. Extra steps of connection checking may seriously delay 
failure detection, bring confusion and weird behavior.





[jira] [Created] (IGNITE-13017) Remove delay of 200ms from re-marking failed node as alive.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13017:
-

 Summary: Remove delay of 200ms from re-marking failed node as 
alive.
 Key: IGNITE-13017
 URL: https://issues.apache.org/jira/browse/IGNITE-13017
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


We should remove hardcoded timeout from:

{code:java}
boolean ServerImpl.CrossRingMessageSendState.markLastFailedNodeAlive() {
    if (state == RingMessageSendState.FORWARD_PASS || state == RingMessageSendState.BACKWARD_PASS) {
        ...

        if (--failedNodes <= 0) {
            ...

            state = RingMessageSendState.STARTING_POINT;

            try {
                Thread.sleep(200);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        return true;
    }

    return false;
}
{code}

This can add an extra 200ms to the duration of failed node detection.





[jira] [Created] (IGNITE-13016) Remove hardcoded values/timeouts from backward checking of failed node.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13016:
-

 Summary: Remove hardcoded values/timeouts from backward checking 
of failed node.
 Key: IGNITE-13016
 URL: https://issues.apache.org/jira/browse/IGNITE-13016
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Backward checking of a failed node relies on a hardcoded timeout of 100ms:

{code:java}
private boolean ServerImpl.isConnectionRefused(SocketAddress addr) {
    try (Socket sock = new Socket()) {
        sock.connect(addr, 100);
    }
    catch (ConnectException e) {
        return true;
    }
    catch (IOException e) {
        return false;
    }

    return false;
}
{code}

We should make it bound to configurable params like 
IgniteConfiguration.failureDetectionTimeout.
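
A minimal sketch of the same check with the timeout passed in instead of hardcoded. The class `ConnectionProbe` and the `timeoutMs` parameter are hypothetical; how the value would be derived from failureDetectionTimeout is left open here and is not decided by this ticket.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.Socket;
import java.net.SocketAddress;

/** Hypothetical standalone variant of the check, not the ServerImpl code. */
final class ConnectionProbe {
    /**
     * @param addr Address to probe.
     * @param timeoutMs Connect timeout in milliseconds, supplied by the caller
     *      (for example, derived from a configured failure detection timeout).
     * @return {@code true} only if the remote side actively refused the connection.
     */
    static boolean isConnectionRefused(SocketAddress addr, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(addr, timeoutMs);
        }
        catch (ConnectException e) {
            // Connection refused: nothing listens on the address.
            return true;
        }
        catch (IOException e) {
            // Any other failure, including a timeout, is not treated as a refusal.
            return false;
        }

        return false;
    }
}
```

As in the original code, a timeout is still not reported as a refusal. Only the hardcoded 100 ms is replaced by a parameter.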




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13015) Use nano time instead of currentTimeMillis() in node failure detection.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13015:
-

 Summary: Use nano time instead of currentTimeMillis() in node failure 
detection.
 Key: IGNITE-13015
 URL: https://issues.apache.org/jira/browse/IGNITE-13015
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Make sure node failure detection does not use:
{code:java}
System.currentTimeMillis()
and
IgniteUtils.currentTimeMillis()
{code}

Disadvantages:

1)  Current system time has no guarantee of strict forward movement. 
System time can be adjusted, for example synchronized by NTP. This can lead to 
incorrect and negative delays.

2)   IgniteUtils.currentTimeMillis() has a granularity of 10ms.
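
For comparison, a minimal sketch of timeout tracking based on `System.nanoTime()`, which is intended for measuring elapsed time and is not affected by wall-clock adjustments. The class `MonotonicTimeout` is a hypothetical name, not an existing Ignite class.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Ignite. */
final class MonotonicTimeout {
    /** Start mark. Only differences between nanoTime() values are meaningful. */
    private final long startNanos = System.nanoTime();

    /** Timeout converted to nanoseconds. */
    private final long timeoutNanos;

    MonotonicTimeout(long timeoutMs) {
        timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    }

    /** @return Milliseconds elapsed since creation. */
    long elapsedMs() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    /** @return {@code true} if the timeout has passed. */
    boolean expired() {
        // Compare the difference, never absolute values: nanoTime() may be
        // negative and may overflow.
        return System.nanoTime() - startNanos >= timeoutNanos;
    }
}
```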





[jira] [Created] (IGNITE-13014) Remove long, double checking of node availability. Fix hardcoded values.

2020-05-15 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13014:
-

 Summary: Remove long, double checking of node availability. Fix 
hardcoded values.
 Key: IGNITE-13014
 URL: https://issues.apache.org/jira/browse/IGNITE-13014
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


At present, we have duplicated checking of node availability. This 
prolongs node failure detection and gives no additional benefits. There are 
a mess and hardcoded values in this routine.
Let's imagine node 2 doesn't answer any more. Node 1 becomes unable to ping 
node 2 and asks node 3 to establish a permanent connection instead of node 2. 
Although node 2 has already been pinged within configured timeouts, node 3 tries 
to connect to node 2 too. 
Disadvantages:
1)  Possible long detection of node failure up to 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms. See ‘WostCase.txt’

2)  Unexpected, not-configurable decision to check availability of previous 
node based on ‘2 * ServerImpl.CON_CHECK_INTERVAL‘:

// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now; 

If ‘ok == true’ node 3 checks node 2.

3)  Double node checking brings several not-configurable hardcoded delays:
Node 3 checks node 2 with hardcoded timeout 100ms:
ServerImpl.isConnectionRefused():

sock.connect(addr, 100);

Checking availability of the previous node treats any exception other than 
ConnectException (connection refused) as an existing connection. Even a 
timeout. See ServerImpl.isConnectionRefused().





[jira] [Created] (IGNITE-13012) Make node connection checking rely on the configuration. Simplify node ping routine.

2020-05-14 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-13012:
-

 Summary: Make node connection checking rely on the configuration. 
Simplify node ping routine.
 Key: IGNITE-13012
 URL: https://issues.apache.org/jira/browse/IGNITE-13012
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin



Current node-to-node connection checking has several drawbacks:
1)  Minimal connection checking interval is not bound to failure detection 
parameters: 
static int ServerImpl.CON_CHECK_INTERVAL = 500;
2)  Connection checking is implemented as periodic sending of a message 
(TcpDiscoveryConnectionCheckMessage). It is bound to its own time 
(ServerImpl.RingMessageWorker.lastTimeConnCheckMsgSent), not to the common time of 
the last sent message. This is weird because any discovery message actually checks 
the connection, and TcpDiscoveryConnectionCheckMessage is just an addition for when 
the message queue is empty for a long time.
3)  The period of node-to-node connection checking can sometimes be shortened 
for a strange reason: if no sent or received message appears within 
failureDetectionTimeout. Here, although we have a minimal period of connection 
checking (ServerImpl.CON_CHECK_INTERVAL), we can also send 
TcpDiscoveryConnectionCheckMessage before this period is exhausted. Moreover, this 
premature node ping also relies on the time of the last received message. Imagine: if 
node 2 receives no message from node 1 within some time, it decides to do an extra 
ping of node 3 without waiting for the regular ping interval. Such behavior causes 
confusion and gives no additional guarantees.
4)  If #3 happens, the node writes in the log at INFO: “Local node seems to be 
disconnected from topology …” whereas it is not actually disconnected. The user can 
see this message if they set failureDetectionTimeout < 500ms. I wouldn’t like 
to see INFO in a log saying a node might be disconnected. This sounds like 
some trouble has arisen in the network, not like everything is OK. 

Suggestions:
1)  Base the connection check interval on failureDetectionTimeout or similar
params.
2)  Make the connection check interval rely on the common time of the last sent
message, not on a dedicated timestamp.
3)  Remove the additional, unpredictable, accelerated connection checking.
4)  Do not alarm the user with “Node disconnected” when everything is OK.
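
Suggestions 1 and 2 might look roughly like the sketch below. All names here (ConnCheckSketch, onMessageSent, connectionCheckNeeded) are hypothetical, and taking a quarter of failureDetectionTimeout as the interval is only an assumption made for illustration.

```java
/** Hypothetical sketch; names are illustrative, not Ignite API. */
final class ConnCheckSketch {
    /** Interval derived from the failure detection timeout. */
    private final long checkIntervalMs;

    /** Time when any discovery message was last sent. */
    private long lastSentMs;

    ConnCheckSketch(long failureDetectionTimeoutMs, long nowMs) {
        // Assumption: check several times within the failure detection window.
        checkIntervalMs = Math.max(1, failureDetectionTimeoutMs / 4);
        lastSentMs = nowMs;
    }

    long checkIntervalMs() {
        return checkIntervalMs;
    }

    /** Call whenever any discovery message is sent, not only a check message. */
    void onMessageSent(long nowMs) {
        lastSentMs = nowMs;
    }

    /** A dedicated check message is needed only after a whole silent interval. */
    boolean connectionCheckNeeded(long nowMs) {
        return nowMs - lastSentMs >= checkIntervalMs;
    }
}
```

With this shape, regular discovery traffic postpones the dedicated check message, and the time of the last received message plays no role.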





[jira] [Created] (IGNITE-12779) Split Ignite and IgniteMXBean, make different behavior of the active(boolean)

2020-03-12 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12779:
-

 Summary: Split Ignite and IgniteMXBean, make different behavior of 
the active(boolean)
 Key: IGNITE-12779
 URL: https://issues.apache.org/jira/browse/IGNITE-12779
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To deactivate the cluster through JMX without suddenly erasing in-memory data,
we should:

1)  Add _IgniteMXBean#state(String state, boolean force)_.

2)  Let _IgniteMXBean#state(String state)_ and _IgniteMXBean#active(boolean
active)_ fail when deactivating a cluster with in-memory data.

3)  Separate the implementations of _Ignite_ and _IgniteMXBean_ from
_IgniteKernal_. Both have the same method _void active(boolean active)_, which
must behave differently in each. _Ignite#active(boolean active)_ should not
fail when deactivating a cluster with in-memory data.
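
A minimal sketch of why the split is needed: one class cannot give the same _void active(boolean active)_ signature two behaviors, but two thin implementations delegating to a shared kernal can. The types below are hypothetical stand-ins, not the real Ignite classes, and the "INACTIVE" state name is an assumption.

```java
/** Hypothetical stand-ins for Ignite, IgniteMXBean and the shared kernal. */
final class ActiveSplitSketch {
    interface PublicApi {
        void active(boolean active);
    }

    interface MxBean {
        void active(boolean active);

        void state(String state, boolean force);
    }

    /** Shared state-changing logic. */
    static final class Kernal {
        boolean active = true;
        boolean hasInMemoryData = true;

        void changeState(boolean activate, boolean force) {
            if (!activate && hasInMemoryData && !force)
                throw new IllegalStateException("Deactivation would erase in-memory data.");

            active = activate;
        }
    }

    /** Public API: never fails on deactivation. */
    static final class PublicApiImpl implements PublicApi {
        private final Kernal kernal;

        PublicApiImpl(Kernal kernal) {
            this.kernal = kernal;
        }

        @Override public void active(boolean active) {
            kernal.changeState(active, true);
        }
    }

    /** JMX bean: fails unless deactivation is explicitly forced. */
    static final class MxBeanImpl implements MxBean {
        private final Kernal kernal;

        MxBeanImpl(Kernal kernal) {
            this.kernal = kernal;
        }

        @Override public void active(boolean active) {
            kernal.changeState(active, false);
        }

        @Override public void state(String state, boolean force) {
            kernal.changeState(!"INACTIVE".equals(state), force);
        }
    }
}
```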






[jira] [Created] (IGNITE-12773) Reduce number of cluster deactivation methods in internal API.

2020-03-11 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12773:
-

 Summary: Reduce number of cluster deactivation methods in internal 
API.
 Key: IGNITE-12773
 URL: https://issues.apache.org/jira/browse/IGNITE-12773
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


To reduce the number of cluster deactivation methods in the internal API, we might:

1.  Remove
GridClientClusterState#active()

2.  Remove
GridClientClusterState#active(boolean active)

3.  Remove
IGridClusterStateProcessor#changeGlobalState(
boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

4.  Remove
GridClusterStateProcessor#changeGlobalState(
final boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology,
boolean isAutoAdjust
)

5.  Remove
GridClusterStateProcessor#changeGlobalState(
final boolean activate,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

6.  Remove 
GridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)

7.  Add boolean isAutoAdjust to 
IGridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology,
   /* here */ boolean isAutoAdjust /* here */
)

8.  Add @Override to 
/* here */ @Override /* here */
GridClusterStateProcessor#changeGlobalState(
ClusterState state,
boolean forceDeactivation,
Collection baselineNodes,
boolean forceChangeBaselineTopology
)





[jira] [Created] (IGNITE-12704) Fail of recognition of default scheme in SQL queries.

2020-02-19 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12704:
-

 Summary: Fail of recognition of default scheme in SQL queries.
 Key: IGNITE-12704
 URL: https://issues.apache.org/jira/browse/IGNITE-12704
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin


Got a connection:
Connection conn = ...;

// execute() is just a helper function: it creates a prepared statement,
// passes params...

// Get all the tables.
List<List<?>> lst = execute(conn, "select SCHEMA_NAME, TABLE_NAME from SYS.TABLES");

for (List<?> row : lst) {
    String schemaName = (String)row.get(0);
    String tableName = (String)row.get(1);

    // Shows: "schema: default, table: PERSON"
    System.out.println("schema: " + schemaName + ", table: " + tableName);

    // Fails with: java.sql.SQLException: Failed to parse query.
    // Schema "DEFAULT" not found
    execute(conn, "drop table " + schemaName + "." + tableName);
}

I think this case should fail with an error like "only cache created tables
can be removed with drop table", not with "schema not found".
The SQL engine is supposed to accept and understand the values it returns itself.





[jira] [Created] (IGNITE-12701) Disallow silent deactivation in CLI and REST.

2020-02-19 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12701:
-

 Summary: Disallow silent deactivation in CLI and REST.
 Key: IGNITE-12701
 URL: https://issues.apache.org/jira/browse/IGNITE-12701
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


Disallow silent deactivation through the CLI and REST. 

Skip JMX call 
{code:java}
void IgniteMXBean#active(boolean active)
{code}






[jira] [Created] (IGNITE-12614) Disallow silent deactivation of cluster to prevent in-mem data loss.

2020-01-31 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12614:
-

 Summary: Disallow silent deactivation of cluster to prevent in-mem 
data loss.
 Key: IGNITE-12614
 URL: https://issues.apache.org/jira/browse/IGNITE-12614
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Steshin


Currently, anyone can deactivate the cluster with the command line utility
(control.sh), and probably with JMX too. This leads to data loss when
persistence is off, because in-memory data is erased during deactivation. Users
are unlikely to expect such behavior.

Suggestions:

1)  Disallow silent deactivation of a cluster that holds in-memory caches. Show
a warning like “Your cluster has in-memory cache configured. During
deactivation all data from these caches will be cleared!”

2)  Add a ‘--force’ param which skips the warning message.





[jira] [Created] (IGNITE-12606) Parametrize IgniteTxStoreExceptionAbstractSelfTest

2020-01-29 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12606:
-

 Summary: Parametrize IgniteTxStoreExceptionAbstractSelfTest
 Key: IGNITE-12606
 URL: https://issues.apache.org/jira/browse/IGNITE-12606
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


IgniteTxStoreExceptionAbstractSelfTest seems well suited for parametrization.
It has only a single level of subclasses, which are all used together in one
place.





[jira] [Created] (IGNITE-12597) IgniteTxStoreExceptionAbstractSelfTest

2020-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12597:
-

 Summary: IgniteTxStoreExceptionAbstractSelfTest
 Key: IGNITE-12597
 URL: https://issues.apache.org/jira/browse/IGNITE-12597
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


org.apache.ignite.internal.processors.cache.GridCacheColocatedTxStoreExceptionSelfTest
 could be parametrized. The extending classes carry only params and are
executed one after another.





[jira] [Created] (IGNITE-12596) Parametrization of IgniteCacheAbstractExecutionContextTest

2020-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12596:
-

 Summary: Parametrization of IgniteCacheAbstractExecutionContextTest
 Key: IGNITE-12596
 URL: https://issues.apache.org/jira/browse/IGNITE-12596
 Project: Ignite
  Issue Type: Sub-task
 Environment: 
org.apache.ignite.internal.processors.cache.context.IgniteCacheAbstractExecutionContextTest
 is run 3 times via inheritance, with only the params varying. The problem is
that the extending classes are not always included in the target test suites
with the full set of param combinations: sometimes only 2 extending classes are
involved in the tests, sometimes 3. I am thinking of using the subclasses of
IgniteCacheAbstractExecutionContextTest as a set of params.
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin








[jira] [Created] (IGNITE-12595) Parametrization of GridCacheSetAbstractSelfTest

2020-01-28 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12595:
-

 Summary: Parametrization of GridCacheSetAbstractSelfTest
 Key: IGNITE-12595
 URL: https://issues.apache.org/jira/browse/IGNITE-12595
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


org.apache.ignite.internal.processors.cache.datastructures.GridCacheSetAbstractSelfTest
 could be run with params. It is not the best candidate, but it can still
reduce the test code base.





[jira] [Created] (IGNITE-12583) Parametrization of JdbcThinBulkLoadAbstractSelfTest

2020-01-27 Thread Vladimir Steshin (Jira)
Vladimir Steshin created IGNITE-12583:
-

 Summary: Parametrization of JdbcThinBulkLoadAbstractSelfTest
 Key: IGNITE-12583
 URL: https://issues.apache.org/jira/browse/IGNITE-12583
 Project: Ignite
  Issue Type: Sub-task
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin


org.apache.ignite.jdbc.thin.JdbcThinBulkLoadAbstractSelfTest is extended
several times only to override parameter getters such as:

{code:java}
protected CacheMode cacheMode() { return CacheMode.REPLICATED; }
protected CacheAtomicityMode atomicityMode() { return CacheAtomicityMode.TRANSACTIONAL; }
protected boolean nearCache() { return false; }
{code}

These getters should be replaced with params.
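
As a rough illustration, the combinations that the subclasses now encode could be generated in one place. The sketch below is plain Java with no test framework assumed. The two enums are simplified local stand-ins for the Ignite ones, and BulkLoadParamsSketch is a hypothetical name. In a real suite such a list would feed a parameterized test runner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one parameter source instead of many subclasses. */
final class BulkLoadParamsSketch {
    /** Simplified stand-in for the Ignite enum of the same name. */
    enum CacheMode { PARTITIONED, REPLICATED }

    /** Simplified stand-in for the Ignite enum of the same name. */
    enum CacheAtomicityMode { ATOMIC, TRANSACTIONAL }

    /** @return Every (cacheMode, atomicityMode, nearCache) combination. */
    static List<Object[]> parameters() {
        List<Object[]> res = new ArrayList<>();

        for (CacheMode mode : CacheMode.values()) {
            for (CacheAtomicityMode atomicity : CacheAtomicityMode.values()) {
                for (boolean near : new boolean[] {false, true})
                    res.add(new Object[] {mode, atomicity, near});
            }
        }

        return res;
    }
}
```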
 


