[ https://issues.apache.org/jira/browse/IGNITE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589913#comment-16589913 ]
ASF GitHub Bot commented on IGNITE-9338:
----------------------------------------
GitHub user dmitrievanthony opened a pull request:
https://github.com/apache/ignite/pull/4601
IGNITE-9338 Add connection data into env variables of TensorFlow worker processes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gridgain/apache-ignite ignite-9338
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/ignite/pull/4601.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4601
----
commit 597ca635203f1cbf77504a5e18519f45abea73e3
Author: Anton Dmitriev <dmitrievanthony@...>
Date: 2018-08-22T13:12:01Z
IGNITE-9338 Pass Ignite dataset host and port into Python processes.
commit fe4bb1f04e95bf8c09c17eeed51bbf2cde696510
Author: Anton Dmitriev <dmitrievanthony@...>
Date: 2018-08-22T14:02:33Z
IGNITE-9338 Pass Ignite dataset host and port into Python processes.
commit 18a936a1acd28eb0ae95ed0127a3874e8165ba7c
Author: Anton Dmitriev <dmitrievanthony@...>
Date: 2018-08-22T14:04:12Z
IGNITE-9338 Pass Ignite dataset host and port into Python processes.
----
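The commits pass the Ignite dataset host and port into the Python processes. The launcher in the patch is Ignite's own code and is not shown here; the following Python sketch only illustrates the general mechanism, assuming hypothetical environment variable names IGNITE_DATASET_HOST and IGNITE_DATASET_PORT. The parent copies its own environment, adds the endpoint of a live node, and starts the worker with that environment.

```python
import os
import subprocess
import sys

# Hypothetical variable names; the names used by the actual patch may differ.
HOST_VAR = "IGNITE_DATASET_HOST"
PORT_VAR = "IGNITE_DATASET_PORT"

# Child code: prints the endpoint it received through its environment.
CHILD_CODE = (
    "import os; "
    f"print(os.environ['{HOST_VAR}'] + ':' + os.environ['{PORT_VAR}'])"
)


def launch_worker(host: str, port: int) -> str:
    """Start a Python child process with the Ignite node's host and port
    in its environment and return what the child reports."""
    env = dict(os.environ)  # inherit the parent's environment
    env[HOST_VAR] = host
    env[PORT_VAR] = str(port)  # environment values must be strings
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Example: the surviving node listens on 10801 after the 10800 node was killed.
    print(launch_worker("localhost", 10801))
```

Because the endpoint travels with each launched process, a worker restarted on another node receives that node's port instead of assuming 10800.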
> ML TF integration: tf cluster can't connect after killing first node with
> default port 10800
> --------------------------------------------------------------------------------------------
>
> Key: IGNITE-9338
> URL: https://issues.apache.org/jira/browse/IGNITE-9338
> Project: Ignite
> Issue Type: Bug
> Components: ml
> Reporter: Stepan Pilschikov
> Assignee: Anton Dmitriev
> Priority: Major
> Labels: tf-integration
>
> Case:
> - Run a cluster of 3 nodes on 1 host
> - Fill the caches with data
> - Run the Python script
> - Kill the lead node (port 10800) that hosts the chief and user_script processes
> Expect:
> - chief and user_script are restarted on another node
> - the script is rerun
> Actual:
> - chief and user_script are restarted on another node, but they keep crashing
> and restarting because they cannot connect to the default port 10800
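On the worker side, the failure in the report comes from connecting to the default port 10800 even after the node listening on it is gone. A minimal sketch of the intended behaviour, assuming hypothetical variable names IGNITE_DATASET_HOST and IGNITE_DATASET_PORT and a localhost default host: the script prefers the endpoint passed in its environment and only falls back to localhost:10800 when none is given.

```python
import os
from typing import Mapping, Tuple

# Fallback values; 10800 is the default port named in the report,
# "localhost" is an assumption for illustration.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 10800


def resolve_endpoint(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return the (host, port) a worker should connect to.

    Uses the hypothetical IGNITE_DATASET_HOST / IGNITE_DATASET_PORT variables
    when present, so a worker restarted on another node talks to that node
    rather than to the killed node's default port.
    """
    host = env.get("IGNITE_DATASET_HOST", DEFAULT_HOST)
    port = int(env.get("IGNITE_DATASET_PORT", DEFAULT_PORT))
    return host, port


if __name__ == "__main__":
    print(resolve_endpoint())
```

A restarted user script would call resolve_endpoint() once at startup and pass the result to whatever client or dataset object it builds, instead of relying on the hardcoded default.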
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)