[jira] [Commented] (IGNITE-9338) ML TF integration: tf cluster can't connect after killing first node with default port 10800

ASF GitHub Bot (JIRA) Thu, 23 Aug 2018 01:41:08 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589913#comment-16589913
 ]


ASF GitHub Bot commented on IGNITE-9338:
----------------------------------------

GitHub user dmitrievanthony opened a pull request:

    https://github.com/apache/ignite/pull/4601

    IGNITE-9338 Add connection data into env variables of TensorFlow worker 
processes

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-9338

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4601
    
----
commit 597ca635203f1cbf77504a5e18519f45abea73e3
Author: Anton Dmitriev <dmitrievanthony@...>
Date:   2018-08-22T13:12:01Z

    IGNITE-9338 Pass Ignite dataset host and port into Python processes.

commit fe4bb1f04e95bf8c09c17eeed51bbf2cde696510
Author: Anton Dmitriev <dmitrievanthony@...>
Date:   2018-08-22T14:02:33Z

    IGNITE-9338 Pass Ignite dataset host and port into Python processes.

commit 18a936a1acd28eb0ae95ed0127a3874e8165ba7c
Author: Anton Dmitriev <dmitrievanthony@...>
Date:   2018-08-22T14:04:12Z

    IGNITE-9338 Pass Ignite dataset host and port into Python processes.

----


> ML TF integration: tf cluster can't connect after killing first node with 
> default port 10800
> --------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9338
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9338
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>            Reporter: Stepan Pilschikov
>            Assignee: Anton Dmitriev
>            Priority: Major
>              Labels: tf-integration
>
> Case: 
> - Run a cluster with 3 nodes on 1 host
> - Fill caches with data
> - Run the python script
> - Kill the lead node (port 10800) that hosts the chief + user_script processes
> Expect:
> - chief and user_script are restarted on another node
> - script is rerun
> Actual:
> - chief and user_script are restarted on another node, but they keep crashing 
> and restarting because they can't connect to the default port 10800
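
For illustration only, the approach named in the commit messages (passing the
Ignite dataset host and port into the Python processes) could look roughly like
this on the Python side. This is a minimal sketch, not the code from the pull
request: the variable names IGNITE_DATASET_HOST and IGNITE_DATASET_PORT, the
"localhost" fallback, and the helper resolve_ignite_endpoint are assumptions
made for this example.

```python
import os

# Fallbacks used only when the parent process injected nothing.
# 10800 is the default port named in the issue title; "localhost" is an assumption.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 10800


def resolve_ignite_endpoint(env=None):
    """Return (host, port), preferring values passed in via environment variables.

    The variable names below are hypothetical and chosen for this sketch.
    """
    env = os.environ if env is None else env
    # `or` also covers variables that are set but empty.
    host = env.get("IGNITE_DATASET_HOST") or DEFAULT_HOST
    port = int(env.get("IGNITE_DATASET_PORT") or DEFAULT_PORT)
    return host, port


if __name__ == "__main__":
    host, port = resolve_ignite_endpoint()
    print(f"connecting to Ignite at {host}:{port}")
```

With something along these lines, a chief or user_script process restarted on a
surviving node would connect to the endpoint its parent node advertises instead
of assuming that a node is still listening on port 10800.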



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
