Were you able to submit a job with the job manager address set to "localhost"?
I did not see any log statements about received jobs in that log. If you were,
could you also send the logs of the client?

Binding the BlobServer to 0.0.0.0 works for me on my machine, and I cannot
remember that we had problems with it before. But I agree, we should bind it
to the network interface that the JobManager uses.
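
For illustration, here is a minimal Java sketch of the difference between
binding a server socket to the wildcard address and binding it to one specific
interface. It uses only java.net.ServerSocket. The class name BindAddressDemo
and its method names are hypothetical and this is not the BlobServer code; the
loopback address merely stands in for whatever address the JobManager is
configured with.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo class, not part of Flink.
class BindAddressDemo {

    /** Binds to the wildcard address (all local interfaces) on an ephemeral port. */
    public static ServerSocket bindWildcard() throws IOException {
        return new ServerSocket(0);
    }

    /** Binds to exactly one local address on an ephemeral port. */
    public static ServerSocket bindTo(InetAddress address) throws IOException {
        // Arguments: port 0 = pick a free port, backlog 50, explicit bind address.
        return new ServerSocket(0, 50, address);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: loopback stands in for the configured JobManager address.
        InetAddress jobManagerAddress = InetAddress.getLoopbackAddress();

        try (ServerSocket wildcard = bindWildcard();
             ServerSocket specific = bindTo(jobManagerAddress)) {
            System.out.println("wildcard socket bound to " + wildcard.getInetAddress()
                    + ", any-local: " + wildcard.getInetAddress().isAnyLocalAddress());
            System.out.println("specific socket bound to " + specific.getInetAddress()
                    + ", any-local: " + specific.getInetAddress().isAnyLocalAddress());
        }
    }
}
```

A socket bound to the wildcard address accepts connections on every local
interface, so it cannot report a single address that clients should connect
to. A socket bound to one address can report exactly that address.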

I cannot explain why your fix solves the problem. It does not touch any of
the JobClient/JobManager logic.

I updated my local branch [1] with a fix for the BlobServer. Could you try
it out again and send us the logs? Thanks a lot for your help, Dulaj.

On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga <vidura...@icloud.com>
wrote:

> But can you explain why my fix solved it?
>
> > On Mar 5, 2015, at 5:50 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > Hi Dulaj!
> >
> > Okay, the logs give us some insight. Both setups seem to look good in terms
> > of TaskManager and JobManager startup.
> >
> > In one of the logs (127.0.0.1) you submit a job. The job fails because the
> > TaskManager cannot grab the JAR file from the JobManager.
> > I think the problem is that the BLOB server binds to 0.0.0.0 - it should
> > bind to the same address as the JobManager actor system.
> >
> > That should definitely be changed...
> >
> > On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga <vidura...@icloud.com>
> > wrote:
> >
> >> Hi,
> >> This is the log with setting “localhost”
> >> flink-Vidura-jobmanager-localhost.log
> >> <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>
> >>
> >> And this is the log with setting “127.0.0.1”
> >> flink-Vidura-jobmanager-localhost.log
> >> <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>
> >>
> >>> On Mar 5, 2015, at 2:23 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> >>>
> >>> What does the jobmanager log say? I think Stephan added some more logging
> >>> output which helps us to debug this problem.
> >>>
> >>> On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vidura...@icloud.com>
> >>> wrote:
> >>>
> >>>> Using start-local.sh.
> >>>> I’m using the original config yaml. I also tried changing the jobmanager
> >>>> address in the config to “127.0.0.1”, but no luck. With my changes it works ok.
> >>>> The conf file follows.
> >>>>
> >>>>
> >>>> ################################################################################
> >>>> #  Licensed to the Apache Software Foundation (ASF) under one
> >>>> #  or more contributor license agreements.  See the NOTICE file
> >>>> #  distributed with this work for additional information
> >>>> #  regarding copyright ownership.  The ASF licenses this file
> >>>> #  to you under the Apache License, Version 2.0 (the
> >>>> #  "License"); you may not use this file except in compliance
> >>>> #  with the License.  You may obtain a copy of the License at
> >>>> #
> >>>> #      http://www.apache.org/licenses/LICENSE-2.0
> >>>> #
> >>>> #  Unless required by applicable law or agreed to in writing, software
> >>>> #  distributed under the License is distributed on an "AS IS" BASIS,
> >>>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >>>> #  See the License for the specific language governing permissions and
> >>>> # limitations under the License.
> >>>> ################################################################################
> >>>>
> >>>> #==============================================================================
> >>>> # Common
> >>>> #==============================================================================
> >>>>
> >>>> jobmanager.rpc.address: 127.0.0.1
> >>>>
> >>>> jobmanager.rpc.port: 6123
> >>>>
> >>>> jobmanager.heap.mb: 256
> >>>>
> >>>> taskmanager.heap.mb: 512
> >>>>
> >>>> taskmanager.numberOfTaskSlots: 1
> >>>>
> >>>> parallelization.degree.default: 1
> >>>>
> >>>> #==============================================================================
> >>>> # Web Frontend
> >>>> #==============================================================================
> >>>>
> >>>> # The port under which the web-based runtime monitor listens.
> >>>> # A value of -1 deactivates the web server.
> >>>>
> >>>> jobmanager.web.port: 8081
> >>>>
> >>>> # The port under which the standalone web client
> >>>> # (for job upload and submit) listens.
> >>>>
> >>>> webclient.port: 8080
> >>>>
> >>>> #==============================================================================
> >>>> # Advanced
> >>>> #==============================================================================
> >>>>
> >>>> # The number of buffers for the network stack.
> >>>> #
> >>>> # taskmanager.network.numberOfBuffers: 2048
> >>>>
> >>>> # Directories for temporary files.
> >>>> #
> >>>> # Add a delimited list for multiple directories, using the system directory
> >>>> # delimiter (colon ':' on unix) or a comma, e.g.:
> >>>> #     /data1/tmp:/data2/tmp:/data3/tmp
> >>>> #
> >>>> # Note: Each directory entry is read from and written to by a different I/O
> >>>> # thread. You can include the same directory multiple times in order to create
> >>>> # multiple I/O threads against that directory. This is for example relevant for
> >>>> # high-throughput RAIDs.
> >>>> #
> >>>> # If not specified, the system-specific Java temporary directory (java.io.tmpdir
> >>>> # property) is taken.
> >>>> #
> >>>> # taskmanager.tmp.dirs: /tmp
> >>>>
> >>>> # Path to the Hadoop configuration directory.
> >>>> #
> >>>> # This configuration is used when writing into HDFS. Unless specified otherwise,
> >>>> # HDFS file creation will use HDFS default settings with respect to block-size,
> >>>> # replication factor, etc.
> >>>> #
> >>>> # You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
> >>>> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> >>>> #
> >>>> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> >>>>
> >>>>
> >>>>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>>
> >>>>> How did you start the flink cluster? Using start-local.sh,
> >>>>> start-cluster.sh, or starting the job manager and task managers individually
> >>>>> using taskmanager.sh/jobmanager.sh? Could you maybe post the
> >>>>> flink-conf.yaml file you're using?
> >>>>>
> >>>>> With your changes, everything works, right?
> >>>>>
> >>>>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vidura...@icloud.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Till,
> >>>>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> >>>>>> tries a 10.0.0.0/8 IP.
> >>>>>>
> >>>>>> Best regards.
> >>>>>>
> >>>>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <till.rohrm...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Dulaj,
> >>>>>>>
> >>>>>>> I looked through your commit and noticed that the JobClient might not be
> >>>>>>> listening on the right network interface. Your commit seems to fix it. I
> >>>>>>> just want to understand the problem properly and therefore I opened a
> >>>>>>> branch with a small change. Could you try out whether this change would
> >>>>>>> also fix your problem? You can find the code here [1]. Would be awesome if
> >>>>>>> you checked it out and let it run on your cluster setting. Thanks a lot
> >>>>>>> Dulaj!
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>>>>>
> >>>>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vidura...@icloud.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Not every change in the commit b7da22a is required, but I thought they
> >>>>>>>> are appropriate.
> >>>>>>>>
> >>>>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vidura...@icloud.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>> I found many other places where “localhost” is hard coded. I changed them in
> >>>>>>>>> a better way I think. I made a pull request. Please review. b7da22a
> >>>>>>>>> <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> If I recall correctly, we only hardcode "localhost" in the local mini
> >>>>>>>>>> cluster - do you think it is problematic there as well?
> >>>>>>>>>>
> >>>>>>>>>> Have you found any other places?
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vidura...@icloud.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> In some places of the code, "localhost" is hard coded. When it is resolved
> >>>>>>>>>>> by the DNS, it is possible to be directed to a different IP other than
> >>>>>>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
> >>>>>>>>>>> 127.0.0.1 and it works like a charm.
> >>>>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the jobmanager
> >>>>>>>>>>> ip is changed, this becomes an issue again. I'm thinking of setting the
> >>>>>>>>>>> jobmanager ip from the config.yaml to these places.
> >>>>>>>>>>> If you have a better idea on doing this with your experience, please let
> >>>>>>>>>>> me know.
> >>>>>>>>>>>
> >>>>>>>>>>> Best.
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
