[jira] [Commented] (GIRAPH-1139) Resuming from checkpoint doesn't work
[ https://issues.apache.org/jira/browse/GIRAPH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979370#comment-15979370 ]

ASF GitHub Bot commented on GIRAPH-1139:
----------------------------------------

Github user majakabiljo commented on the issue:

    https://github.com/apache/giraph/pull/30

    Can we get rid of getHostnamePartitionId() to avoid incorrectly using it in the future? I see various other places where taskPartition is used as an identifier. Do any of them need to be updated too?

> Resuming from checkpoint doesn't work
> -------------------------------------
>
>                 Key: GIRAPH-1139
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1139
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp
>    Affects Versions: 1.2.0
>            Reporter: Nic Eggert
>
> I ran into a couple of issues when trying to get Giraph to resume from
> checkpoints (using mapreduce.max.attempts rather than GiraphJobRetryChecker).
> * If we just wrote a checkpoint, the master expects the workers to checkpoint
> again, while the workers (correctly) clear the checkpointing flag.
> * When workers restart, they take their task id from the partition number,
> which stays the same across multiple attempts. This gets transferred to the
> Netty clientId, and the server starts ignoring messages from restarted
> workers because it thinks it processed them already.
> I believe I've fixed these issues. I'll send a GitHub PR shortly.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
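The second problem reported in GIRAPH-1139 (a restarted worker reusing its partition number as the Netty clientId, so the server drops its messages as already processed) can be illustrated with a small sketch. This is not Giraph's code and not necessarily what the pull request does: the class `RestartIdSketch`, the `DuplicateFilter`, and the `partition * maxAttempts + attempt` scheme are all hypothetical, chosen only to show why an identifier that is stable across attempts collides with server-side duplicate detection.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Giraph. */
final class RestartIdSketch {
  private RestartIdSketch() { }

  /** Assumed server-side filter: remembers the highest request id seen per client. */
  static final class DuplicateFilter {
    private final Map<Integer, Long> lastSeen = new HashMap<>();

    /** Returns true if the request looks new, false if it is treated as a duplicate. */
    boolean accept(int clientId, long requestId) {
      Long last = lastSeen.get(clientId);
      if (last != null && requestId <= last) {
        return false; // same client, request id not newer: assumed already processed
      }
      lastSeen.put(clientId, requestId);
      return true;
    }
  }

  /** The scheme described in the report: the id is the same on every attempt. */
  static int partitionOnlyId(int taskPartition) {
    return taskPartition;
  }

  /** One possible alternative (an assumption): fold the attempt number into the id. */
  static int attemptAwareId(int taskPartition, int attempt, int maxAttempts) {
    if (attempt < 0 || attempt >= maxAttempts) {
      throw new IllegalArgumentException("attempt out of range: " + attempt);
    }
    return taskPartition * maxAttempts + attempt;
  }
}
```

With the partition-only id, a worker that restarts and begins numbering its requests from zero again is indistinguishable from a replay of its earlier attempt. With an attempt-aware id, each attempt is a distinct client from the filter's point of view.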
[jira] [Commented] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down
[ https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979192#comment-15979192 ]

Hassan Eslami commented on GIRAPH-1144:
---------------------------------------

This is the "resume signal" being sent after Netty shuts down. It is a harmless corner case, and we can add a few lines to avoid it. When Netty shuts down, the computation is done too, so even if we don't send the "resume signal", things are all good.

> Out-of-core control flow sends messages after Netty shuts down
> --------------------------------------------------------------
>
>                 Key: GIRAPH-1144
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1144
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Dionysios Logothetis
>            Assignee: Dionysios Logothetis
>
> java.util.concurrent.RejectedExecutionException: event executor terminated
>     at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:296)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:691)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:415)
>     at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60)
>     at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:48)
>     at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64)
>     at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:305)
>     at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:133)
>     at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:115)
>     at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:714)
>     at org.apache.giraph.comm.netty.NettyClient.writeRequestToChannel(NettyClient.java:799)
>     at org.apache.giraph.comm.netty.NettyClient.doSend(NettyClient.java:789)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.sendResumeSignal(CreditBasedFlowControl.java:273)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.access$200(CreditBasedFlowControl.java:77)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl$1.run(CreditBasedFlowControl.java:219)
>     at java.lang.Thread.run(Thread.java:745)
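The "few lines to avoid this" mentioned in the comment above could look roughly like the following sketch. It is not the actual `CreditBasedFlowControl` or `NettyClient` code: `GuardedResumeSender` is a hypothetical class, and a plain `ExecutorService` stands in for the Netty event loop. The idea is to skip the resume signal once the transport has been stopped, and to tolerate the rejection if the sender loses the race with shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; an ExecutorService stands in for the Netty event loop. */
final class GuardedResumeSender {
  private final ExecutorService transport;
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  GuardedResumeSender(ExecutorService transport) {
    this.transport = transport;
  }

  /** Marks the sender as stopped, then shuts the transport down. */
  void stop() {
    stopped.set(true);
    transport.shutdown();
  }

  /**
   * Hands the resume signal to the transport unless it is already stopped.
   * Returns true if the signal was handed over, false if it was skipped.
   */
  boolean sendResumeSignal(Runnable send) {
    if (stopped.get()) {
      return false; // computation is over, so the signal is no longer needed
    }
    try {
      transport.execute(send);
      return true;
    } catch (RejectedExecutionException e) {
      // The transport shut down between the check and the call; safe to ignore here.
      return false;
    }
  }
}
```

The flag alone is not enough, because shutdown can happen between the check and the `execute` call. Catching `RejectedExecutionException` covers that window.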
[jira] [Commented] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down
[ https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979163#comment-15979163 ]

Dionysios Logothetis commented on GIRAPH-1144:
----------------------------------------------

Either the netty client must wait until control-flow sends all unsent messages, or control-flow stops sending when the netty client stops. If the other side doesn't depend on receiving pending control-flow messages, we can shut it down as soon as netty stops.

> Out-of-core control flow sends messages after Netty shuts down
> --------------------------------------------------------------
>
>                 Key: GIRAPH-1144
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1144
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Dionysios Logothetis
>            Assignee: Dionysios Logothetis
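The first option in the comment above (the client waits until flow control has sent everything it still holds) amounts to an ordering constraint on shutdown. A minimal sketch of that ordering follows, assuming both sides can be modelled as `ExecutorService` instances. `OrderedShutdownSketch` and `shutdownInOrder` are hypothetical names, not Giraph or Netty APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of "drain flow control first, then stop the transport". */
final class OrderedShutdownSketch {
  private OrderedShutdownSketch() { }

  /**
   * Stops accepting new control tasks, waits for queued ones to finish,
   * and only then shuts the transport down. The timeout applies to each wait.
   * Returns true if both executors terminated within their timeout.
   */
  static boolean shutdownInOrder(ExecutorService control, ExecutorService transport,
      long timeoutMs) {
    boolean terminated;
    try {
      control.shutdown(); // queued control tasks still run and may use the transport
      terminated = control.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
      transport.shutdown(); // safe now: no control task is left to send anything
      terminated &= transport.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      transport.shutdown();
      terminated = false;
    }
    return terminated;
  }
}
```

The second option in the comment (flow control stops sending once the client stops) trades this wait for a check on the sending side, which is simpler when the receiver does not depend on the pending messages.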
[jira] [Assigned] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down
[ https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dionysios Logothetis reassigned GIRAPH-1144:
--------------------------------------------

       Assignee: Dionysios Logothetis
    Description:
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703)
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:296)
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:691)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:415)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:48)
    at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64)
    at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:305)
    at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:133)
    at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:115)
    at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:714)
    at org.apache.giraph.comm.netty.NettyClient.writeRequestToChannel(NettyClient.java:799)
    at org.apache.giraph.comm.netty.NettyClient.doSend(NettyClient.java:789)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.sendResumeSignal(CreditBasedFlowControl.java:273)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.access$200(CreditBasedFlowControl.java:77)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl$1.run(CreditBasedFlowControl.java:219)
    at java.lang.Thread.run(Thread.java:745)

> Out-of-core control flow sends messages after Netty shuts down
> --------------------------------------------------------------
>
>                 Key: GIRAPH-1144
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1144
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Dionysios Logothetis
>            Assignee: Dionysios Logothetis
[jira] [Created] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down
Dionysios Logothetis created GIRAPH-1144:
-----------------------------------------

             Summary: Out-of-core control flow sends messages after Netty shuts down
                 Key: GIRAPH-1144
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1144
             Project: Giraph
          Issue Type: Bug
            Reporter: Dionysios Logothetis