[jira] [Commented] (GIRAPH-1139) Resuming from checkpoint doesn't work

2017-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979370#comment-15979370
 ] 

ASF GitHub Bot commented on GIRAPH-1139:


Github user majakabiljo commented on the issue:

https://github.com/apache/giraph/pull/30
  
Can we get rid of getHostnamePartitionId() so that nobody uses it 
incorrectly in the future? I also see various other places where taskPartition 
is used as an identifier. Do any of them need to be updated too?


> Resuming from checkpoint doesn't work
> -
>
> Key: GIRAPH-1139
> URL: https://issues.apache.org/jira/browse/GIRAPH-1139
> Project: Giraph
> Issue Type: Bug
> Components: bsp
> Affects Versions: 1.2.0
> Reporter: Nic Eggert
>
> I ran into a couple of issues when trying to get Giraph to resume from 
> checkpoints (using mapreduce.max.attempts rather than GiraphJobRetryChecker).
> * If we just wrote a checkpoint, the master expects the workers to checkpoint 
> again, while the workers (correctly) clear the checkpointing flag.
> * When workers restart, they take their task id from the partition number, 
> which stays the same across multiple attempts. This gets transferred to the 
> Netty clientId, and the server starts ignoring messages from restarted 
> workers because it thinks it processed them already.
> I believe I've fixed these issues. I'll send a GitHub PR shortly.
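
The second problem in the description above can be illustrated with a small toy
model. This is only a sketch, not Giraph's code and not the change in the pull
request: the class DuplicateRequestFilter, the method attemptAwareId and the
maxWorkers parameter are hypothetical names chosen for illustration. It shows a
server that drops any (clientId, requestId) pair it has already seen, and one
possible way to give a restarted attempt a fresh id.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a server that drops requests it believes it already processed. */
final class DuplicateRequestFilter {
  /** Keys of the form "clientId:requestId" for every request accepted so far. */
  private final Set<String> seen = new HashSet<>();

  /** Returns true if the request is new, false if it is dropped as a duplicate. */
  boolean accept(int clientId, long requestId) {
    return seen.add(clientId + ":" + requestId);
  }

  /**
   * Hypothetical id scheme (an assumption, not the actual fix): fold the attempt
   * number into the id so that a restarted worker no longer collides with the
   * requests sent by its previous attempt.
   */
  static int attemptAwareId(int taskPartition, int attempt, int maxWorkers) {
    return attempt * maxWorkers + taskPartition;
  }
}
```

With an id taken from the partition number alone, the first request of a
restarted worker has the same key as the first request of the failed attempt,
so the filter drops it. With an attempt-aware id the key differs and the
request is accepted.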



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down

2017-04-21 Thread Hassan Eslami (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979192#comment-15979192
 ] 

Hassan Eslami commented on GIRAPH-1144:
---

This is the "resume signal" being sent after Netty shuts down. It is a harmless 
corner case, and we can add a few lines to avoid it. By the time Netty shuts 
down, the computation is done too, so nothing breaks even if the "resume 
signal" is never sent.
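
A guard of this kind might look roughly like the sketch below. It is not the
Giraph change itself: a plain JDK ExecutorService stands in for the Netty event
loop (a shut-down JDK executor rejects new tasks with the same
RejectedExecutionException type), and the names ResumeSignalGuard and trySend
are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

/** Sketch: skip a late signal instead of failing once the transport is stopped. */
final class ResumeSignalGuard {
  private ResumeSignalGuard() { }

  /**
   * Tries to hand the signal to the executor. Returns false, rather than
   * throwing, when the executor has already been shut down.
   */
  static boolean trySend(ExecutorService executor, Runnable signal) {
    if (executor.isShutdown()) {
      // Transport already stopped: the computation is over, nothing to resume.
      return false;
    }
    try {
      executor.execute(signal);
      return true;
    } catch (RejectedExecutionException e) {
      // Shutdown raced with the check above; treat it as a harmless no-op.
      return false;
    }
  }
}
```

The explicit isShutdown() check covers the common case cheaply, while the catch
block covers the window in which shutdown happens between the check and the
call to execute().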

> Out-of-core control flow sends messages after Netty shuts down
> --
>
> Key: GIRAPH-1144
> URL: https://issues.apache.org/jira/browse/GIRAPH-1144
> Project: Giraph
> Issue Type: Bug
> Reporter: Dionysios Logothetis
> Assignee: Dionysios Logothetis
>
> java.util.concurrent.RejectedExecutionException: event executor terminated
>     at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:296)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:691)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:415)
>     at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60)
>     at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:48)
>     at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64)
>     at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:305)
>     at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:133)
>     at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:115)
>     at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:714)
>     at org.apache.giraph.comm.netty.NettyClient.writeRequestToChannel(NettyClient.java:799)
>     at org.apache.giraph.comm.netty.NettyClient.doSend(NettyClient.java:789)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.sendResumeSignal(CreditBasedFlowControl.java:273)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.access$200(CreditBasedFlowControl.java:77)
>     at org.apache.giraph.comm.flow_control.CreditBasedFlowControl$1.run(CreditBasedFlowControl.java:219)
>     at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down

2017-04-21 Thread Dionysios Logothetis (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979163#comment-15979163
 ] 

Dionysios Logothetis commented on GIRAPH-1144:
--

Either the Netty client must wait until control flow has sent all pending 
messages, or control flow must stop sending when the Netty client stops. If the 
other side doesn't depend on receiving pending control-flow messages, we can 
shut control flow down as soon as Netty stops.
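
The two options can be combined in one small coordination object, sketched
below under stated assumptions. Nothing here is Giraph API: PendingSignalTracker
and its methods begin, end and closeAndAwait are hypothetical names. A sender
brackets each message with begin() and end(). The shutdown path calls
closeAndAwait(), which refuses new sends from that point on and waits, up to a
timeout, for the sends already in flight.

```java
/** Sketch: refuse new sends after shutdown starts and wait for in-flight ones. */
final class PendingSignalTracker {
  private final Object lock = new Object();
  private int pending = 0;
  private boolean closed = false;

  /** Registers a send that is about to start; returns false once closed. */
  boolean begin() {
    synchronized (lock) {
      if (closed) {
        return false;
      }
      pending++;
      return true;
    }
  }

  /** Marks a send registered with begin() as finished. */
  void end() {
    synchronized (lock) {
      pending--;
      lock.notifyAll();
    }
  }

  /**
   * Blocks further sends, then waits up to timeoutMillis for in-flight sends.
   * Returns true if everything drained, false on timeout or interruption.
   */
  boolean closeAndAwait(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    synchronized (lock) {
      closed = true;
      while (pending > 0) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        try {
          lock.wait(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          return false;
        }
      }
      return true;
    }
  }
}
```

If the receiving side does not need the pending messages, the shutdown path can
call closeAndAwait(0) and ignore the result, which reduces to "stop sending as
soon as Netty stops".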






[jira] [Assigned] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down

2017-04-21 Thread Dionysios Logothetis (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dionysios Logothetis reassigned GIRAPH-1144:


   Assignee: Dionysios Logothetis
Description: 
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703)
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:296)
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:691)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:415)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:48)
    at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64)
    at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:305)
    at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:133)
    at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:115)
    at org.apache.giraph.comm.netty.NettyClient.getNextChannel(NettyClient.java:714)
    at org.apache.giraph.comm.netty.NettyClient.writeRequestToChannel(NettyClient.java:799)
    at org.apache.giraph.comm.netty.NettyClient.doSend(NettyClient.java:789)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.sendResumeSignal(CreditBasedFlowControl.java:273)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl.access$200(CreditBasedFlowControl.java:77)
    at org.apache.giraph.comm.flow_control.CreditBasedFlowControl$1.run(CreditBasedFlowControl.java:219)
    at java.lang.Thread.run(Thread.java:745)






[jira] [Created] (GIRAPH-1144) Out-of-core control flow sends messages after Netty shuts down

2017-04-21 Thread Dionysios Logothetis (JIRA)
Dionysios Logothetis created GIRAPH-1144:


             Summary: Out-of-core control flow sends messages after Netty shuts down
                 Key: GIRAPH-1144
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1144
             Project: Giraph
          Issue Type: Bug
            Reporter: Dionysios Logothetis





