Hi,

I am trying to run distributed parameter server training with one parameter
server and five workers, training on the MNIST dataset with an SGD optimizer.
The workers fail with the following error:

2022-04-18 11:08:51.638470: E tensorflow/core/common_runtime/eager/context_distributed_manager.cc:486] Connection reset by peer
Additional GRPC error information from remote target /job:ps/replica:0/task:0:
:{"created":"@1650301731.638226382","description":"Error received from peer ipv4:192.168.1.1:12341","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Connection reset by peer","grpc_status":14}
E0418 11:08:51.639363565   31165 completion_queue.cc:244]    assertion failed: queue.num_items() == 0
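
For reference, here is a minimal sketch of the cluster layout described above (one parameter server, five workers), written as a TF_CONFIG-style JSON specification. Only the parameter server address comes from the error log. The worker addresses and the helper name `make_tf_config` are hypothetical placeholders, and a coordinator/chief task, if one is used, is not shown.

```python
import json

# Parameter server address, taken from the error log above.
PS_ADDRESS = "192.168.1.1:12341"

# Hypothetical worker addresses; the real ones are not given in this message.
WORKER_ADDRESSES = [f"192.168.1.{i + 2}:12345" for i in range(5)]


def make_tf_config(task_type: str, task_index: int) -> str:
    """Return a TF_CONFIG-style JSON string for one task in the cluster."""
    cluster = {"ps": [PS_ADDRESS], "worker": WORKER_ADDRESSES}
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index out of range: {task_index}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


if __name__ == "__main__":
    # Each process would export its own value as the TF_CONFIG environment
    # variable before starting; here the value for worker 0 is just printed.
    print(make_tf_config("worker", 0))
```

Every task has to see the same "cluster" section, and only the "task" section differs from process to process.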

Can you please help me with this?

Thank you,
Paridhika


-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/bf7f9fcb-fa66-462f-9691-d837b8210bc4n%40googlegroups.com.
