Hi, I am trying to run distributed parameter server training with one parameter server and five workers, training on the MNIST dataset with an SGD optimizer. The workers fail with the following error.
2022-04-18 11:08:51.638470: E tensorflow/core/common_runtime/eager/context_distributed_manager.cc:486] Connection reset by peer
Additional GRPC error information from remote target /job:ps/replica:0/task:0: :{"created":"@1650301731.638226382","description":"Error received from peer ipv4:192.168.1.1:12341","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Connection reset by peer","grpc_status":14}
E0418 11:08:51.639363565 31165 completion_queue.cc:244] assertion failed: queue.num_items() == 0

Can you please help me with this?

Thank you,
Paridhika

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/bf7f9fcb-fa66-462f-9691-d837b8210bc4n%40googlegroups.com.
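P.S. In case it helps with diagnosis: each task in the cluster is described to TensorFlow through the TF_CONFIG environment variable. Below is a minimal sketch of that spec for one parameter server and five workers; all hostnames and ports are placeholders, not my actual addresses.

```python
import json
import os

# Sketch of the cluster spec: one parameter server, five workers.
# Hostnames/ports are placeholders only.
cluster = {
    "ps": ["ps0.example.com:2222"],
    "worker": [f"worker{i}.example.com:2222" for i in range(5)],
}

def tf_config_for(task_type: str, task_index: int) -> str:
    """Build the TF_CONFIG JSON string that TensorFlow reads on one task."""
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )

# Each process exports its own TF_CONFIG before building the strategy,
# e.g. on the parameter server task:
os.environ["TF_CONFIG"] = tf_config_for("ps", 0)
```

Every worker and the parameter server must agree on the "cluster" part and be reachable at the listed addresses; a mismatch or an unreachable task is one common cause of gRPC "Connection reset by peer" during startup.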