[ https://issues.apache.org/jira/browse/TOREE-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902999#comment-15902999 ]
Jim Rhyness commented on TOREE-391:
-----------------------------------

I have coded a fix, changing the ids at the protocol level to be
Array[Byte] and changing some code under communication that deals with
zmq. I'm not entirely sure that's the best solution, but I'll look to
creating a pr with that.

> Messages to Jupyter kernel gateway are dropped in jeromq
> --------------------------------------------------------
>
>                 Key: TOREE-391
>                 URL: https://issues.apache.org/jira/browse/TOREE-391
>             Project: TOREE
>          Issue Type: Bug
>    Affects Versions: 0.1.0
>         Environment: Linux ( RHEL 7.3 )
>            Reporter: Jim Rhyness
>              Labels: newbie
>
> Kernel restart from Jupyter kernel gateway is failing with a timeout. The
> kernel is restarted, but kernel gateway times out waiting for a
> kernel_info_reply message that it is expecting in response to
> kernel_info_request that it sends after initiating the restart.
>
> The problem is reproducible most of the time with something like this:
>
> curl -v -X POST --data '{ "name":"apache_toree_scala" }' http://127.0.0.1:8888/api/kernels
> curl -v -X POST --data '{}' http://127.0.0.1:8888/api/kernels/<kernelid-from-above>/restart
>
> From the IPython message protocol doc, this is the message format:
>
> [
>   b'u-u-i-d',         # zmq identity(ies)
>   b'<IDS|MSG>',       # delimiter
>   b'baddad42',        # HMAC signature
>   b'{header}',        # serialized header dict
>   b'{parent_header}', # serialized parent header dict
>   b'{metadata}',      # serialized metadata dict
>   b'{content}',       # serialized content dict
>   b'blob',            # extra raw data buffer(s)
>   ...
> ]
>
> The first frame of the message contains zmq identities which, in some cases
> in a Router-type socket, are generated by jeromq and then consist of five
> bytes - 0 followed by a random int.
>
> In Toree, all frames are treated as Strings. Conversion to UTF-8 corrupts
> the zmq id, replacing non-UTF-8 characters by the replacement character
> 0xEFBFBD.
>
> When the corrupted id is used in a message sent to the Router socket, the
> peer to send the message to is not found and the message is dropped.
>
> This affects other messages as well, not just kernel_info_reply.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
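
The identity corruption described in the issue can be reproduced in a few lines. The following is only a minimal sketch in Python, not Toree or jeromq code. The identity bytes are made up, the `peers` dict is a simplified stand-in for a Router socket's peer lookup, and `split_frames` is a hypothetical helper. `errors="replace"` is used to mimic the JVM, which substitutes U+FFFD for malformed input when building a String from bytes.

```python
# Hypothetical 5-byte identity: a zero byte followed by a 4-byte int,
# as described in the issue. The concrete values are made up.
identity = bytes([0x00, 0x8B, 0x45, 0x67, 0xC6])

# Treat the frame as a UTF-8 string and convert it back to bytes.
# errors="replace" substitutes U+FFFD for byte sequences that are not valid UTF-8.
round_tripped = identity.decode("utf-8", errors="replace").encode("utf-8")

print(identity.hex())       # original identity
print(round_tripped.hex())  # now contains efbfbd where bytes were invalid

# Simplified stand-in for a Router socket's peer table, keyed by identity.
peers = {identity: "kernel gateway connection"}

# The corrupted identity no longer matches any peer, so the message
# addressed to it has nowhere to go.
corrupted_lookup = peers.get(round_tripped)
intact_lookup = peers.get(identity)

DELIMITER = b"<IDS|MSG>"

def split_frames(frames):
    """Split a multipart message at the delimiter, keeping every frame as bytes."""
    i = frames.index(DELIMITER)
    return frames[:i], frames[i + 1:]

message = [identity, DELIMITER, b"baddad42", b"{}", b"{}", b"{}", b"{}"]
identities, payload = split_frames(message)
```

Keeping the identity frames as raw bytes, as in `split_frames`, is the same idea as the Array[Byte] change mentioned in the comment: only the frames after the delimiter are text and safe to decode.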