Jim Rhyness created TOREE-391:
---------------------------------

             Summary: Messages to Jupyter kernel gateway are dropped in jeromq
                 Key: TOREE-391
                 URL: https://issues.apache.org/jira/browse/TOREE-391
             Project: TOREE
          Issue Type: Bug
    Affects Versions: 0.1.0
         Environment: Linux (RHEL 7.3)
            Reporter: Jim Rhyness

Kernel restart from Jupyter kernel gateway is failing with a timeout. The kernel is restarted, but the kernel gateway times out waiting for the kernel_info_reply message that it expects in response to the kernel_info_request it sends after initiating the restart.

The problem is reproducible most of the time with something like this:

    curl -v -X POST --data '{ "name":"apache_toree_scala" }' http://127.0.0.1:8888/api/kernels
    curl -v -X POST --data '{}' http://127.0.0.1:8888/api/kernels/<kernelid-from-above>/restart

From the IPython message protocol doc, this is the message format:

    [
      b'u-u-i-d',         # zmq identity(ies)
      b'<IDS|MSG>',       # delimiter
      b'baddad42',        # HMAC signature
      b'{header}',        # serialized header dict
      b'{parent_header}', # serialized parent header dict
      b'{metadata}',      # serialized metadata dict
      b'{content}',       # serialized content dict
      b'blob',            # extra raw data buffer(s)
      ...
    ]

The first frame of the message contains the zmq identities. In some cases, on a Router-type socket, these are generated by jeromq and then consist of five bytes: a 0 followed by a random int.

In Toree, all frames are treated as Strings. Decoding the zmq id as UTF-8 corrupts it, because every byte sequence that is not valid UTF-8 is replaced by the replacement character U+FFFD (encoded as 0xEFBFBD). When the corrupted id is used in a message sent to the Router socket, the peer to send the message to is not found and the message is dropped.

This affects other messages as well, not just kernel_info_reply.
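
The corruption described above can be reproduced outside of Toree. The following is only a sketch in Python, not Toree or jeromq code: the identity bytes are a made-up example of the "0 followed by a random int" shape, `round_trip_as_string` is a hypothetical helper that mimics a String-based pipeline by using Python's `errors="replace"` decoding, and the `peers` dict is a toy stand-in for a Router socket's routing table.

```python
# Hypothetical 5-byte identity: a zero byte followed by a 4-byte integer.
# The concrete values are invented for illustration.
identity = bytes([0x00, 0x80, 0x00, 0x41, 0xA7])


def round_trip_as_string(frame: bytes) -> bytes:
    """Decode a frame as UTF-8 text and encode it back.

    Invalid byte sequences become U+FFFD, which encodes to EF BF BD.
    """
    return frame.decode("utf-8", errors="replace").encode("utf-8")


corrupted = round_trip_as_string(identity)

# Toy routing table (an assumption, not jeromq internals): the Router
# looks up the peer by the exact identity bytes of the first frame.
peers = {identity: "kernel-gateway-connection"}

found_with_raw_bytes = peers.get(identity)
found_after_round_trip = peers.get(corrupted)

# A frame that is valid UTF-8, such as a JSON header, survives unchanged,
# which is why only the identity frame is affected.
header = b'{"msg_type": "kernel_info_reply"}'
header_survives = round_trip_as_string(header) == header

print(identity.hex(), "->", corrupted.hex())
print("lookup with raw bytes:   ", found_with_raw_bytes)
print("lookup after round trip: ", found_after_round_trip)
```

In this sketch the two bytes that are not valid UTF-8 (0x80 and 0xA7) each turn into the three bytes EF BF BD, so the identity grows from five to nine bytes and no longer matches any known peer. Keeping the identity frames as raw byte arrays, and converting only the JSON frames to Strings, would avoid the mismatch.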