[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967962#comment-14967962 ]
Steven Schlansker commented on MESOS-3771: ------------------------------------------

Okay, I have distilled down the reproduction case. Use the Python test-framework with the following diff applied:

{code}
diff --git a/src/examples/python/test_framework.py b/src/examples/python/test_framework.py
index 6af6d22..95abb97 100755
--- a/src/examples/python/test_framework.py
+++ b/src/examples/python/test_framework.py
@@ -150,6 +150,7 @@ class TestScheduler(mesos.interface.Scheduler):
                 print "but received", self.messagesReceived
                 sys.exit(1)
             print "All tasks done, and all messages received, exiting"
+            time.sleep(30)
             driver.stop()
 
 if __name__ == "__main__":
@@ -158,6 +159,7 @@ if __name__ == "__main__":
         sys.exit(1)
 
     executor = mesos_pb2.ExecutorInfo()
+    executor.data = b'\xAC\xED'
     executor.executor_id.value = "default"
     executor.command.value = os.path.abspath("./test-executor")
     executor.name = "Test Executor (Python)"
{code}

If you run the test framework and, during the 30-second sleep after it finishes, fetch the {{/master/state.json}} endpoint, the response contains invalid UTF-8:

{code}
Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
 at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@54c8158d; line: 1, column: 6432]
{code}

I tested against both 0.24.1 and current master; both exhibit the bad behavior.

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-3771
>                 URL: https://issues.apache.org/jira/browse/MESOS-3771
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>    Affects Versions: 0.24.1, 0.26.0
>            Reporter: Steven Schlansker
>            Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without
> any regard to proper character encoding:
> {code}
> 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.|
> 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac|
> 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".|
> 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\|
> 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca|
> 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP API emits executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout
> has no notion of a byte array. I'm guessing that some implicit conversion
> causes the field to be written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> A quick script to confirm the invalid bytes and a sketch of one possible
> fix are appended below. Thank you for any assistance here. Our cluster is
> currently entirely down -- the frameworks cannot handle parsing the invalid
> JSON produced (it is not even valid utf-8).
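>
> To confirm the bad bytes without a JSON parser in the way, a check along
> these lines should work (a minimal sketch: the master address and the
> strict-decode approach are assumptions, not anything Mesos ships):
> {code}
> # Fetch /master/state.json and check whether the body is valid UTF-8.
> # Assumes a master listening on localhost:5050; adjust for your cluster.
> import urllib2
>
> body = urllib2.urlopen("http://localhost:5050/master/state.json").read()
>
> try:
>     body.decode("utf-8")
>     print "response is valid UTF-8"
> except UnicodeDecodeError as e:
>     # With binary executor data this fails: 0xac is a UTF-8 continuation
>     # byte and can never start a character, which is exactly Jackson's
>     # "Invalid UTF-8 start byte 0xac" complaint.
>     print "invalid UTF-8 at offset %d: %r" % (e.start, body[e.start:e.start + 4])
> {code}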
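>
> One possible direction for a fix: base64-encode "bytes" fields before they
> enter the JSON tree, so the serialized document is always plain ASCII. The
> real change would belong in the C++ model() above; this Python sketch, with
> a hypothetical helper name, only illustrates the idea:
> {code}
> # Sketch only: emit binary protobuf fields as base64 instead of raw bytes.
> import base64
> import json
>
> def model_data(data):
>     # base64 maps arbitrary bytes onto pure ASCII, so the resulting JSON
>     # can never contain an invalid UTF-8 sequence.
>     return {"data": base64.b64encode(data)}
>
> # The Java serialization magic 0xACED 0x0005 from the hexdump above:
> print json.dumps(model_data(b'\xAC\xED\x00\x05'))  # {"data": "rO0ABQ=="}
> {code}
> Consumers would then base64-decode the field themselves, trading a little
> client-side work for JSON that is always well-formed.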