[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Wu reassigned MESOS-3771:
--------------------------------

    Assignee: Joseph Wu

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
> ------------------------------------------------------------------------------------
>
>                  Key: MESOS-3771
>                  URL: https://issues.apache.org/jira/browse/MESOS-3771
>              Project: Mesos
>           Issue Type: Bug
>           Components: HTTP API
>     Affects Versions: 0.24.1, 0.26.0
>             Reporter: Steven Schlansker
>             Assignee: Joseph Wu
>             Priority: Critical
>               Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field
> is sent as a "bytes" Protobuf value, which can contain arbitrary non-UTF-8 data.
> If you have such a field, it is splatted out into JSON without any regard to
> proper character encoding:
> {code}
> 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.|
> 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac|
> 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".|
> 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\|
> 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca|
> 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP API emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout has
> no notion of a byte array. I'm guessing that some implicit conversion makes
> it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here. Our cluster is currently entirely down --
> the frameworks cannot handle parsing the invalid JSON produced (it is not
> even valid UTF-8).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
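
For reference: a JSON string must be valid Unicode text (see the RFC 4627 reference in the stout comment quoted above), so no escaping scheme can faithfully carry an arbitrary byte sequence; the usual workaround is to base64-encode raw "bytes" fields such as ExecutorInfo.data before placing them into a JSON object. The sketch below is illustrative only, not the Mesos/stout implementation; the base64Encode helper is a hypothetical, self-contained stand-in so the example compiles on its own.

{code}
// Standalone sketch: base64-encode arbitrary bytes so they can be embedded
// in a JSON string as plain ASCII. "base64Encode" is a hypothetical helper,
// not a stout/Mesos API.
#include <cstdint>
#include <iostream>
#include <string>

std::string base64Encode(const std::string& input)
{
  static const char alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string output;
  size_t i = 0;

  // Emit four ASCII characters for every three input bytes.
  while (i + 2 < input.size()) {
    uint32_t n = (static_cast<unsigned char>(input[i]) << 16) |
                 (static_cast<unsigned char>(input[i + 1]) << 8) |
                  static_cast<unsigned char>(input[i + 2]);
    output += alphabet[(n >> 18) & 0x3F];
    output += alphabet[(n >> 12) & 0x3F];
    output += alphabet[(n >> 6) & 0x3F];
    output += alphabet[n & 0x3F];
    i += 3;
  }

  // Pad the final one or two bytes with '='.
  if (i < input.size()) {
    uint32_t n = static_cast<unsigned char>(input[i]) << 16;
    if (i + 1 < input.size()) {
      n |= static_cast<unsigned char>(input[i + 1]) << 8;
    }
    output += alphabet[(n >> 18) & 0x3F];
    output += alphabet[(n >> 12) & 0x3F];
    output += (i + 1 < input.size()) ? alphabet[(n >> 6) & 0x3F] : '=';
    output += '=';
  }

  return output;
}

int main()
{
  // First four bytes of a Java-serialized blob (the 0xACED0005 stream magic),
  // i.e. the kind of non-UTF-8 data Spark puts into ExecutorInfo.data.
  const std::string data("\xAC\xED\x00\x05", 4);

  // Prints: "data":"rO0ABQ==" -- plain ASCII, safe inside any JSON document.
  std::cout << "\"data\":\"" << base64Encode(data) << "\"" << std::endl;
  return 0;
}
{code}

In the model(ExecutorInfo) function quoted above, the same idea would mean passing executorInfo.data() through such an encoder (or omitting the field) instead of assigning the raw bytes to object.values["data"]; consumers would then base64-decode the value to recover the original payload.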