[ https://issues.apache.org/jira/browse/THRIFT-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223556#comment-16223556 ]
ASF GitHub Bot commented on THRIFT-4207:
----------------------------------------

Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1274#discussion_r147556105

    --- Diff: lib/py/src/ext/protocol.tcc ---
    @@ -419,18 +419,30 @@ bool ProtocolBase<Impl>::encodeValue(PyObject* value, TType type, PyObject* type
       case T_STRING: {
         ScopedPyObject nval;
    +    Py_ssize_t len;
    +    char *str;
         if (PyUnicode_Check(value)) {
           nval.reset(PyUnicode_AsUTF8String(value));
           if (!nval) {
             return false;
           }
         } else {
    +      if (isUtf8(typeargs)) {
    +        if (PyBytes_AsStringAndSize(value, &str, &len) < 0) {
    +          return false;
    +        }
    +        // Check that input is a valid UTF-8 string.
    +        nval.reset(PyUnicode_DecodeUTF8(str, len, 0));
    +        if (!nval) {
    +          return false;
    +        }
    +      }
    --- End diff --

    Doesn't this hurt performance for every user who passes relatively large UTF-8-encoded `bytes`?

    The problem might be that we're not rejecting `bytes` in the first place, although "fixing" that wouldn't be backward compatible.
    What do you think?


> Accelerated version of TBinaryProtocol allows invalid input to string fields.
> -----------------------------------------------------------------------------
>
>                 Key: THRIFT-4207
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4207
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>    Affects Versions: 0.10.0
>            Reporter: Elvis Pranskevichus
>            Assignee: James E. King, III
>             Fix For: 0.11.0
>
>
> {{TBinaryProtocolAccelerated}} and {{TCompactProtocolAccelerated}} currently
> accept arbitrary bytes as input to string fields even when {{py:utf8strings}}
> is on.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
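
For readers less familiar with the C extension, the trade-off raised in the review comment can be sketched in pure Python. This is only an illustration under stated assumptions: `encode_string_field` and its `utf8strings` parameter are hypothetical names, not part of the Thrift library, and the sketch mirrors the behavior proposed in the diff rather than the actual accelerated encoder.

```python
def encode_string_field(value, utf8strings=True):
    """Return the bytes that would be written for a string field.

    Hypothetical helper for illustration only; not a Thrift API.
    """
    if isinstance(value, str):
        # Text input is always encoded to UTF-8.
        return value.encode("utf-8")
    if not isinstance(value, bytes):
        raise TypeError("expected str or bytes, got %s" % type(value).__name__)
    if utf8strings:
        # Validating bytes input needs a full decode pass over the data.
        # This is the extra per-call cost questioned in the review.
        value.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return value
```

With validation on, a call such as `encode_string_field(b"\xff")` raises `UnicodeDecodeError`, while valid UTF-8 bytes pass through unchanged. The decode pass touches every byte of the input, so its cost grows with payload size, which is the concern for callers that pass large pre-encoded values.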