[
https://issues.apache.org/jira/browse/THRIFT-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jens Geyer updated THRIFT-5652:
-------------------------------
Description:
[~fishywang] wrote:
in the example this is a valid UUID string (in the canonical 8-4-4-4-12 form),
but what if it is invalid?
looking at the current compiler code (via `git grep -i uuid` under
`compiler/cpp/src`) I don't see any actual parsing of the uuid string at the
compiler level (please let me know if I missed it). that means from the
compiler's point of view, it just has a string literal that's supposed to be
a uuid, and it's the language library's responsibility to convert that
string to a uuid. there's also no way to gracefully handle errors/exceptions in
thrift-generated language code at const definitions/default value definitions,
so if the string cannot be converted to a uuid, the result has to be a runtime
exception/panic/etc.
compare these 2 examples:
const string FOO = 123
vs.
const uuid FOO = "123"
the first thrift file will cause a compiler error, while the second will cause
a runtime error instead.
so here are two options/approaches I can think of for now:
1. the compiler should actually parse the uuid string (using Boost.Uuid or
something similar), reject any invalid uuid literals, and feed the bytes to
generated code (vs. passing the string to the language libraries' parse functions)
2. we accept that for invalid uuid literals we'll have runtime exceptions/panics
but even if we accept runtime exceptions, there's still an issue with "leniency
disparity" between the language libraries. for example, all language libraries
should support the canonical 8-4-4-4-12 form, as that's what we defined as the
form to be used by TJSONProtocol. but some language libraries can be more
lenient than others, e.g. some might also accept the {8-4-4-4-12} form, some
might accept the urn:uuid:8-4-4-4-12 form, and some might accept the 32-hex
form. so when someone puts a literal in one of the non-8-4-4-4-12 forms in a
thrift file, some generated language code will have runtime exceptions and some
won't. this can lead to bugs (e.g. someone creates a thrift file and tests it in
one language where it works, but it breaks in another language when someone else
tries to use the same thrift file).
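a rough sketch of what option 1 could look like, assuming the compiler only
accepts the canonical 8-4-4-4-12 form and hands 16 raw bytes to the generators.
the names `UuidBytes`, `hex_value` and `parse_canonical_uuid` are hypothetical
and do not exist in the Thrift compiler; this version uses only the standard
library rather than Boost.Uuid, so it is an illustration of the idea, not a
proposed patch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical type for illustration: the 16 raw bytes of a uuid.
using UuidBytes = std::array<std::uint8_t, 16>;

// Returns 0..15 for a hex digit (either case), or -1 for anything else.
inline int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical strict parser: accepts only the canonical 8-4-4-4-12 form
// (36 characters, hyphens at offsets 8, 13, 18 and 23). Braced, urn:uuid:
// and bare 32-hex forms are rejected. On success the bytes are written to
// `out` and true is returned; on failure `out` is left untouched.
inline bool parse_canonical_uuid(const std::string& text, UuidBytes& out) {
  if (text.size() != 36) return false;

  UuidBytes bytes{};
  std::size_t nibbles = 0;  // number of hex digits consumed so far
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool hyphen_expected = (i == 8 || i == 13 || i == 18 || i == 23);
    if (hyphen_expected) {
      if (text[i] != '-') return false;
      continue;
    }
    const int v = hex_value(text[i]);
    if (v < 0) return false;
    // The first digit of each pair is the high nibble, the second the low one.
    if (nibbles % 2 == 0) {
      bytes[nibbles / 2] = static_cast<std::uint8_t>(v << 4);
    } else {
      bytes[nibbles / 2] = static_cast<std::uint8_t>(bytes[nibbles / 2] | v);
    }
    ++nibbles;
  }
  assert(nibbles == 32);  // 36 characters minus 4 hyphens
  out = bytes;
  return true;
}
```

with a check like this in the compiler, `const uuid FOO = "123"` could be
rejected at compile time in the same way `const string FOO = 123` is, and the
braced, urn:uuid: and 32-hex forms would be rejected consistently for every
target language instead of depending on each library's parser.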
was:Tbd
> IDL uuid literals can be improved
> ----------------------------------
>
> Key: THRIFT-5652
> URL: https://issues.apache.org/jira/browse/THRIFT-5652
> Project: Thrift
> Issue Type: Sub-task
> Components: Compiler (General)
> Reporter: Jens Geyer
> Assignee: Jens Geyer
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)