[ 
https://issues.apache.org/jira/browse/THRIFT-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Geyer updated THRIFT-5652:
-------------------------------
    Description: 
[~fishywang] wrote:

in the example this is a valid UUID string (in the 8-4-4-4-12 canonical form), 
but what if it is invalid?

looking at the current compiler code (via `git grep -i uuid` under 
`compiler/cpp/src`) I don't see any actual parsing of the uuid string at the 
compiler level (please let me know if I missed it), which means that from the 
compiler's point of view, it is just a string literal that's supposed to be 
a uuid, and it's the language library's responsibility to convert it from 
string to uuid. there's also no way to gracefully handle errors/exceptions in 
thrift generated language code at const definitions/default value definitions, 
which means that if the string cannot be converted to a uuid, the result must 
be a runtime exception/panic/etc.

compare these 2 examples:

    const string FOO = 123

vs.

    const uuid FOO = "123"

the first thrift file will cause a compiler error, while the second will cause 
a runtime error instead.

so here are two options/approaches I can think of for now:

1. the compiler should actually parse the uuid string (using Boost.Uuid or 
something), reject any invalid uuid literals, and feed the bytes to generated 
code (vs. the string via language libraries' parse function)
2. we accept that for invalid uuid literals we'll have runtime exceptions/panics
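
to make option 1 a bit more concrete, here is a minimal sketch of what strict 
compile-time validation could look like. it is not existing compiler code: the 
function name `parse_canonical_uuid` is hypothetical, it assumes C++17 for 
`std::optional`, and it hand-rolls the check instead of depending on Boost. it 
accepts only the canonical 8-4-4-4-12 form and returns the 16 raw bytes that 
could then be handed to the generators.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, NOT part of the Thrift compiler.
// Returns the 16 raw bytes if `text` is a UUID in canonical 8-4-4-4-12 form,
// or std::nullopt for anything else (wrong length, misplaced dashes, non-hex).
std::optional<std::array<std::uint8_t, 16>> parse_canonical_uuid(
    const std::string& text) {
  if (text.size() != 36) {
    return std::nullopt;
  }
  std::array<std::uint8_t, 16> bytes{};
  std::size_t nibbles = 0;  // number of hex digits consumed so far
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char c = text[i];
    // Dashes must sit exactly at offsets 8, 13, 18 and 23.
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      if (c != '-') {
        return std::nullopt;
      }
      continue;
    }
    int value = 0;
    if (c >= '0' && c <= '9') {
      value = c - '0';
    } else if (c >= 'a' && c <= 'f') {
      value = c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
      value = c - 'A' + 10;
    } else {
      return std::nullopt;
    }
    // Two hex digits per byte, high nibble first.
    const std::size_t index = nibbles / 2;
    if (nibbles % 2 == 0) {
      bytes[index] = static_cast<std::uint8_t>(value << 4);
    } else {
      bytes[index] = static_cast<std::uint8_t>(bytes[index] | value);
    }
    ++nibbles;
  }
  return bytes;
}
```

with something along these lines, `const uuid FOO = "123"` could be rejected 
when the IDL is compiled, the same way `const string FOO = 123` is today.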

but even if we accept runtime exceptions, there's still an issue with "lenient 
imparity" between the language libraries. for example, all language libraries 
should support the canonical 8-4-4-4-12 form, as that's what we defined as the 
form to be used by TJSONProtocol. but some language libraries can be more 
lenient than others, e.g. some might also accept the {8-4-4-4-12} form, some 
might accept the urn:uuid:8-4-4-4-12 form, some might accept the 32-hex form. 
so when someone puts a literal in one of the non-8-4-4-4-12 forms in a thrift 
file, some generated language code will have runtime exceptions and some won't. 
this can lead to bugs (e.g. someone creates a thrift file and tests it in one 
language and it works, but it breaks in another language when someone else 
tries to use the same thrift file).
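
as an illustration of the forms listed above, the sketch below classifies a 
literal as canonical, braced, urn, 32-hex, or invalid. everything in it 
(`UuidForm`, `classify_uuid_literal`, `is_canonical`) is a hypothetical name 
made up for this example, not an existing compiler or library API. a check 
like this could let the compiler flag any literal that is not in the canonical 
form, so the behavior no longer depends on how lenient a given language 
library happens to be.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification of the literal forms discussed in this issue.
enum class UuidForm { Canonical, Braced, Urn, Hex32, Invalid };

// True if `s` is exactly 36 characters in 8-4-4-4-12 form.
static bool is_canonical(const std::string& s) {
  if (s.size() != 36) {
    return false;
  }
  for (std::size_t i = 0; i < s.size(); ++i) {
    const bool dash_position = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_position) {
      if (s[i] != '-') {
        return false;
      }
    } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper, NOT part of Thrift: reports which form a literal uses.
UuidForm classify_uuid_literal(const std::string& s) {
  if (is_canonical(s)) {
    return UuidForm::Canonical;
  }
  // {8-4-4-4-12}
  if (s.size() == 38 && s.front() == '{' && s.back() == '}' &&
      is_canonical(s.substr(1, 36))) {
    return UuidForm::Braced;
  }
  // urn:uuid:8-4-4-4-12
  const std::string urn_prefix = "urn:uuid:";
  if (s.size() == urn_prefix.size() + 36 &&
      s.compare(0, urn_prefix.size(), urn_prefix) == 0 &&
      is_canonical(s.substr(urn_prefix.size()))) {
    return UuidForm::Urn;
  }
  // 32 hex digits without dashes
  if (s.size() == 32) {
    for (const char c : s) {
      if (!std::isxdigit(static_cast<unsigned char>(c))) {
        return UuidForm::Invalid;
      }
    }
    return UuidForm::Hex32;
  }
  return UuidForm::Invalid;
}
```

whether the non-canonical forms should be rejected outright or normalized to 
the canonical form is exactly the open question here; the sketch only detects 
them.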

  was:Tbd


> IDL uuid literals can be improved 
> ----------------------------------
>
>                 Key: THRIFT-5652
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5652
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Compiler (General)
>            Reporter: Jens Geyer
>            Assignee: Jens Geyer
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
