[ https://issues.apache.org/jira/browse/THRIFT-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Geyer updated THRIFT-5652:
-------------------------------
    Description:
[~fishywang] wrote:
In the example, this is a valid UUID string (in the 8-4-4-4-12 canonical form), but what if it is invalid?

Looking at the current compiler code (via `git grep -i uuid` under `compiler/cpp/src`), I don't see any actual parsing of the uuid string at the compiler level (please let me know if I missed it). This means that, from the compiler's point of view, it just has a string literal that is supposed to be a uuid, and it is the language library's responsibility to convert that string to a uuid. There is also no way to gracefully handle errors/exceptions in Thrift-generated code for const definitions/default value definitions, which means that if the string cannot be converted to a uuid, the result must be a runtime exception/panic/etc.

Compare these 2 examples:

    const string FOO = 123

vs.

    const uuid FOO = "123"

The first Thrift file will cause a compiler error, while the second will cause a runtime error instead.

So here are two options/approaches I can think of for now:

1. The compiler should actually parse the uuid string (using Boost.Uuid or something similar), reject any invalid uuid literals, and feed the bytes to the generated code (vs. passing the string to the language libraries' parse functions).
2. We accept that invalid uuid literals will cause runtime exceptions/panics.

But even if we accept runtime exceptions, there is still an issue with "lenient imparity" between the language libraries. For example, all language libraries should support the canonical 8-4-4-4-12 form, as that is the form we defined for use by TJSONProtocol. But some language libraries can be more lenient than others: some might also accept the {8-4-4-4-12} form, some might accept the urn:uuid:8-4-4-4-12 form, and some might accept the 32-hex form. So when someone puts one of the non-8-4-4-4-12 literals in a Thrift file, the generated code for some languages will throw runtime exceptions and for others it won't. This can lead to bugs (e.g. someone creates a Thrift file, tests it in one language and it works, but it breaks in another language when someone else tries to use the same Thrift file).

        was: Tbd

> IDL uuid literals can be improved
> ----------------------------------
>
>                 Key: THRIFT-5652
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5652
>             Project: Thrift
>          Issue Type: Sub-task
>          Components: Compiler (General)
>            Reporter: Jens Geyer
>            Assignee: Jens Geyer
>            Priority: Major
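Option 1 in the description implies that the compiler itself validates each uuid literal and hands raw bytes to the generators. As a rough illustration only, here is a minimal C++ sketch of such a strict check. `parse_canonical_uuid` is a hypothetical name, not an existing function in the Thrift compiler, and the sketch uses only the standard library rather than Boost.Uuid. It accepts exactly the canonical 8-4-4-4-12 form and rejects the braced, `urn:uuid:` and 32-hex forms, so the accepted set would not depend on any one language library's leniency. Accepting uppercase hex digits as well as lowercase is an assumption of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of the Thrift compiler): strictly parses a
// canonical 8-4-4-4-12 UUID string into its 16 raw bytes.
// Returns std::nullopt for anything else, including "{...}", "urn:uuid:..."
// and the 32-hex-digit form. Upper- and lowercase hex are both accepted
// (an assumption; a stricter policy could require lowercase only).
std::optional<std::array<std::uint8_t, 16>> parse_canonical_uuid(const std::string& s) {
  // The canonical form is exactly 36 characters long.
  if (s.size() != 36) {
    return std::nullopt;
  }
  std::array<std::uint8_t, 16> bytes{};
  std::size_t out = 0;
  int high = -1;  // pending high nibble of the current byte, or -1 if none
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char c = s[i];
    // Hyphens must appear at exactly these positions and nowhere else.
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      if (c != '-') {
        return std::nullopt;
      }
      continue;
    }
    int v;
    if (c >= '0' && c <= '9') {
      v = c - '0';
    } else if (c >= 'a' && c <= 'f') {
      v = c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
      v = c - 'A' + 10;
    } else {
      return std::nullopt;  // not a hex digit
    }
    // Combine two consecutive hex digits into one byte.
    if (high < 0) {
      high = v;
    } else {
      bytes[out++] = static_cast<std::uint8_t>((high << 4) | v);
      high = -1;
    }
  }
  return bytes;
}
```

With a check like this in the compiler, `const uuid FOO = "123"` would be rejected at compile time just like `const string FOO = 123`, and each generator could emit the 16 bytes directly instead of calling a language library's parse function at runtime. How the generators would consume those bytes is left open here.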