Re: [jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Benjy Weinberger Tue, 03 Jan 2012 09:54:21 -0800

Sadly no. I have tried this extensively, but if read() can't accept
this information directly then the TProtocol implementation has to
receive it out-of-band somehow. It seemed like it would be possible to
generate it up front from the available FieldValueMetaData - starting
at the root class and working your way down. Unfortunately this:

A) greatly complicates the TProtocol implementation

and

B) is defeated by the presence of typedefs, since, unfortunately and
for reasons I'm not clear on, the FieldValueMetaData for a typedef
describes the typedef, not the thing it refers to. (*)

There is a trick using stack traces (see (**) for details), but it:

A) greatly complicates the TProtocol implementation

B) is fragile

C) performs poorly and is unsuitable for production use

D) is defeated by the presence of unions.

My conclusion, and believe me this was after several days of
deep-diving into the issue, is that it is much simpler and more
natural to fix up the read() method.

It's simpler because it isn't a particularly big change. And it's
probably better to add complication to generated code than to
hand-written code.

It's more natural because the read() method already has the
information it needs to read a field by name. Forcing the TProtocol to
do the conversion means passing it out-of-band information that
ultimately has to come from the generated struct code anyway. And the
TProtocol should be about wire representation, not about id<->field
mappings. That seems properly the role of the generated code.

What do you think? I could put together a change to the compiler and
we can see the degree of complexity it adds to the generated code is a
problem.

Benjy

(*) This typedef metadata doesn't seem to be useful at runtime, since
it just states the typedef alias, but this doesn't refer to anything
you can reflect on at runtime.

(**) FWIW there is a "TTextProtocol" in the twitter/commons libraries on
github. It is essentially a JSON protocol that uses names.

However because of the problem mentioned below it has to do some
non-trivial state tracking, including the following trick to figure
out which struct it's currently reading (so it knows which struct's
metadata to use to map names to ids):

In readStructBegin() it generates a stack trace and walks up it frame
by frame, calling Class.forName() on the declaring class of each frame
and tests it for assignability to TBase. The first TBase subclass
encountered is assumed to be the currently read struct - calling
readStructBegin() from its read() method.

This is fragile (e.g., it doesn't work with unions because the first
TBase subclass encountered will be TUnion) and has poor performance,
so it's only used in tests.

On Tue, Jan 3, 2012 at 9:40 AM, Bryan Duxbury (Commented) (JIRA)
<[email protected]> wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178850#comment-13178850
>  ]
>
> Bryan Duxbury commented on THRIFT-1477:
> ---------------------------------------
>
> This feels like it would unnecessarily complicate the read() method. Can you 
> just figure out a way to do the name->id mapping yourself?
>
>> Allow readFieldBegin() to pass back the field name instead of the field id
>> --------------------------------------------------------------------------
>>
>>                 Key: THRIFT-1477
>>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
>>             Project: Thrift
>>          Issue Type: Improvement
>>          Components: Java - Compiler
>>            Reporter: Benjy Weinberger
>>            Priority: Minor
>>
>> [Apologies if this has been addressed in another issue. I couldn't find 
>> anything relevant on JIRA or the mailing list archives.]
>> Background: I'm implementing a BSON protocol, in order to write Thrift 
>> messages to MongoDB (technically the protocol generates the object 
>> representation that the MongoDB driver expects, not a raw BSON string 
>> directly to the transport, but that's an unimportant detail here).
>> BSON, like JSON, naturally uses human-readable string field names.
>> When reading, the generated Thrift code (at least in Java) requires that 
>> readFieldBegin() pass back a TField with the id field set. It ignores the 
>> name field. Therefore the ids must appear in the stream. It's possible to 
>> contort these protocols to use ids instead of human-readable names (as 
>> TJSONProtocol does) but this isn't helpful in dealing with prior BSON or 
>> JSON data that we're trying to back-port into Thrift schemata.
>> However, the generated read() method already knows how to map names to ids. 
>> So I propose allowing a TProtocol's readFieldBegin() method to pass back a 
>> TField with the name set and no id set (indicated, say, by id==-1), and let 
>> the read() method figure out the id to then switch on.
>> In some cases we could also allow the TField to omit the type information, 
>> which, again, is not naturally present in JSON. (BSON does embed type 
>> information, but its type system does not align fully with Thrift's, so it 
>> can't be used without further context). If the field is unknown, the only 
>> use for the type is for skipping the field value. But protocols like JSON 
>> and BSON can skip fields without this type information, since fields are 
>> delimited in the protocol in a type-independent way.
>> Basically, what I propose is that readFieldBegin() be allowed to pass back 
>> just an id or just a name (and, for some protocols, no type information), 
>> since that is all read() needs in order to figure out how to read or skip 
>> the field.
>> I'm wondering what the Thrift elders think of this. Has it been discussed? 
>> Thanks!
>> PS This does have the downside that if Thrift were to implement a 
>> pass-through feature for unrecognized fields (so that new messages read with 
>> old protocol versions will serialize back out with no loss) - we wouldn't be 
>> able to preserve fields for which we only had a name and no id. Or rather, 
>> we wouldn't be able to write them out to a protocol that requires ids, like 
>> the binary protocols. However this feature doesn't exist anyway, and I don't 
>> know if it's on the roadmap.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA 
> administrators: 
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>

Re: [jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Reply via email to