Re: [jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Bryan Duxbury Tue, 03 Jan 2012 10:00:19 -0800

If you're working on 0.8+, you could always just create a new Scheme to go
with your Protocol that never uses IDs and always uses names. See the
TupleProtocol for an example of using a non-standard scheme.
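
A rough sketch of that idea, using self-contained stand-ins instead of the real libthrift classes. `NamedField`, `NameProtocol`, `Person` and `PersonNameScheme` are all hypothetical; a real version would plug into Thrift's scheme machinery against an actual TProtocol. The only point illustrated is that the scheme dispatches on the field name the protocol reports and never looks at an id.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical field header carrying only a name; null marks end-of-struct (like STOP).
final class NamedField {
    final String name;
    NamedField(String name) { this.name = name; }
}

// Hypothetical name-based "protocol": walks a document's keys the way a BSON/JSON
// reader would, exposing each key as a field name and remembering its value.
final class NameProtocol {
    private final Iterator<Map.Entry<String, Object>> it;
    private Object current;

    NameProtocol(Map<String, Object> doc) { this.it = doc.entrySet().iterator(); }

    NamedField readFieldBegin() {
        if (!it.hasNext()) return new NamedField(null);
        Map.Entry<String, Object> e = it.next();
        current = e.getValue();
        return new NamedField(e.getKey());
    }

    String readString() { return (String) current; }
    int readI32() { return (Integer) current; }

    // Keys delimit values, so an unknown field can be skipped without its type.
    void skip() { current = null; }
}

// Hypothetical struct, shaped like generated code.
final class Person {
    String name;
    int age;
}

// Hypothetical scheme that matches fields purely by name.
final class PersonNameScheme {
    static Person read(NameProtocol in) {
        Person p = new Person();
        while (true) {
            NamedField f = in.readFieldBegin();
            if (f.name == null) break; // end of struct
            switch (f.name) {
                case "name": p.name = in.readString(); break;
                case "age":  p.age = in.readI32(); break;
                default:     in.skip(); // unknown field: drop it
            }
        }
        return p;
    }
}
```

Feeding it an ordered map such as {"age": 42, "nickname": "x", "name": "Ann"} should yield a Person with name "Ann" and age 42, with "nickname" skipped.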


On Tue, Jan 3, 2012 at 9:53 AM, Benjy Weinberger <[email protected]> wrote:

> Sadly no. I have tried this extensively, but if read() can't accept
> this information directly then the TProtocol implementation has to
> receive it out-of-band somehow. It seemed like it would be possible to
> generate it up front from the available FieldValueMetaData - starting
> at the root class and working your way down. Unfortunately this:
>
> A) greatly complicates the TProtocol implementation
>
> and
>
> B) is defeated by the presence of typedefs, since, unfortunately and
> for reasons I'm not clear on, the FieldValueMetaData for a typedef
> describes the typedef, not the thing it refers to. (*)
>
>
> There is a trick using stack traces (see (**) for details), but it:
>
> A) greatly complicates the TProtocol implementation
>
> B) is fragile
>
> C) performs poorly and is unsuitable for production use
>
> D) is defeated by the presence of unions.
>
>
> My conclusion, and believe me this was after several days of
> deep-diving into the issue, is that it is much simpler and more
> natural to fix up the read() method.
>
> It's simpler because it isn't a particularly big change. And it's
> probably better to add complication to generated code than to
> hand-written code.
>
> It's more natural because the read() method already has the
> information it needs to read a field by name. Forcing the TProtocol to
> do the conversion means passing it out-of-band information that
> ultimately has to come from the generated struct code anyway. And the
> TProtocol should be about wire representation, not about id<->field
> mappings. That seems properly the role of the generated code.
>
> What do you think? I could put together a change to the compiler and
> we can see whether the degree of complexity it adds to the generated
> code is a problem.
>
> Benjy
>
>
>
>
> (*) This typedef metadata doesn't seem to be useful at runtime, since
> it just states the typedef alias, which doesn't refer to anything
> you can reflect on at runtime.
>
> (**) FWIW there is a "TTextProtocol" in the twitter/commons libraries on
> github. It is essentially a JSON protocol that uses names.
>
> However, because of the problem mentioned below, it has to do some
> non-trivial state tracking, including the following trick to figure
> out which struct it's currently reading (so it knows which struct's
> metadata to use to map names to ids):
>
> In readStructBegin() it generates a stack trace and walks up it frame
> by frame, calling Class.forName() on the declaring class of each frame
> and testing it for assignability to TBase. The first TBase subclass
> encountered is assumed to be the struct currently being read, i.e. the
> one calling readStructBegin() from its read() method.
>
> This is fragile (e.g., it doesn't work with unions because the first
> TBase subclass encountered will be TUnion) and has poor performance,
> so it's only used in tests.
>
>
> On Tue, Jan 3, 2012 at 9:40 AM, Bryan Duxbury (Commented) (JIRA)
> <[email protected]> wrote:
> >
> >    [https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178850#comment-13178850]
> >
> > Bryan Duxbury commented on THRIFT-1477:
> > ---------------------------------------
> >
> > This feels like it would unnecessarily complicate the read() method. Can
> > you just figure out a way to do the name->id mapping yourself?
> >
> >> Allow readFieldBegin() to pass back the field name instead of the field id
> >> --------------------------------------------------------------------------
> >>
> >>                 Key: THRIFT-1477
> >>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
> >>             Project: Thrift
> >>          Issue Type: Improvement
> >>          Components: Java - Compiler
> >>            Reporter: Benjy Weinberger
> >>            Priority: Minor
> >>
> >> [Apologies if this has been addressed in another issue. I couldn't find
> >> anything relevant on JIRA or the mailing list archives.]
> >> Background: I'm implementing a BSON protocol in order to write Thrift
> >> messages to MongoDB. (Technically, the protocol generates the object
> >> representation that the MongoDB driver expects rather than writing a raw
> >> BSON string directly to the transport, but that's an unimportant detail
> >> here.)
> >> BSON, like JSON, naturally uses human-readable string field names.
> >> When reading, the generated Thrift code (at least in Java) requires
> >> that readFieldBegin() pass back a TField with the id field set; it ignores
> >> the name field. Therefore the ids must appear in the stream. It's possible
> >> to contort these protocols to use ids instead of human-readable names (as
> >> TJSONProtocol does), but this doesn't help with existing BSON or JSON data
> >> that we're trying to back-port into Thrift schemata.
> >> However, the generated read() method already knows how to map names to
> >> ids. So I propose allowing a TProtocol's readFieldBegin() method to pass
> >> back a TField with the name set and no id set (indicated, say, by
> >> id==-1), and letting the read() method figure out the id to then switch on.
> >> In some cases we could also allow the TField to omit the type
> >> information, which, again, is not naturally present in JSON. (BSON does
> >> embed type information, but its type system does not align fully with
> >> Thrift's, so it can't be used without further context.) If the field is
> >> unknown, the only use for the type is to skip the field value. But
> >> protocols like JSON and BSON can skip fields without this type information,
> >> since fields are delimited in the protocol in a type-independent way.
> >> Basically, what I propose is that readFieldBegin() be allowed to pass
> >> back just an id or just a name (and, for some protocols, no type
> >> information), since that is all read() needs in order to figure out how to
> >> read or skip the field.
> >> I'm wondering what the Thrift elders think of this. Has it been
> >> discussed? Thanks!
> >> PS: This does have the downside that if Thrift were to implement a
> >> pass-through feature for unrecognized fields (so that new messages read
> >> with old protocol versions will serialize back out with no loss), we
> >> wouldn't be able to preserve fields for which we only had a name and no id.
> >> Or rather, we wouldn't be able to write them out to a protocol that
> >> requires ids, like the binary protocols. However, this feature doesn't
> >> exist anyway, and I don't know if it's on the roadmap.
>
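
The change proposed in THRIFT-1477 can be sketched roughly as follows. This is not Thrift's generated code: `NameOrIdField`, `User`, `ID_BY_NAME`, `resolveId` and `readField` are hypothetical simplifications, a plain map stands in for whatever name table the generated code already carries, and values are passed in directly instead of being read from a TProtocol. It only shows the proposed control flow: a header with an id is used as before, while a name-only header (id == -1) is mapped to an id before the usual switch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified field header; id == NO_ID means "only the name is known".
final class NameOrIdField {
    static final short NO_ID = -1;
    final String name;
    final short id;
    NameOrIdField(String name, short id) { this.name = name; this.id = id; }
}

// Hypothetical struct showing how a read() loop could accept either form.
final class User {
    // Name -> id table that the generated code would already have.
    private static final Map<String, Short> ID_BY_NAME = new HashMap<>();
    static {
        ID_BY_NAME.put("id", (short) 1);
        ID_BY_NAME.put("email", (short) 2);
    }

    long id;
    String email;

    /** Returns the id to switch on, or NO_ID if the field is unknown. */
    static short resolveId(NameOrIdField f) {
        if (f.id != NameOrIdField.NO_ID) return f.id; // id-based protocols: unchanged path
        Short mapped = ID_BY_NAME.get(f.name);         // name-only protocols: look it up
        return mapped == null ? NameOrIdField.NO_ID : mapped;
    }

    /** One iteration of the read() loop; returns false if the caller should skip the value. */
    boolean readField(NameOrIdField f, Object value) {
        switch (resolveId(f)) {
            case 1: this.id = (Long) value; return true;
            case 2: this.email = (String) value; return true;
            default: return false;
        }
    }
}
```

The design keeps binary and compact protocols on the existing path (they always supply an id), so the extra cost falls only on name-based protocols, at the price of one map lookup per field.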

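The stack-trace trick described in footnote (**) of Benjy's reply can be reproduced in a few lines of plain Java. This is a sketch, not the TTextProtocol code: `Base` stands in for TBase, and `StructLocator`, `Account`, `UnionBase` and `Shape` are hypothetical. It also reproduces the union failure mode: when read() is inherited from a shared base class, the stack frame names that base class, so the lookup finds it instead of the concrete struct.

```java
// Hypothetical marker interface standing in for TBase.
interface Base {}

final class StructLocator {
    /**
     * Walks the current stack and returns the first class assignable to Base,
     * assumed to be the struct whose read() is calling us; null if there is none.
     */
    static Class<?> currentStruct() {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            try {
                Class<?> c = Class.forName(frame.getClassName());
                if (Base.class.isAssignableFrom(c)) return c;
            } catch (ClassNotFoundException | LinkageError e) {
                // Frame from a class we cannot load (e.g. a lambda); keep walking.
            }
        }
        return null;
    }
}

// A "struct" with its own read(): the frame names Account, so the lookup works.
final class Account implements Base {
    Class<?> read() { return StructLocator.currentStruct(); }
}

// A union-style base class whose read() is shared by all subclasses.
abstract class UnionBase implements Base {
    Class<?> read() { return StructLocator.currentStruct(); }
}

// The concrete union: its read() frame reports UnionBase, not Shape.
final class Shape extends UnionBase {}
```

Calling `new Account().read()` returns `Account.class`, whereas `new Shape().read()` returns `UnionBase.class`. Capturing a full stack trace and loading a class per frame on every readStructBegin() call is also why the approach is only suitable for tests.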