I would like to know if there are any changes that anyone would like to
suggest for the pull request <https://github.com/apache/thrift/pull/2439>
that adds support for partial Thrift deserialization.

On Mon, Aug 16, 2021 at 12:37 PM Bhalchandra Pandit <kpan...@pinterest.com>
wrote:

> I created a pull request for this change:
> https://github.com/apache/thrift/pull/2439
>
> I have also added a README.md file that provides technical details of
> partial Thrift deserialization so that it is easier to understand the Java
> implementation. It will be also useful if this support is to be added for
> other languages.
>
> Please let me know if you have any questions or suggestions for changes to
> the code.
>
> On Wed, Aug 4, 2021 at 12:10 AM Jens Geyer <je...@apache.org> wrote:
>
>> Hi,
>>
>> thanks!
>>
>> What I can say from a more general/formal viewpoint, the package
>> namespaces
>> etc must be adjusted (e.g. package com.pinterest.commons.thrift) and all
>> files you plan to contribute need to have the appropriate ASF license
>> header.
>>
>> What would be really nice, especially when we look forward to integrate
>> it
>> into other languages, adding some technical documentation/specification
>> to
>> make it easier to implement and have some common, documented view on the
>> feature.
>>
>> Last not least having appropriate test coverage would be great.
>>
>> Anything else beyond that ... whatever makes sense.
>>
>> Have fun,
>> JensG
>>
>>
>> -----Ursprüngliche Nachricht-----
>> From: Bhalchandra Pandit
>> Sent: Wednesday, August 4, 2021 3:33 AM
>> To: dev@thrift.apache.org
>> Subject: Re: partial deserialization of Thrift
>>
>> I have added basic set of files at the location below. Currently they are
>> unchanged from their Pinterest version. I would like to get some guidance
>> on the type of changes I should make. All of them have many tests though
>> they are not included in this initial skeleton for now.
>>
>>
>> https://github.com/bhalchandrap/thrift/tree/sample/lib/java/src/org/apache/thrift/partial
>>
>> On Wed, Jul 28, 2021 at 4:59 PM Bhalchandra Pandit <kpan...@pinterest.com
>> >
>> wrote:
>>
>> > I opened the following jira epic for this contribution:
>> > https://issues.apache.org/jira/browse/THRIFT-5443
>> >
>> > On Mon, Jul 12, 2021 at 11:58 PM Duru Can Celasun <dcela...@apache.org>
>> > wrote:
>> >
>> >> On Tue, 13 Jul 2021, at 01:07, Bhalchandra Pandit wrote:
>> >> > Thanks for showing interest and for raising valid questions.
>> >> >
>> >> > I do not currently have a forked branch to show how it is done.
>> >> However, I
>> >> > can certainly make it happen. It will take me a couple of weeks or so
>> >> to do
>> >> > it. I will need to get myself familiarized with the contribution
>> >> process.
>> >>
>> >> The short version is that you need to open a ticket on Jira [1] and
>> then
>> >> send a PR against the master branch on Github [2] with the title
>> >> "THRIFT-NNNN: Your title". Everything else can be figured out later.
>> >>
>> >> [1] https://issues.apache.org/jira/projects/THRIFT/issues
>> >>
>> >> [2] https://github.com/apache/thrift/
>> >>
>> >> >
>> >> > To answer other questions:
>> >> >
>> >> >    1. The current implementation is in Java. However, it can also be
>> >> easily
>> >> >    ported to other languages. Happy to help in that direction as
>> well.
>> >> >    2. The solution does not modify the definition of any structure
>> (eg,
>> >> >    Payload => SlimPayload). The output instance has the same type
>> >> regardless
>> >> >    of whether we use full deserialization or partial deserialization.
>> >> In that
>> >> >    sense, it is 100% compatible in both directions (partial <=>
>> full).
>> >> The
>> >> >    solution enables selectively deserializing any arbitrary nested
>> >> subset.
>> >> >
>> >> > Kumar
>> >> >
>> >> > On Mon, Jul 12, 2021 at 4:10 PM Duru Can Celasun <
>> dcela...@apache.org>
>> >> > wrote:
>> >> >
>> >> > > This is definitely an interesting problem at scale and it'd be
>> great
>> >> to
>> >> > > have a solution upstream.
>> >> > >
>> >> > > I second Yuxuan's questions. From the blog post it seems you have
>> an
>> >> > > implementation for Java, but it would be great to have at least one
>> >> more.
>> >> > >
>> >> > > On Tue, 13 Jul 2021, at 00:00, Yuxuan Wang wrote:
>> >> > > >  Hi Bhalchandra,
>> >> > > >
>> >> > > > Do you have any open source code (e.g. forked thrift) to show how
>> >> you
>> >> > > > implemented it? The blog post stated what's the problem but
>> really
>> >> lack
>> >> > > > information on the solution side:
>> >> > > >
>> >> > > > 1. How did you solve the problem?
>> >> > > > 2. In which language(s) did you implement the solution?
>> >> > > >
>> >> > > > Without that information it's really not much we can do here.
>> >> > > >
>> >> > > > Also just based on the problem you described, a simple solution
>> >> would be
>> >> > > > just to duplicate the struct definition and remove the fields you
>> >> don't
>> >> > > > need, for example:
>> >> > > >
>> >> > > > // Original struct
>> >> > > > struct Payload {
>> >> > > >   1: optional Type1 field1,
>> >> > > >   2: optional Type2 field2,
>> >> > > >   3: optional Type3 field3,
>> >> > > >   ...
>> >> > > > }
>> >> > > >
>> >> > > > // Slim struct
>> >> > > > struct SlimPayload {
>> >> > > >   1: optional Type1 field1,
>> >> > > >   // field2 removed because we don't care about it in this use
>> case
>> >> > > >   3: optional SlimType3 field3,
>> >> > > >   ...
>> >> > > > }
>> >> > > >
>> >> > > > But of course it's hard to keep SlimPayload in sync with the
>> >> original
>> >> > > > Payload so I can see there are some values to have some helpers
>> to
>> >> help
>> >> > > > that, but as long as you don't make breaking changes into Payload
>> >> "keep
>> >> > > > them in sync" is a false problem.
>> >> > > >
>> >> > > > On Mon, Jul 12, 2021 at 12:56 PM Bhalchandra Pandit
>> >> > > > <kpan...@pinterest.com.invalid> wrote:
>> >> > > >
>> >> > > > > Hi All,
>> >> > > > > I work for Pinterest. I developed a technique for partial
>> >> > > deserialization
>> >> > > > > of Thrift that has been very useful in significantly improving
>> >> > > efficiency
>> >> > > > > of the data processing at Pinterest. I would like to contribute
>> >> that
>> >> > > > > feature to Apache Thrift. More details on this technique are
>> >> available
>> >> > > in
>> >> > > > > this blog I recently wrote:
>> >> > > > >
>> >> > > > >
>> >> > >
>> >>
>> https://medium.com/pinterest-engineering/improving-data-processing-efficiency-using-partial-deserialization-of-thrift-16bc3a4a38b4
>> >> > > > >
>> >> > > > > I would like to know if any of you are interested in helping
>> with
>> >> > > > > contributing this work to the main branch.
>> >> > > > >
>> >> > > > > Kumar
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>>
>>

Reply via email to