Hi Jens,
Thanks for reviewing. For the others who may want to review, I have added
README.md file that provides technical details on the overall design and
includes description of the major implementation components.

Kumar

On Wed, Aug 25, 2021 at 12:05 AM Jens Geyer <je...@apache.org> wrote:

> Hi,
>
> I added myself already as a reviewer, but it certainly would help if some
> other (especially Java) people could also review.
>
> Stay tuned, will do my part soon.
>
> Have fun,
> JensG
>
> -----Ursprüngliche Nachricht-----
> From: Bhalchandra Pandit
> Sent: Tuesday, August 24, 2021 4:25 AM
> To: dev@thrift.apache.org ; Jens Geyer
> Subject: Re: partial deserialization of Thrift
>
> I would like to know if there are any changes that anyone would like to
> suggest for the pull request <https://github.com/apache/thrift/pull/2439>
> that adds support for partial Thrift deserialization.
>
> On Mon, Aug 16, 2021 at 12:37 PM Bhalchandra Pandit <kpan...@pinterest.com
> >
> wrote:
>
> > I created a pull request for this change:
> > https://github.com/apache/thrift/pull/2439
> >
> > I have also added a README.md file that provides technical details of
> > partial Thrift deserialization so that it is easier to understand the
> Java
> > implementation. It will be also useful if this support is to be added for
> > other languages.
> >
> > Please let me know if you have any questions or suggestions for changes
> to
> > the code.
> >
> > On Wed, Aug 4, 2021 at 12:10 AM Jens Geyer <je...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> thanks!
> >>
> >> What I can say from a more general/formal viewpoint, the package
> >> namespaces
> >> etc must be adjusted (e.g. package com.pinterest.commons.thrift) and all
> >> files you plan to contribute need to have the appropriate ASF license
> >> header.
> >>
> >> What would be really nice, especially when we look forward to integrate
> >> it
> >> into other languages, adding some technical documentation/specification
> >> to
> >> make it easier to implement and have some common, documented view on the
> >> feature.
> >>
> >> Last not least having appropriate test coverage would be great.
> >>
> >> Anything else beyond that ... whatever makes sense.
> >>
> >> Have fun,
> >> JensG
> >>
> >>
> >> -----Ursprüngliche Nachricht-----
> >> From: Bhalchandra Pandit
> >> Sent: Wednesday, August 4, 2021 3:33 AM
> >> To: dev@thrift.apache.org
> >> Subject: Re: partial deserialization of Thrift
> >>
> >> I have added basic set of files at the location below. Currently they
> are
> >> unchanged from their Pinterest version. I would like to get some
> guidance
> >> on the type of changes I should make. All of them have many tests though
> >> they are not included in this initial skeleton for now.
> >>
> >>
> >>
> https://github.com/bhalchandrap/thrift/tree/sample/lib/java/src/org/apache/thrift/partial
> >>
> >> On Wed, Jul 28, 2021 at 4:59 PM Bhalchandra Pandit <
> kpan...@pinterest.com
> >> >
> >> wrote:
> >>
> >> > I opened the following jira epic for this contribution:
> >> > https://issues.apache.org/jira/browse/THRIFT-5443
> >> >
> >> > On Mon, Jul 12, 2021 at 11:58 PM Duru Can Celasun <
> dcela...@apache.org>
> >> > wrote:
> >> >
> >> >> On Tue, 13 Jul 2021, at 01:07, Bhalchandra Pandit wrote:
> >> >> > Thanks for showing interest and for raising valid questions.
> >> >> >
> >> >> > I do not currently have a forked branch to show how it is done.
> >> >> However, I
> >> >> > can certainly make it happen. It will take me a couple of weeks or
> >> >> > so
> >> >> to do
> >> >> > it. I will need to get myself familiarized with the contribution
> >> >> process.
> >> >>
> >> >> The short version is that you need to open a ticket on Jira [1] and
> >> then
> >> >> send a PR against the master branch on Github [2] with the title
> >> >> "THRIFT-NNNN: Your title". Everything else can be figured out later.
> >> >>
> >> >> [1] https://issues.apache.org/jira/projects/THRIFT/issues
> >> >>
> >> >> [2] https://github.com/apache/thrift/
> >> >>
> >> >> >
> >> >> > To answer other questions:
> >> >> >
> >> >> >    1. The current implementation is in Java. However, it can also
> be
> >> >> easily
> >> >> >    ported to other languages. Happy to help in that direction as
> >> well.
> >> >> >    2. The solution does not modify the definition of any structure
> >> (eg,
> >> >> >    Payload => SlimPayload). The output instance has the same type
> >> >> regardless
> >> >> >    of whether we use full deserialization or partial
> >> >> > deserialization.
> >> >> In that
> >> >> >    sense, it is 100% compatible in both directions (partial <=>
> >> full).
> >> >> The
> >> >> >    solution enables selectively deserializing any arbitrary nested
> >> >> subset.
> >> >> >
> >> >> > Kumar
> >> >> >
> >> >> > On Mon, Jul 12, 2021 at 4:10 PM Duru Can Celasun <
> >> dcela...@apache.org>
> >> >> > wrote:
> >> >> >
> >> >> > > This is definitely an interesting problem at scale and it'd be
> >> great
> >> >> to
> >> >> > > have a solution upstream.
> >> >> > >
> >> >> > > I second Yuxuan's questions. From the blog post it seems you have
> >> an
> >> >> > > implementation for Java, but it would be great to have at least
> >> >> > > one
> >> >> more.
> >> >> > >
> >> >> > > On Tue, 13 Jul 2021, at 00:00, Yuxuan Wang wrote:
> >> >> > > >  Hi Bhalchandra,
> >> >> > > >
> >> >> > > > Do you have any open source code (e.g. forked thrift) to show
> >> >> > > > how
> >> >> you
> >> >> > > > implemented it? The blog post stated what's the problem but
> >> really
> >> >> lack
> >> >> > > > information on the solution side:
> >> >> > > >
> >> >> > > > 1. How did you solve the problem?
> >> >> > > > 2. In which language(s) did you implement the solution?
> >> >> > > >
> >> >> > > > Without that information it's really not much we can do here.
> >> >> > > >
> >> >> > > > Also just based on the problem you described, a simple solution
> >> >> would be
> >> >> > > > just to duplicate the struct definition and remove the fields
> >> >> > > > you
> >> >> don't
> >> >> > > > need, for example:
> >> >> > > >
> >> >> > > > // Original struct
> >> >> > > > struct Payload {
> >> >> > > >   1: optional Type1 field1,
> >> >> > > >   2: optional Type2 field2,
> >> >> > > >   3: optional Type3 field3,
> >> >> > > >   ...
> >> >> > > > }
> >> >> > > >
> >> >> > > > // Slim struct
> >> >> > > > struct SlimPayload {
> >> >> > > >   1: optional Type1 field1,
> >> >> > > >   // field2 removed because we don't care about it in this use
> >> case
> >> >> > > >   3: optional SlimType3 field3,
> >> >> > > >   ...
> >> >> > > > }
> >> >> > > >
> >> >> > > > But of course it's hard to keep SlimPayload in sync with the
> >> >> original
> >> >> > > > Payload so I can see there are some values to have some helpers
> >> to
> >> >> help
> >> >> > > > that, but as long as you don't make breaking changes into
> >> >> > > > Payload
> >> >> "keep
> >> >> > > > them in sync" is a false problem.
> >> >> > > >
> >> >> > > > On Mon, Jul 12, 2021 at 12:56 PM Bhalchandra Pandit
> >> >> > > > <kpan...@pinterest.com.invalid> wrote:
> >> >> > > >
> >> >> > > > > Hi All,
> >> >> > > > > I work for Pinterest. I developed a technique for partial
> >> >> > > deserialization
> >> >> > > > > of Thrift that has been very useful in significantly
> improving
> >> >> > > efficiency
> >> >> > > > > of the data processing at Pinterest. I would like to
> >> >> > > > > contribute
> >> >> that
> >> >> > > > > feature to Apache Thrift. More details on this technique are
> >> >> available
> >> >> > > in
> >> >> > > > > this blog I recently wrote:
> >> >> > > > >
> >> >> > > > >
> >> >> > >
> >> >>
> >>
> https://medium.com/pinterest-engineering/improving-data-processing-efficiency-using-partial-deserialization-of-thrift-16bc3a4a38b4
> >> >> > > > >
> >> >> > > > > I would like to know if any of you are interested in helping
> >> with
> >> >> > > > > contributing this work to the main branch.
> >> >> > > > >
> >> >> > > > > Kumar
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >>
> >>
>
>

Reply via email to