Hi,
thanks!
What I can say from a more general/formal viewpoint, the package namespaces
etc must be adjusted (e.g. package com.pinterest.commons.thrift) and all
files you plan to contribute need to have the appropriate ASF license
header.
What would be really nice, especially when we look forward to integrate it
into other languages, adding some technical documentation/specification to
make it easier to implement and have some common, documented view on the
feature.
Last not least having appropriate test coverage would be great.
Anything else beyond that ... whatever makes sense.
Have fun,
JensG
-----Ursprüngliche Nachricht-----
From: Bhalchandra Pandit
Sent: Wednesday, August 4, 2021 3:33 AM
To: dev@thrift.apache.org
Subject: Re: partial deserialization of Thrift
I have added basic set of files at the location below. Currently they are
unchanged from their Pinterest version. I would like to get some guidance
on the type of changes I should make. All of them have many tests though
they are not included in this initial skeleton for now.
https://github.com/bhalchandrap/thrift/tree/sample/lib/java/src/org/apache/thrift/partial
On Wed, Jul 28, 2021 at 4:59 PM Bhalchandra Pandit <kpan...@pinterest.com>
wrote:
I opened the following jira epic for this contribution:
https://issues.apache.org/jira/browse/THRIFT-5443
On Mon, Jul 12, 2021 at 11:58 PM Duru Can Celasun <dcela...@apache.org>
wrote:
On Tue, 13 Jul 2021, at 01:07, Bhalchandra Pandit wrote:
> Thanks for showing interest and for raising valid questions.
>
> I do not currently have a forked branch to show how it is done.
However, I
> can certainly make it happen. It will take me a couple of weeks or so
to do
> it. I will need to get myself familiarized with the contribution
process.
The short version is that you need to open a ticket on Jira [1] and then
send a PR against the master branch on Github [2] with the title
"THRIFT-NNNN: Your title". Everything else can be figured out later.
[1] https://issues.apache.org/jira/projects/THRIFT/issues
[2] https://github.com/apache/thrift/
>
> To answer other questions:
>
> 1. The current implementation is in Java. However, it can also be
easily
> ported to other languages. Happy to help in that direction as well.
> 2. The solution does not modify the definition of any structure (eg,
> Payload => SlimPayload). The output instance has the same type
regardless
> of whether we use full deserialization or partial deserialization.
In that
> sense, it is 100% compatible in both directions (partial <=> full).
The
> solution enables selectively deserializing any arbitrary nested
subset.
>
> Kumar
>
> On Mon, Jul 12, 2021 at 4:10 PM Duru Can Celasun <dcela...@apache.org>
> wrote:
>
> > This is definitely an interesting problem at scale and it'd be great
to
> > have a solution upstream.
> >
> > I second Yuxuan's questions. From the blog post it seems you have an
> > implementation for Java, but it would be great to have at least one
more.
> >
> > On Tue, 13 Jul 2021, at 00:00, Yuxuan Wang wrote:
> > > Hi Bhalchandra,
> > >
> > > Do you have any open source code (e.g. forked thrift) to show how
you
> > > implemented it? The blog post stated what's the problem but really
lack
> > > information on the solution side:
> > >
> > > 1. How did you solve the problem?
> > > 2. In which language(s) did you implement the solution?
> > >
> > > Without that information it's really not much we can do here.
> > >
> > > Also just based on the problem you described, a simple solution
would be
> > > just to duplicate the struct definition and remove the fields you
don't
> > > need, for example:
> > >
> > > // Original struct
> > > struct Payload {
> > > 1: optional Type1 field1,
> > > 2: optional Type2 field2,
> > > 3: optional Type3 field3,
> > > ...
> > > }
> > >
> > > // Slim struct
> > > struct SlimPayload {
> > > 1: optional Type1 field1,
> > > // field2 removed because we don't care about it in this use case
> > > 3: optional SlimType3 field3,
> > > ...
> > > }
> > >
> > > But of course it's hard to keep SlimPayload in sync with the
original
> > > Payload so I can see there are some values to have some helpers to
help
> > > that, but as long as you don't make breaking changes into Payload
"keep
> > > them in sync" is a false problem.
> > >
> > > On Mon, Jul 12, 2021 at 12:56 PM Bhalchandra Pandit
> > > <kpan...@pinterest.com.invalid> wrote:
> > >
> > > > Hi All,
> > > > I work for Pinterest. I developed a technique for partial
> > deserialization
> > > > of Thrift that has been very useful in significantly improving
> > efficiency
> > > > of the data processing at Pinterest. I would like to contribute
that
> > > > feature to Apache Thrift. More details on this technique are
available
> > in
> > > > this blog I recently wrote:
> > > >
> > > >
> >
https://medium.com/pinterest-engineering/improving-data-processing-efficiency-using-partial-deserialization-of-thrift-16bc3a4a38b4
> > > >
> > > > I would like to know if any of you are interested in helping with
> > > > contributing this work to the main branch.
> > > >
> > > > Kumar
> > > >
> > >
> >
>