I have added basic set of files at the location below. Currently they are
unchanged from their Pinterest version. I would like to get some guidance
on the type of changes I should make. All of them have many tests though
they are not included in this initial skeleton for now.

https://github.com/bhalchandrap/thrift/tree/sample/lib/java/src/org/apache/thrift/partial

On Wed, Jul 28, 2021 at 4:59 PM Bhalchandra Pandit <kpan...@pinterest.com>
wrote:

> I opened the following jira epic for this contribution:
> https://issues.apache.org/jira/browse/THRIFT-5443
>
> On Mon, Jul 12, 2021 at 11:58 PM Duru Can Celasun <dcela...@apache.org>
> wrote:
>
>> On Tue, 13 Jul 2021, at 01:07, Bhalchandra Pandit wrote:
>> > Thanks for showing interest and for raising valid questions.
>> >
>> > I do not currently have a forked branch to show how it is done.
>> However, I
>> > can certainly make it happen. It will take me a couple of weeks or so
>> to do
>> > it. I will need to get myself familiarized with the contribution
>> process.
>>
>> The short version is that you need to open a ticket on Jira [1] and then
>> send a PR against the master branch on Github [2] with the title
>> "THRIFT-NNNN: Your title". Everything else can be figured out later.
>>
>> [1] https://issues.apache.org/jira/projects/THRIFT/issues
>>
>> [2] https://github.com/apache/thrift/
>>
>> >
>> > To answer other questions:
>> >
>> >    1. The current implementation is in Java. However, it can also be
>> easily
>> >    ported to other languages. Happy to help in that direction as well.
>> >    2. The solution does not modify the definition of any structure (eg,
>> >    Payload => SlimPayload). The output instance has the same type
>> regardless
>> >    of whether we use full deserialization or partial deserialization.
>> In that
>> >    sense, it is 100% compatible in both directions (partial <=> full).
>> The
>> >    solution enables selectively deserializing any arbitrary nested
>> subset.
>> >
>> > Kumar
>> >
>> > On Mon, Jul 12, 2021 at 4:10 PM Duru Can Celasun <dcela...@apache.org>
>> > wrote:
>> >
>> > > This is definitely an interesting problem at scale and it'd be great
>> to
>> > > have a solution upstream.
>> > >
>> > > I second Yuxuan's questions. From the blog post it seems you have an
>> > > implementation for Java, but it would be great to have at least one
>> more.
>> > >
>> > > On Tue, 13 Jul 2021, at 00:00, Yuxuan Wang wrote:
>> > > >  Hi Bhalchandra,
>> > > >
>> > > > Do you have any open source code (e.g. forked thrift) to show how
>> you
>> > > > implemented it? The blog post stated what's the problem but really
>> lack
>> > > > information on the solution side:
>> > > >
>> > > > 1. How did you solve the problem?
>> > > > 2. In which language(s) did you implement the solution?
>> > > >
>> > > > Without that information it's really not much we can do here.
>> > > >
>> > > > Also just based on the problem you described, a simple solution
>> would be
>> > > > just to duplicate the struct definition and remove the fields you
>> don't
>> > > > need, for example:
>> > > >
>> > > > // Original struct
>> > > > struct Payload {
>> > > >   1: optional Type1 field1,
>> > > >   2: optional Type2 field2,
>> > > >   3: optional Type3 field3,
>> > > >   ...
>> > > > }
>> > > >
>> > > > // Slim struct
>> > > > struct SlimPayload {
>> > > >   1: optional Type1 field1,
>> > > >   // field2 removed because we don't care about it in this use case
>> > > >   3: optional SlimType3 field3,
>> > > >   ...
>> > > > }
>> > > >
>> > > > But of course it's hard to keep SlimPayload in sync with the
>> original
>> > > > Payload so I can see there are some values to have some helpers to
>> help
>> > > > that, but as long as you don't make breaking changes into Payload
>> "keep
>> > > > them in sync" is a false problem.
>> > > >
>> > > > On Mon, Jul 12, 2021 at 12:56 PM Bhalchandra Pandit
>> > > > <kpan...@pinterest.com.invalid> wrote:
>> > > >
>> > > > > Hi All,
>> > > > > I work for Pinterest. I developed a technique for partial
>> > > deserialization
>> > > > > of Thrift that has been very useful in significantly improving
>> > > efficiency
>> > > > > of the data processing at Pinterest. I would like to contribute
>> that
>> > > > > feature to Apache Thrift. More details on this technique are
>> available
>> > > in
>> > > > > this blog I recently wrote:
>> > > > >
>> > > > >
>> > >
>> https://medium.com/pinterest-engineering/improving-data-processing-efficiency-using-partial-deserialization-of-thrift-16bc3a4a38b4
>> > > > >
>> > > > > I would like to know if any of you are interested in helping with
>> > > > > contributing this work to the main branch.
>> > > > >
>> > > > > Kumar
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to