Hi Jens, Thanks for reviewing. For the others who may want to review, I have added README.md file that provides technical details on the overall design and includes description of the major implementation components.
Kumar On Wed, Aug 25, 2021 at 12:05 AM Jens Geyer <je...@apache.org> wrote: > Hi, > > I added myself already as a reviewer, but it certainly would help if some > other (especially Java) people could also review. > > Stay tuned, will do my part soon. > > Have fun, > JensG > > -----Ursprüngliche Nachricht----- > From: Bhalchandra Pandit > Sent: Tuesday, August 24, 2021 4:25 AM > To: dev@thrift.apache.org ; Jens Geyer > Subject: Re: partial deserialization of Thrift > > I would like to know if there are any changes that anyone would like to > suggest for the pull request <https://github.com/apache/thrift/pull/2439> > that adds support for partial Thrift deserialization. > > On Mon, Aug 16, 2021 at 12:37 PM Bhalchandra Pandit <kpan...@pinterest.com > > > wrote: > > > I created a pull request for this change: > > https://github.com/apache/thrift/pull/2439 > > > > I have also added a README.md file that provides technical details of > > partial Thrift deserialization so that it is easier to understand the > Java > > implementation. It will be also useful if this support is to be added for > > other languages. > > > > Please let me know if you have any questions or suggestions for changes > to > > the code. > > > > On Wed, Aug 4, 2021 at 12:10 AM Jens Geyer <je...@apache.org> wrote: > > > >> Hi, > >> > >> thanks! > >> > >> What I can say from a more general/formal viewpoint, the package > >> namespaces > >> etc must be adjusted (e.g. package com.pinterest.commons.thrift) and all > >> files you plan to contribute need to have the appropriate ASF license > >> header. > >> > >> What would be really nice, especially when we look forward to integrate > >> it > >> into other languages, adding some technical documentation/specification > >> to > >> make it easier to implement and have some common, documented view on the > >> feature. > >> > >> Last not least having appropriate test coverage would be great. > >> > >> Anything else beyond that ... whatever makes sense. > >> > >> Have fun, > >> JensG > >> > >> > >> -----Ursprüngliche Nachricht----- > >> From: Bhalchandra Pandit > >> Sent: Wednesday, August 4, 2021 3:33 AM > >> To: dev@thrift.apache.org > >> Subject: Re: partial deserialization of Thrift > >> > >> I have added basic set of files at the location below. Currently they > are > >> unchanged from their Pinterest version. I would like to get some > guidance > >> on the type of changes I should make. All of them have many tests though > >> they are not included in this initial skeleton for now. > >> > >> > >> > https://github.com/bhalchandrap/thrift/tree/sample/lib/java/src/org/apache/thrift/partial > >> > >> On Wed, Jul 28, 2021 at 4:59 PM Bhalchandra Pandit < > kpan...@pinterest.com > >> > > >> wrote: > >> > >> > I opened the following jira epic for this contribution: > >> > https://issues.apache.org/jira/browse/THRIFT-5443 > >> > > >> > On Mon, Jul 12, 2021 at 11:58 PM Duru Can Celasun < > dcela...@apache.org> > >> > wrote: > >> > > >> >> On Tue, 13 Jul 2021, at 01:07, Bhalchandra Pandit wrote: > >> >> > Thanks for showing interest and for raising valid questions. > >> >> > > >> >> > I do not currently have a forked branch to show how it is done. > >> >> However, I > >> >> > can certainly make it happen. It will take me a couple of weeks or > >> >> > so > >> >> to do > >> >> > it. I will need to get myself familiarized with the contribution > >> >> process. > >> >> > >> >> The short version is that you need to open a ticket on Jira [1] and > >> then > >> >> send a PR against the master branch on Github [2] with the title > >> >> "THRIFT-NNNN: Your title". Everything else can be figured out later. > >> >> > >> >> [1] https://issues.apache.org/jira/projects/THRIFT/issues > >> >> > >> >> [2] https://github.com/apache/thrift/ > >> >> > >> >> > > >> >> > To answer other questions: > >> >> > > >> >> > 1. The current implementation is in Java. However, it can also > be > >> >> easily > >> >> > ported to other languages. Happy to help in that direction as > >> well. > >> >> > 2. The solution does not modify the definition of any structure > >> (eg, > >> >> > Payload => SlimPayload). The output instance has the same type > >> >> regardless > >> >> > of whether we use full deserialization or partial > >> >> > deserialization. > >> >> In that > >> >> > sense, it is 100% compatible in both directions (partial <=> > >> full). > >> >> The > >> >> > solution enables selectively deserializing any arbitrary nested > >> >> subset. > >> >> > > >> >> > Kumar > >> >> > > >> >> > On Mon, Jul 12, 2021 at 4:10 PM Duru Can Celasun < > >> dcela...@apache.org> > >> >> > wrote: > >> >> > > >> >> > > This is definitely an interesting problem at scale and it'd be > >> great > >> >> to > >> >> > > have a solution upstream. > >> >> > > > >> >> > > I second Yuxuan's questions. From the blog post it seems you have > >> an > >> >> > > implementation for Java, but it would be great to have at least > >> >> > > one > >> >> more. > >> >> > > > >> >> > > On Tue, 13 Jul 2021, at 00:00, Yuxuan Wang wrote: > >> >> > > > Hi Bhalchandra, > >> >> > > > > >> >> > > > Do you have any open source code (e.g. forked thrift) to show > >> >> > > > how > >> >> you > >> >> > > > implemented it? The blog post stated what's the problem but > >> really > >> >> lack > >> >> > > > information on the solution side: > >> >> > > > > >> >> > > > 1. How did you solve the problem? > >> >> > > > 2. In which language(s) did you implement the solution? > >> >> > > > > >> >> > > > Without that information it's really not much we can do here. > >> >> > > > > >> >> > > > Also just based on the problem you described, a simple solution > >> >> would be > >> >> > > > just to duplicate the struct definition and remove the fields > >> >> > > > you > >> >> don't > >> >> > > > need, for example: > >> >> > > > > >> >> > > > // Original struct > >> >> > > > struct Payload { > >> >> > > > 1: optional Type1 field1, > >> >> > > > 2: optional Type2 field2, > >> >> > > > 3: optional Type3 field3, > >> >> > > > ... > >> >> > > > } > >> >> > > > > >> >> > > > // Slim struct > >> >> > > > struct SlimPayload { > >> >> > > > 1: optional Type1 field1, > >> >> > > > // field2 removed because we don't care about it in this use > >> case > >> >> > > > 3: optional SlimType3 field3, > >> >> > > > ... > >> >> > > > } > >> >> > > > > >> >> > > > But of course it's hard to keep SlimPayload in sync with the > >> >> original > >> >> > > > Payload so I can see there are some values to have some helpers > >> to > >> >> help > >> >> > > > that, but as long as you don't make breaking changes into > >> >> > > > Payload > >> >> "keep > >> >> > > > them in sync" is a false problem. > >> >> > > > > >> >> > > > On Mon, Jul 12, 2021 at 12:56 PM Bhalchandra Pandit > >> >> > > > <kpan...@pinterest.com.invalid> wrote: > >> >> > > > > >> >> > > > > Hi All, > >> >> > > > > I work for Pinterest. I developed a technique for partial > >> >> > > deserialization > >> >> > > > > of Thrift that has been very useful in significantly > improving > >> >> > > efficiency > >> >> > > > > of the data processing at Pinterest. I would like to > >> >> > > > > contribute > >> >> that > >> >> > > > > feature to Apache Thrift. More details on this technique are > >> >> available > >> >> > > in > >> >> > > > > this blog I recently wrote: > >> >> > > > > > >> >> > > > > > >> >> > > > >> >> > >> > https://medium.com/pinterest-engineering/improving-data-processing-efficiency-using-partial-deserialization-of-thrift-16bc3a4a38b4 > >> >> > > > > > >> >> > > > > I would like to know if any of you are interested in helping > >> with > >> >> > > > > contributing this work to the main branch. > >> >> > > > > > >> >> > > > > Kumar > >> >> > > > > > >> >> > > > > >> >> > > > >> >> > > >> >> > >> > > >> > >> > >