Added a section for Flume based on the feedback. Thanks
On Mon, Sep 26, 2016 at 8:51 AM, Pramod Immaneni <[email protected]> wrote:

> Hi Thomas,
>
> My responses are inline.
>
> On Sun, Sep 25, 2016 at 11:39 AM, Thomas Weise <[email protected]>
> wrote:
>
>> Thanks for putting it together. It looks like there are really only 2
>> operators?
>>
>
> There were others, but it looked like there were already good
> implementations of, or alternatives for, them in Malhar. For example,
> enrichment and deduper have implementations already; for the laggards
> operator, it looked like the concept is already covered in the new
> windowing work.
>
>>
>> +1 for the Flume connector. It would be good to also look at what has
>> changed in Flume since it was written. It needs its own Maven module,
>> and documentation is also needed.
>>
>
> Yes, in the table in the document I have it going to its own module and
> path. I will make a note in the document about checking against newer
> Flume versions and about documentation.
>
>> I don't agree with the proposed "as-is" move for the dimension compute
>> operator into contrib. It does not belong there. Contrib is for new,
>> incomplete work ("immature" and under the radar WRT CI etc.), with
>> particular focus on providing an easier entry path for new contributors.
>>
>> I would like to see the following changes to dimension computation:
>> * Replace HDHT with managed state (or spillable DS)
>> * Move to org.apache.apex.malhar.lib.*
>> * Documentation (your draft is a good start towards that); it also
>>   needs to cover query support.
>>
>> I think it is a very valuable operator that should be a first-class
>> citizen, and the folks familiar with the operator and state management
>> should take up the work to port it. Tim indicated he may be able to
>> take it up.
>>
>> In the meantime, the operator can remain in the Megh repository under
>> its existing name and be consumed from there.
>>
>
> I thought it could eventually have its own module under Malhar but
> suggested contrib as an intermediate location until any porting is
> completed. I agree on the documentation; I just wrote up something quick
> to highlight the operator. Tim has more detailed docs for it, I think.
> Since the operator(s) are readily usable in production applications and
> implement quite a bit of valuable functionality, I am of the opinion
> that we do the minimum now to make it available and, in parallel, start
> the work on porting some of the internal subsystems to newer components.
>
> Thanks
>
>>
>> Thomas
>>
>> On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Here is the initial proposal. Please go through it; you can comment
>> > right on the document. Regarding the discussions around Dimensional
>> > operators, there is a specific section for them and future plans.
>> > After the comments are addressed, I can start with one of the
>> > components, such as Flume, and document the steps involved. Then
>> > others can take up the other components and use the steps in a
>> > similar fashion.
>> >
>> > https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g
>> >
>> > Thanks
>> >
>> > On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <[email protected]> wrote:
>> >
>> > > Thomas,
>> > > IMHO we should also look at the cost to users of keeping code in a
>> > > GitHub repository (even if under ASF 2.0 license) outside Malhar.
>> > > There is value in deprecating code in Megh and moving it to Malhar.
>> > > Volunteers in this effort could decide on how much overlap means
>> > > "mark as overlapping". My suggestion is to absorb overlapping
>> > > operators into a directory in Malhar that marks them as such. A lot
>> > > of these operators are being used in production, and it makes sense
>> > > to absorb them into Apache GitHub.
>> > >
>> > > Thks
>> > > Amol
>> > >
>> > > On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <[email protected]>
>> > > wrote:
>> > >
>> > > > It would be great to have Tim's help with dimension computation,
>> > > > but I think we can still debate whether the HDHT dependency needs
>> > > > to be removed before contribution or whether it can be done as a
>> > > > two-step process, since we also have a place to put experimental
>> > > > code, contrib, and HDHT could go in there until we can
>> > > > determine/port it to use managed state.
>> > > >
>> > > > My thought on this is that if it is going to be a significant
>> > > > porting effort, then we do it as a two-step process.
>> > > >
>> > > > Thanks
>> > > >
>> > > > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <[email protected]> wrote:
>> > > > >
>> > > > > Tim,
>> > > > >
>> > > > > The functionality of the dimension compute operator should be
>> > > > > available in Malhar. My concern is moving things without regard
>> > > > > to code duplication and long-term maintenance cost. There are
>> > > > > several pieces of the dimension compute operator that in fact
>> > > > > are (or should be) reusable components by themselves. Live
>> > > > > querying (queryable state) with schemas is one such example.
>> > > > > It's a major feature and not limited to the dimension compute
>> > > > > operator. It should ideally work with the new windowing support
>> > > > > as well. But the main area that needs work is the state store:
>> > > > > the dependency on HDHT needs to be removed and replaced with
>> > > > > managed state. Also, I'm curious why the window operator should
>> > > > > not scale for large time buckets. Are you referring to the
>> > > > > current intermediate implementation or the work in progress
>> > > > > that will use incremental state saving?
>> > > > > If so, please bring it up on APEXMALHAR-2130, as it is pretty
>> > > > > important.
>> > > > >
>> > > > > Since you have written almost all of the dimension compute
>> > > > > code, could you help with the changes needed to bring it over?
>> > > > > It would also be good to see the user documentation in Malhar.
>> > > > >
>> > > > > Thanks,
>> > > > > Thomas
>> > > > >
>> > > > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > >> Hi Thomas,
>> > > > >>
>> > > > >> With respect to the dimension operator, I would like to learn
>> > > > >> more about the underlying framework you mentioned and the code
>> > > > >> duplication. If you are talking about the Window operator
>> > > > >> framework, that framework is not suitable for the dimension
>> > > > >> computation use case because it doesn't scale for large time
>> > > > >> buckets. Furthermore, that framework has no support for
>> > > > >> querying. The dimension operators support live queries of the
>> > > > >> aggregated data. Querying of live data streams is a popular
>> > > > >> feature in other open source platforms, and I believe it is a
>> > > > >> worthwhile addition to Malhar.
>> > > > >>
>> > > > >> Given the fact that the dimension framework has been used in
>> > > > >> many POCs, is even running in production, and has novel
>> > > > >> features like live querying, it more than meets the bar for a
>> > > > >> Malhar contribution. If a concrete argument cannot be provided
>> > > > >> to prevent this work from going into Malhar, then these efforts
>> > > > >> should not be blocked.
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Tim
>> > > > >>
>> > > > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <[email protected]> wrote:
>> > > > >>> I see no reason to move the dimension operator along with
>> > > > >>> everything it duplicates to Malhar. It's available for
>> > > > >>> everyone to use as it is, and there should be an initiative
>> > > > >>> to make it conform to the underlying framework to be part of
>> > > > >>> Malhar.
>> > > > >>>
>> > > > >>> Also, there is already an enrichment operator; there is even
>> > > > >>> documentation for it.
>> > > > >>>
>> > > > >>> Hence, this needs to be analyzed properly.
>> > > > >>>
>> > > > >>> Thomas
>> > > > >>>
>> > > > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <[email protected]>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>>> Yes, I do plan to come up with a proposal with a list. The
>> > > > >>>> ones that come to mind are Flume, enrichment, various
>> > > > >>>> dimensional operators and any custom partitioners. The
>> > > > >>>> dimensional operators are in a mature state and usable
>> > > > >>>> today; in the future they could also be ported onto the new
>> > > > >>>> windowing and managed state operator framework.
>> > > > >>>>
>> > > > >>>> Thanks
>> > > > >>>>
>> > > > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <[email protected]>
>> > > > >>>> wrote:
>> > > > >>>>
>> > > > >>>>> A cursory look suggests there is a lot of overlap. I'm
>> > > > >>>>> looking forward to seeing a proposal that reflects a vision
>> > > > >>>>> of how to evolve Malhar rather than just moving around code.
>> > > > >>>>>
>> > > > >>>>> Thomas
>> > > > >>>>>
>> > > > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <[email protected]>
>> > > > >>>>> wrote:
>> > > > >>>>>
>> > > > >>>>>> Hi,
>> > > > >>>>>>
>> > > > >>>>>> DataTorrent, the initial contributor to Apex and the
>> > > > >>>>>> company I work for, has opened up a library of operators
>> > > > >>>>>> called Megh recently to the public and has made the
>> > > > >>>>>> repository available under the Apache License. The link to
>> > > > >>>>>> the repository is below. These operators, for the most
>> > > > >>>>>> part, contain functionality that is complementary to what
>> > > > >>>>>> Malhar library provides and were developed to solve
>> > > > >>>>>> business use cases that arose over time. Also, some
>> > > > >>>>>> operators in Malhar were inspired from early
>> > > > >>>>>> implementations in the Megh library and were built upon
>> > > > >>>>>> knowledge gained in doing the original implementations.
>> > > > >>>>>>
>> > > > >>>>>> Our goal is to not have Megh as a separate library but
>> > > > >>>>>> rather bring these operators into Malhar in a fashion that
>> > > > >>>>>> it is consistent with the Malhar project and repository.
>> > > > >>>>>> In the upcoming days, in a gradual fashion, we will have
>> > > > >>>>>> more details on the individual operators that we would
>> > > > >>>>>> like to contribute. Also, if you are interested in helping
>> > > > >>>>>> with this effort please raise your hand.
>> > > > >>>>>>
>> > > > >>>>>> https://github.com/DataTorrent/Megh/
>> > > > >>>>>>
>> > > > >>>>>> Thanks
