Hi Thomas,

My responses are inline.

On Sun, Sep 25, 2016 at 11:39 AM, Thomas Weise <[email protected]>
wrote:

> Thanks for putting it together. It looks like there are really only 2
> operators?
>

There were others, but it looked like there were already good implementations
or alternatives for them in Malhar. For example, enrichment and deduper have
implementations already, and for the laggards operator it looked like the
concept is already covered in the new windowing work.


>
> +1 for the Flume connector. It would be good to also look what has changed
> in Flume since it was written. It needs its own Maven module and
> documentation is also needed.
>

Yes, in the table in the document I have it going to its own module and
path. I will make a note in the document about checking against newer Flume
versions and about the documentation.


> I don't agree with the proposed "as-is" move for the dimension compute
> operator into contrib. It does not belong there. Contrib is for new,
> incomplete work ("immature" and under the radar WRT CI etc.), with
> particular focus to provide an easier entry path for new contributors.
>
> I would like to see the following changes to dimension computation:
> * Replace HDHT with managed state (or spillable DS)
> * Move to org.apache.apex.malhar.lib.*
> * Documentation (your draft is a good start towards that), it also needs to
> cover query support.
>
> I think it is a very valuable operator that should be a first class citizen
> and the folks familiar with the operator and state management should take
> up the work to port it. Tim indicated he may be able to take it up.
>
> In the meantime, the operator can remain in the Megh repository under
> existing name and consumed from there.
>

I thought it could eventually have its own module under Malhar, but
suggested contrib as an intermediate location until the porting is
completed. I agree about the documentation; I just wrote up something quick
to highlight the operator, and I think Tim has more detailed docs for it.
Since the operator(s) are readily usable in production applications and
provide quite a bit of valuable functionality, I am of the opinion that we
do the minimum now to make them available and, in parallel, start the work
on porting some of the internal subsystems to newer components.

Thanks


>
> Thomas
>
> On Sat, Sep 24, 2016 at 12:29 PM, Pramod Immaneni <[email protected]>
> wrote:
>
> > Hi,
> >
> > Here is the initial proposal. Please go through it and you can comment
> > right on the document. Regarding the discussions around Dimensional
> > operators, there is a specific section for it and future plans. After the
> > comments are addressed, I can start with one of the components such as
> > flume and document the steps involved. Then others can take up the other
> > components and use the steps in a similar fashion.
> >
> > https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5
> > nTP7cUQOAlVs0g
> >
> > Thanks
> >
> > On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <[email protected]>
> wrote:
> >
> > > Thomas,
> > > IMHO we should also look at the cost to users on keeping code in a
> github
> > > (even if under ASF 2.0 license) outside Malhar. There is value to
> > > deprecating code in Megh, and moving it to Malhar. Volunteers in this
> > > effort could decide on how much overlap means "mark as overlapping". My
> > > suggestion is to absorb overlapping operators into a directory in
> Malhar
> > > that marks it as such. A lot of these operators are being used in
> > > production and it makes sense to absorb them into Apache GitHub.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > >
> > >
> > > On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <
> [email protected]
> > >
> > > wrote:
> > >
> > > > It would be great to have Tim's help with dimension computation but I
> > > > think we can still debate whether HDHT dependency needs to be removed
> > > > before contribution or whether it can be done as a two step process
> > > > since we also have a place to put experimental code (contrib), and HDHT
> > > > could go in there till we can determine/port it to use managed
> > > > state.
> > > >
> > > > My thought on this is that if it is going to be a significant porting
> > > > effort then we do it as a two step process.
> > > >
> > > > Thanks
> > > >
> > > > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <[email protected]>
> > > > wrote:
> > > > >
> > > > > Tim,
> > > > >
> > > > > The functionality of the dimension compute operator should be
> > available
> > > > in
> > > > > Malhar. My concern is moving things without regard to code
> > duplication
> > > > and
> > > > > long term maintenance cost. There are several pieces to the
> dimension
> > > > > compute operator that in fact are (or should be) reusable
> components
> > by
> > > > > themselves. Live querying (queryable state) with schemas is one
> such
> > > > > example. It's a major feature and not limited to the dimension
> > compute
> > > > > operator. It should ideally work with the new windowing support as
> > > well.
> > > > > But the main area that needs work is the state store - the
> dependency
> > > on
> > > > > HDHT needs to be removed and replaced with managed state. Also I'm
> > > > curious
> > > > > why the window operator should not scale for large time buckets?
> Are
> > > you
> > > > > referring to the current intermediate implementation or the work in
> > > > > progress that will use incremental state saving? If so, please
> bring
> > it
> > > > up
> > > > > on APEXMALHAR-2130 as it is pretty important.
> > > > >
> > > > > Since you have written almost all of the dimension compute code,
> > could
> > > > you
> > > > > help with the changes needed to bring it over? It would also be
> good
> > to
> > > > see
> > > > > the user documentation in Malhar.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <
> > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > >> Hi Thomas,
> > > > >>
> > > > >> With respect to the dimension operator, I would like to learn more
> > > about
> > > > >> the underlying framework you mentioned and the code duplication.
> If
> > > you
> > > > are
> > > > >> talking about the Window operator framework, that framework is not
> > > > suitable
> > > > >> for the dimension computation use case because it doesn't scale
> for
> > > > large
> > > > >> timebuckets. Furthermore that framework has no support for
> Querying.
> > > The
> > > > >> dimension operators support live queries of the aggregated data.
> > > > Querying
> > > > >> of live data streams is a popular feature in other open source
> > > > platforms,
> > > > >> and I believe it is a worthwhile addition to Malhar.
> > > > >>
> > > > >> Given the fact that the dimension framework has been used in many
> > POCs
> > > > and
> > > > >> is even running in production and has novel features like live
> > > > querying, it
> > > > >> more than meets the bar for a malhar contribution. If a concrete
> > > > argument
> > > > >> cannot be provided to prevent this work from going into Malhar,
> then
> > > > these
> > > > >> efforts should not be blocked.
> > > > >>
> > > > >> Thanks,
> > > > >> Tim
> > > > >>
> > > > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <
> [email protected]>
> > > > wrote:
> > > > >>> I see no reason to move the dimension operator along with
> > everything
> > > it
> > > > >>> duplicates to Malhar. It's available to use for everyone as it is
> > and
> > > > >> there
> > > > > >>> should be an initiative to make it conform to the underlying
> > > framework
> > > > to
> > > > >>> be part of Malhar.
> > > > >>>
> > > > >>> Also there is already an enrichment operator, there is even
> > > > documentation
> > > > >>> for it.
> > > > >>>
> > > > >>> Hence, this needs to be analyzed properly.
> > > > >>>
> > > > >>> Thomas
> > > > >>>
> > > > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <
> > > > [email protected]>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Yes, I do plan to come up with a proposal with a list. The ones
> > that
> > > > >> come
> > > > >>>> to mind are flume, enrichment, various dimensional operators and
> > any
> > > > >> custom
> > > > >>>> partitioners. The dimensional operators are in a mature state
> and
> > > > >> usable
> > > > >>>> today, in future they could also be ported onto the new
> windowing
> > > and
> > > > >>>> managed state operator framework.
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>>
> > > > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <
> > > [email protected]>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> A cursory look suggests there is a lot of overlap. I'm looking
> > > > >> forward to
> > > > >>>>> see a proposal that reflects a vision how to evolve Malhar
> rather
> > > > >> than
> > > > >>>> just
> > > > >>>>> moving around code.
> > > > >>>>>
> > > > >>>>> Thomas
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <
> > > > >> [email protected]>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hi,
> > > > >>>>>>
> > > > >>>>>> DataTorrent, the initial contributor to Apex and the company I
> > > work
> > > > >>>> for,
> > > > >>>>>> has opened up a library of operators called Megh recently to
> the
> > > > >> public
> > > > >>>>> and
> > > > >>>>>> has made the repository available under the Apache License.
> The
> > > > >> link to
> > > > >>>>> the
> > > > >>>>>> repository is below. These operators, for the most part,
> contain
> > > > >>>>>> functionality that is complementary to what Malhar library
> > > > >> provides and
> > > > >>>>>> were developed to solve business use cases that arose over
> time.
> > > > >> Also,
> > > > >>>>> some
> > > > >>>>>> operators in Malhar were inspired from early implementations
> in
> > > the
> > > > >>>> Megh
> > > > >>>>>> library and were built upon knowledge gained in doing the
> > original
> > > > >>>>>> implementations.
> > > > >>>>>>
> > > > >>>>>> Our goal is to not have Megh as a separate library but rather
> > > bring
> > > > >>>> these
> > > > >>>>>> operators into Malhar in a fashion that it is consistent with
> > the
> > > > >>>> Malhar
> > > > >>>>>> project and repository. In the upcoming days, in a gradual
> > > > >> fashion, we
> > > > >>>>> will
> > > > >>>>>> have more details on the individual operators that we would
> like
> > > to
> > > > >>>>>> contribute. Also, if you are interested in helping with this
> > > effort
> > > > >>>>> please
> > > > >>>>>> raise your hand.
> > > > >>>>>>
> > > > >>>>>> https://github.com/DataTorrent/Megh/
> > > > >>>>>>
> > > > >>>>>> Thanks
> > > > >>
> > > >
> > >
> >
>
