Presentation of the architectural proposal

2020-08-15 Thread Lucas Cardoso Silva
Hi guys,

To continue our evaluation of the architecture, we need to present the
proposal to everyone involved. I will take the opportunity to discuss some
points about the implementation of the daemon and the new CLI.

Each engine has a development Dockerfile in which all operating-system-level
dependencies are declared. When the CLI is invoked for an engine, the image
is built and the container starts with the daemon as its main process. All
modifications to the engine are saved directly to the files (via bind
mount). All commands, except the creational and versioning commands, run on
the daemon (called by the CLI). Do you agree with this approach? It is a
little different from what the image shows, which uses a volume.
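
As a rough illustration of the flow described above (a sketch only, not the
actual Marvin implementation), the CLI side could look like this in Python.
The command names, the `marvin-daemon` entry point and the `/opt/engine`
target path are assumptions made up for the example; the `docker build` and
`docker run --mount type=bind` options are standard Docker CLI flags.

```python
from pathlib import Path
from typing import List

# Hypothetical command names. The proposal only says that creational and
# versioning commands stay in the CLI and everything else runs on the daemon.
LOCAL_COMMANDS = {"engine-generate", "engine-bumpversion"}


def build_image_cmd(engine_dir: Path, tag: str) -> List[str]:
    """Build the image from the engine's development Dockerfile."""
    engine_dir = Path(engine_dir).absolute()
    return ["docker", "build", "-t", tag,
            "-f", str(engine_dir / "Dockerfile"), str(engine_dir)]


def run_daemon_cmd(engine_dir: Path, tag: str, name: str) -> List[str]:
    """Start the container with the engine directory bind-mounted, so edits
    made inside the container land directly in the engine's files."""
    engine_dir = Path(engine_dir).absolute()
    mount = f"type=bind,source={engine_dir},target=/opt/engine"
    # "marvin-daemon" is an assumed name for the daemon's entry point.
    return ["docker", "run", "-d", "--name", name,
            "--mount", mount, tag, "marvin-daemon"]


def runs_on_daemon(command: str) -> bool:
    """Everything except creational/versioning commands goes to the daemon."""
    return command not in LOCAL_COMMANDS


if __name__ == "__main__":
    engine = Path("/work/iris-engine")
    print(" ".join(build_image_cmd(engine, "iris-engine-dev")))
    print(" ".join(run_daemon_cmd(engine, "iris-engine-dev", "iris-daemon")))
```

With a named volume instead, as in the image, the `--mount` argument would
use `type=volume` and the files would live in Docker-managed storage rather
than in the engine's directory.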

Architectural proposal image:
https://github.com/lucasbm88/incubator-apache-marvin/wiki/Development-Resources#v005---refactoring-architecture

Image of the current architecture:
https://docs.google.com/drawings/d/1UOR8Bk0fpLAOnotdeAn4Ww1GBbRxhdtM5mfoTJKQNVo/edit?usp=sharing

Thanks a lot,
Lucas


Re: Marvin’s mission discussion

2020-08-15 Thread Lucas Cardoso Silva
Good! I agree.

The Apache Marvin-AI platform aims to offer a practical and standardized
solution to help its users to perform data exploration, model development
and application lifecycle management for artificial intelligence tasks,
aiming to offer: scalability, language agnosticism and a standardized
pipeline.

Something like this?


On Sat, Aug 15, 2020 at 4:03 PM Daniel Takabayashi <
daniel.takabaya...@gmail.com> wrote:

> +1
>
> On Sat, Aug 15, 2020 at 8:57 AM Lucas Bonatto Miguel <
> lucasb...@apache.org> wrote:
>
> > It's good, the only thing I would change would be to mention what sort of
> > applications. Although we have AI in the name, one may mistakenly think
> > Marvin is intended to serve any type of application.
> >
> > Best

Re: Marvin’s mission discussion

2020-08-15 Thread Daniel Takabayashi
+1

On Sat, Aug 15, 2020 at 8:57 AM Lucas Bonatto Miguel <
lucasb...@apache.org> wrote:

> It's good, the only thing I would change would be to mention what sort of
> applications. Although we have AI in the name, one may mistakenly think
> Marvin is intended to serve any type of application.
>
> Best

Re: Marvin’s mission discussion

2020-08-15 Thread Lucas Bonatto Miguel
It's good, the only thing I would change would be to mention what sort of
applications. Although we have AI in the name, one may mistakenly think
Marvin is intended to serve any type of application.

Best

On Fri, Aug 14, 2020 at 11:37 AM Lucas Cardoso Silva <
cardosolucas61@gmail.com> wrote:

> Hi guys,
>
> Here comes the summarized Marvin mission:
>
> The Apache Marvin-AI platform aims to offer a practical and standardized
> solution to help its users to perform data exploration, model development
> and application lifecycle management, aiming to offer: scalability,
> language agnosticism and a standardized pipeline.
>
> Thanks for the help,
> Lucas Cardoso
>
> On Wed, Jul 29, 2020 at 5:05 PM Lucas Cardoso Silva <
> cardosolucas61@gmail.com> wrote:
>
> > Hi guys!
> > Great Lucas, I will wait a couple of days to see if anyone has other
> > things to add, and then we can close this phase!
> >
> > Wei, we can discuss how to make the data pipelines easier for users in
> > another step of the evaluation. Drawing on the experience that users and
> > developers have with this topic, we can track their needs better and
> > build use-case scenarios. I agree with you that data preparation is messy
> > and can take a lot of time, and it would be great if Marvin could help
> > with that.
> >
> > Best regards,
> > Lucas
> >
> >
> > On Wed, Jul 29, 2020 at 11:59 AM Wei Chen wrote:
> >
> >> Hello Lucas,
> >>
> >> I am thinking of processing JSON or XML files with a dynamic,
> >> hierarchical structure,
> >> or building a pipeline to crop images using object detection metadata.
> >> Data preparation can be very messy.
> >> I wonder if we can have a stage that handles both batch and streaming
> >> processing well.
> >>
> >> I simply think we don't need to focus on this part since we can utilize
> >> a wide variety of tools for our specific needs.
> >>
> >> Best Regards,
> >> Wei
> >>
> >>
> >>
> >> On Wed, Jul 29, 2020 at 8:48 PM Lucas Bonatto Miguel <
> >> lucasb...@apache.org>
> >> wrote:
> >>
> >> > Hi folks,
> >> >
> >> > With regard to the mission, you're correct. If I could summarize it, it
> >> > would be: *to help its users to perform data exploration, model
> >> > development and application lifecycle management*.
> >> >
> >> > I'm all in for better integration with Kubernetes. I think that
> >> > the first step is to create a new thread in order to design something
> >> > following their operator pattern:
> >> > https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
> >> >
> >> > Wei, one can already perform merges and joins in the
> >> > transformation step. Could you comment a bit more on what you think we
> >> > could improve there? Maybe something for a new thread as well?
> >> >
> >> > Best!
> >> > Lucas
> >> >
> >> > On Wed, Jul 29, 2020 at 1:24 AM Wei Chen wrote:
> >> >
> >> > > I think deploying to K8S does expand our capabilities for inference
> >> > > scaling and management.
> >> > > I am not familiar with Luigi, but it makes sense since we are going
> >> > > to set up data pipelines.
> >> > >
> >> > > Best Regards,
> >> > > Wei
> >> > >
> >> > > On Wed, Jul 29, 2020 at 5:32 AM Lucas Cardoso Silva <
> >> > > cardosolucas61@gmail.com> wrote:
> >> > >
> >> > > > Great, Wei! I find the suggestions really interesting. I think we
> >> > > > can work on the deployment on K8s. The idea in Marvin would be
> >> > > > that, after development, the user gives some parameters and a
> >> > > > script facilitates a deployment to a Kubernetes cluster, right?
> >> > > > Regarding data acquisition, I think it would be great if we were
> >> > > > able to integrate some third-party library like Luigi. Thanks!
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Jul 22, 2020 at 2:27 PM Wei Chen <weic...@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > > > Hello Lucas,
> >> > > > >
> >> > > > > I have some ideas:
> >> > > > >
> >> > > > > 1. Should we consider using K8S or similar tools for inference
> >> > > > > container scaling and management?
> >> > > > > Marvin's current container management is not as powerful as
> >> > > > > that of some container-focused projects.
> >> > > > > K8S can also be deployed into most environments now.
> >> > > > >
> >> > > > > 2. Is our current data cleaning stage flexible enough for
> >> > > > > multiple data sources with table joins?
> >> > > > > Or should we cut the data preparation stage out so that users
> >> > > > > can build their own data pipeline on their data storage?
> >> > > > > I figured that preprocessing might be too complex to be
> >> > > > > generalized for different ML projects.
> >> > > > >
> >> > > > > Best Regards
> >> > > > > Wei
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Jul 23, 2020 at 12:26 AM Lucas Cardoso Silva <
> >> > > > > cardosolucas61@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi guys.
> >> >