Hi all,

Thanks for the feedback.

1. Indeed, the codebase is still in a private repository. We intend to
have it ready to share publicly later this March.
2. The project is built in Python and Java. We chose these languages because
we have deep integrations with open source projects written in them.
We also took into account that Rainbow is used by both data scientists and
data engineers, and we believe a combination of Python/Java will promote
collaboration and contribution.
3. The Rainbow project intends to facilitate and simplify the composition of
complex pipelines, which are based on other open source projects.
As such, it does not compete or overlap with these projects but rather
complements them.
4. Re: DLAB project - as we see it, DLAB focuses on the research
phase, while Rainbow's focus is on the production phase.
It seems the two projects complement each other, and it would be very
interesting for us to collaborate with the DLAB team.
5. We will adjust the proposal to provide more details on how other Apache
projects are used in Rainbow.
Currently we mainly use Apache Airflow to run pipelines that users define
through our APIs (YAML, with plans for UI/REST). This reduces the
engineering requirements for transitioning data science code into
production. We also leverage Apache Spark and Apache Hive for data
preparation features, and there are plans to integrate with Apache Karaf as
well.

Thanks,
Aviem

On Sat, Feb 22, 2020 at 4:29 AM Paul King <pa...@asert.com.au> wrote:

> Indeed, it does sound interesting.
>
> I would find it useful if the "existing Apache projects" bit of "Rainbow is
> in development, leveraging existing Apache projects." could be expanded in
> any way. I know there is a list of external dependencies later but any
> further description of how those technologies are used would be helpful.
>
> Also, I'd be interested in knowing how the proposal relates to DLAB:
> https://dlab.apache.org/
>
> Nice work.
>
> Cheers, Paul.
>
>
>
> On Sat, Feb 22, 2020 at 2:34 AM larry mccay <lmc...@apache.org> wrote:
>
> > This seems like an interesting proposal.
> >
> > Couple points/questions:
> >
> > * The existing source is not available for viewing as it is still in
> > private repos?
> > * Is it a primarily java project?
> > * It seems the intent of Rainbow is to not compete or overlap with the
> > Hadoop ecosystem projects but rather to provide an efficient interface
> > above them - correct?
> >
> >
> > On Fri, Feb 21, 2020 at 8:51 AM Aviem Zur <aviem...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We would like to propose Rainbow as an Apache incubator project. Rainbow is
> > > an end-to-end platform for data engineers & scientists, allowing them to
> > > build, train and deploy machine learning models in a robust and agile way.
> > > The project's goal is to operationalize the machine learning process,
> > > allowing data scientists to quickly transition from a successful experiment
> > > to an automated pipeline in production.
> > >
> > > The proposal can be found here:
> > > https://cwiki.apache.org/confluence/display/INCUBATOR/Apache+Rainbow
> > >
> > > We would appreciate your feedback and thoughts on the proposal.
> > >
> > > Thanks,
> > > Aviem
> > >
> >
>
