Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread
Hi JB,

Yes, you can think of it as an abstraction layer on top of data lake table
formats and query engines; we often call it the service layer in a
Lakehouse architecture. The service layer primarily provides unified
metadata, access control, and common services such as auditing.
Amoro currently focuses on automatic optimization, helping users work
with the data lake more easily and get the analytical performance they
expect from it. Amoro can work alongside other software in the service
layer and can be extended through plugins to integrate more capabilities.

On Fri, Feb 23, 2024 at 10:18 PM Jean-Baptiste Onofré wrote:

> Hi Justin
>
> Even though it looks interesting, I'm not sure I understand exactly the
> purpose of the proposal.
>
> What does "lakehouse management system" mean exactly? Is it an
> abstraction layer on top of Iceberg and Paimon, plus a query engine
> powered by Flink, Spark, or Trino?
>
> Please let me know if you want an additional mentor, I would be happy to
> help.
>
> Thanks !
> Regards
> JB
>
> On Fri, Feb 23, 2024 at 9:44 AM Justin Mclean wrote:
> >
> > Hi,
> >
> > I would like to propose a new project to the ASF incubator - Apache
> > Amoro. I’m one of the mentors, but there are a lot of other people
> > involved who have done all of the hard work.
> >
> > Amoro is a Lakehouse management system built on open data lake formats
> > like Apache Iceberg and Apache Paimon (Incubating). Working with compute
> > engines including Apache Flink, Apache Spark, and Trino, Amoro brings
> > pluggable and self-managed features for Lakehouse to provide out-of-the-box
> > data warehouse experience, and helps data platforms or products easily
> > build infra-decoupled, stream-and-batch-fused and lake-native architecture.
> > You can find the proposal here. [1]
> >
> > We are looking forward to anyone's feedback or questions.
> >
> > Thanks,
> > Justin
> >
> > [1] https://cwiki.apache.org/confluence/display/INCUBATOR/AmoroProposal
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >


Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread
Hi Ayush,

I am Jinsong from the Amoro community.
Thank you very much for your attention and feedback on Amoro. Amoro aims to
support as many versions of Hadoop and Hive clusters as possible by
letting users specify versions at build time. But as you said, our default
version should remain the latest. I have created an issue [1] to track this
and will work on resolving it as soon as possible.

[1] https://github.com/NetEase/amoro/issues/2564

On Fri, Feb 23, 2024 at 5:13 PM Ayush Saxena wrote:

>
>   +1,
> I remember looking at this while exploring compaction for Iceberg
> tables for a Hive use case, and I got some good pointers for cleaning up
> orphan files. I think it was using a pretty old version of Hive (3.1.1, I
> believe), so I couldn't pull it in as a dependency in the Hive master
> branch itself, which was my initial plan.
>
> But overall, it was some good code.
>
> Good Luck!!!
>
> -Ayush