RE: Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread nathan ma
Hi JB,

As a co-creator of this project, I’d love to explain more about the
positioning of a lakehouse management system.

When discussing databases or traditional data warehouses, we often use the
term DBMS (Database Management System) to describe them. Traditional
databases, including MPP databases, are typically considered
“out-of-the-box” solutions. Unlike big data systems, they don’t require
assembling separate components such as compute engines, data lake formats,
or metadata stores. When we need a database management tool, lightweight
options like Navicat are commonly used.

If we further abstract the capabilities of a DBMS and map them to the
modern data stack, we find that the data read/write part of a DBMS is now
shared among different compute engines such as Spark, Flink, Trino, and
cloud-native services like Athena. Another part of a DBMS deals with the
maintenance of data files, index files, and metadata (also known as the
information schema). Currently, there are successful open-source and
commercial projects dedicated to managing metadata, such as Hive Metastore,
Unity Catalog, and more recently, Gravitino. In practice, developers often
combine these projects with compute engines to optimize data files. For
example, many commercial compute engines include an optimize command.

Amoro, as a lakehouse management system, aims to encapsulate the
maintenance and management of data lake files, index files, and metadata in
a way that is transparent and easy to use. The diversity of compute engines
is a distinctive feature of the modern data stack, opening up many
possibilities for various application scenarios. For the part analogous to
a DBMS, we aspire to have a mature system in place, one that seamlessly
accommodates data written to the lakehouse by any engine, in any manner,
and keeps that data highly available to all other engines. For instance,
when Flink writes to Iceberg, Amoro’s self-optimizing mechanism ensures
efficient analysis performance in Trino or other engines while controlling
compaction costs. Amoro also handles historical data, snapshots, and orphan
file cleanup in the background.
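
To make the background cleanup mentioned above more concrete, here is a
minimal Python sketch of how orphan files can be detected in general: a file
counts as an orphan when no snapshot references it and it is older than a
grace period. This is only an illustration, not Amoro’s actual
implementation; the names `DataFile` and `find_orphans` and the three-day
grace period are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record for a file found by listing table storage."""
    path: str
    modified_at: datetime


def find_orphans(listed, referenced_paths, now, grace=timedelta(days=3)):
    """Return sorted paths of listed files that no snapshot references.

    Files modified within the grace period are skipped, since they may
    belong to a write that has not been committed yet.
    """
    cutoff = now - grace
    return sorted(
        f.path
        for f in listed
        if f.path not in referenced_paths and f.modified_at < cutoff
    )


now = datetime(2024, 2, 23)
listed = [
    DataFile("data/a.parquet", datetime(2024, 2, 1)),   # referenced
    DataFile("data/b.parquet", datetime(2024, 2, 1)),   # old, unreferenced
    DataFile("data/c.parquet", datetime(2024, 2, 22)),  # recent, unreferenced
]
print(find_orphans(listed, {"data/a.parquet"}, now))  # ['data/b.parquet']
```

The grace period is the important design choice here: without it, a file
written by an engine that has not yet committed its snapshot could be
deleted by mistake.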

By positioning Amoro in this way, we aim to provide an “out-of-the-box”
experience that feels as straightforward as a traditional DBMS while
staying open to various compute engines. At the same time, Amoro hopes to
empower data product builders with a lightweight solution that integrates
seamlessly into their modern data workflows.



Thanks.
Jin Ma


On 2024/02/23 14:16:43 Jean-Baptiste Onofré wrote:
> Hi Justin
>
> Even though it looks interesting, I'm not sure I understand exactly the
> purpose of the proposal.
>
> What does "lakehouse management system" mean exactly? Is it an abstraction
> layer on top of Iceberg and Paimon, plus a query engine powered by Flink,
> Spark, or Trino?
>
> Please let me know if you want an additional mentor; I would be happy to
> help.
>
> Thanks!
> Regards
> JB


Re: [DISCUSS] Graduate Apache SDAP (Incubating) as a Top Level Project

2024-02-23 Thread Riley Kuttruff
Thank you for finding those issues. I've updated the site (sdap.a.o shows the
changes, but sdap.i.a.o still hasn't updated at the time I'm writing this). I
believe I've fixed the issues you found. Please let me know if this is not the
case.

On 2024/02/23 15:59:07 sebb wrote:
> The Downloads page has some issues:
> https://sdap.apache.org/downloads
> 
> No link to KEYS file
> Links for older releases are broken
> Copyright page is 2023

Re: [DISCUSS] Graduate Apache SDAP (Incubating) as a Top Level Project

2024-02-23 Thread Riley Kuttruff
Thank you very much. We have actually found an issue with that release (a
GPL-licensed dependency) and have been working on preparing another release
candidate. It seems we forgot to cancel that vote.

On 2024/02/23 14:27:52 PJ Fanning wrote:
> +1 (binding)
> 
> I had a look at the mailing lists and the community seems in a pretty good 
> state.
> 
> As a matter of interest, are you still looking at completing the v1.2.0 
> release [1]? If so, I could have a look at the RC over the weekend.
> 
> [1] https://lists.apache.org/thread/vr4zf6zhg2yp41bjwvlpm1mp2nrycqcw

[NOTICE] Incubation Report for February 2024

2024-02-23 Thread tison
Hi,

I'm trying to create an Incubation Report page for February 2024 at
[1], including the following podlings, which should report this month
according to the report group info:

* answer
* fury
* horaedb
* streampark

But it still lacks some information, and I need help with the following:

1. What are the desired timeline and shepherd assignments?
2. The IPMC-level report is missing. Perhaps the only way is to go through
the mailing list?
3. Other projects that need to report are missing. I guess we have a script
to list them from the podlings.xml file, but I can't find it. I may write
one when I have some spare time if nobody else does. I don't promise :P
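
For item 3, a script along these lines might be enough as a starting point.
It is a rough sketch under stated assumptions, not the existing Incubator
tooling: it assumes each podling element carries `resource` and `status`
attributes and a `reporting` child with a `group` attribute and an optional
`monthly="true"` flag, and that groups 1, 2, and 3 report in
Jan/Apr/Jul/Oct, Feb/May/Aug/Nov, and Mar/Jun/Sep/Dec respectively. The
sample data and the function name `podlings_due` are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout; the real file may differ.
SAMPLE = """
<podlings>
  <podling name="Alpha" resource="alpha" status="current">
    <reporting group="2"/>
  </podling>
  <podling name="Beta" resource="beta" status="current">
    <reporting group="1" monthly="true"/>
  </podling>
  <podling name="Gamma" resource="gamma" status="current">
    <reporting group="3"/>
  </podling>
  <podling name="Delta" resource="delta" status="graduated">
    <reporting group="2"/>
  </podling>
</podlings>
"""


def podlings_due(xml_text, month):
    """List resource names of current podlings due to report in month (1-12)."""
    # Assumed schedule: group 1 -> Jan/Apr/Jul/Oct, group 2 -> Feb/May/Aug/Nov,
    # group 3 -> Mar/Jun/Sep/Dec.
    group_for_month = str((month - 1) % 3 + 1)
    due = []
    for podling in ET.fromstring(xml_text).iter("podling"):
        if podling.get("status") != "current":
            continue  # skip graduated or retired podlings
        reporting = podling.find("reporting")
        if reporting is None:
            continue
        # Simplification: a monthly flag means the podling reports every month.
        monthly = reporting.get("monthly") == "true"
        if monthly or reporting.get("group") == group_for_month:
            due.append(podling.get("resource"))
    return sorted(due)


print(podlings_due(SAMPLE, 2))  # ['alpha', 'beta']
```

Pointing the function at the contents of the real file instead of `SAMPLE`
would give the list for a given month, provided the assumptions about the
layout hold.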

Best,
tison.

[1] https://cwiki.apache.org/confluence/display/INCUBATOR/February2024

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Graduate Apache SDAP (Incubating) as a Top Level Project

2024-02-23 Thread sebb
The Downloads page has some issues:
https://sdap.apache.org/downloads

No link to KEYS file
Links for older releases are broken
Copyright page is 2023

On Fri, 23 Feb 2024 at 14:30, PJ Fanning  wrote:
>
> +1 (binding)
>
> I had a look at the mailing lists and the community seems in a pretty good 
> state.
>
> As a matter of interest, are you still looking at completing the v1.2.0 
> release [1]? If so, I could have a look at the RC over the weekend.
>
> [1] https://lists.apache.org/thread/vr4zf6zhg2yp41bjwvlpm1mp2nrycqcw

Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread 周劲松
Hi JB,

Yes, you can say it is an abstraction layer on top of data lake table
formats and query engines, and we often call it the service layer in the
Lakehouse architecture. The service layer primarily provides unified
metadata and access control, as well as common audit services, and so on.
Of course, Amoro is currently focusing on automatic optimization, helping
users use the data lake more easily and achieve the desired analytical
performance on it. Amoro can work with other software in the service layer
and can also be extended with plugins to integrate more capabilities.

On Fri, Feb 23, 2024 at 10:18 PM Jean-Baptiste Onofré 
wrote:

> Hi Justin
>
> Even though it looks interesting, I'm not sure I understand exactly the
> purpose of the proposal.
>
> What does "lakehouse management system" mean exactly? Is it an abstraction
> layer on top of Iceberg and Paimon, plus a query engine powered by Flink,
> Spark, or Trino?
>
> Please let me know if you want an additional mentor; I would be happy to
> help.
>
> Thanks!
> Regards
> JB


Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread 周劲松
Hi Ayush,

I am Jinsong from the Amoro community.
Thank you very much for your attention and feedback on Amoro. Amoro aims to
support multiple versions of Hadoop and Hive clusters as much as possible,
allowing users to specify versions at build time, but just as you said,
our default version should remain the latest. I have created an issue [1] to
track this problem and will work to resolve it as soon as possible.

[1] https://github.com/NetEase/amoro/issues/2564

On Fri, Feb 23, 2024 at 17:13 PM Ayush Saxena wrote:

>
> +1,
> I remember exploring this while looking for a way to compact Iceberg
> tables for a Hive use case. I got some good pointers for cleaning up orphan
> files. I think it was using a pretty old version of Hive (3.1.1, I
> believe), so I couldn't pull it in as a dependency in the Hive master
> branch itself, which was my initial plan.
>
> But overall, it was some good code.
>
> Good Luck!!!
>
> -Ayush


Re: [DISCUSS] Graduate Apache SDAP (Incubating) as a Top Level Project

2024-02-23 Thread PJ Fanning
+1 (binding)

I had a look at the mailing lists and the community seems in a pretty good 
state.

As a matter of interest, are you still looking at completing the v1.2.0 release 
[1]? If so, I could have a look at the RC over the weekend.

[1] https://lists.apache.org/thread/vr4zf6zhg2yp41bjwvlpm1mp2nrycqcw

On 2024/02/22 18:01:31 Riley Kuttruff wrote:
> Hi all,
> 
> Apache SDAP joined Incubator in October 2017. In the time since, we've 
> made significant progress towards maturing our community and our 
> project and adopting the Apache Way.
> 
> After community discussion [1][2][3], the community has voted [4] that we 
> would like to proceed with graduation [5]. We now call upon the Incubator 
> PMC to review and discuss our progress and would appreciate any and all 
> feedback towards graduation.
> 
> Below are some facts and project highlights from the incubation phase as 
> well as the draft resolution:
> 
> - Our community consists of 21 committers, with 2 being mentors and 
> the remaining 19 serving as our PPMC
> - Several pending and planned invites to bring on new committers and/or
> PPMC members from additional organizations
> - Completed 2 releases with 2 release managers - with a 3rd release run by
> a 3rd release manager in progress
> - Our software is currently being utilized by organizations such as NASA 
> Jet Propulsion Laboratory, NSF National Center for Atmospheric Research, 
> Florida State University, and George Mason University in support of projects 
> such as the NASA Sea Level Change Portal, Estimating the Circulation and 
> Climate of the Ocean (ECCO) project, GRACE/GRACE-FO, Cloud-based 
> Data Match-Up Service, Integrated Digital Earth Analysis System (IDEAS), 
> and many others.  
> - Opened 400+ PRs across 3 main code repositories, 350+ of which are
> merged or closed (some are pending our next release)
> - Maturity model self assessment [6]
> 
> We have resolved all branding issues we are aware of: logo, GitHub,
> website, etc.
>
> We’d like to also extend a sincere thank you to our mentors, current and
> former, for their invaluable insight and assistance with getting us to this
> point.
> 
> Thank you, Julian, Jörn, Trevor, Lewis, Suneel, and Raphael!
> 
> ---
> 
> Establish the Apache SDAP Project
> 
> WHEREAS, the Board of Directors deems it to be in the best interests of
> the Foundation and consistent with the Foundation's purpose to establish
> a Project Management Committee charged with the creation and maintenance
> of open-source software, for distribution at no charge to the public,
> related to an integrated data analytic center for Big Science problems.
> 
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> (PMC), to be known as the "Apache SDAP Project", be and hereby is
> established pursuant to Bylaws of the Foundation; and be it further
> 
> RESOLVED, that the Apache SDAP Project be and hereby is responsible
> for the creation and maintenance of software related to an integrated data 
> analytic center for Big Science problems; and be it further
> 
> RESOLVED, that the office of "Vice President, Apache SDAP" be and
> hereby is created, the person holding such office to serve at the
> direction of the Board of Directors as the chair of the Apache SDAP
> Project, and to have primary responsibility for management of the
> projects within the scope of responsibility of the Apache SDAP
> Project; and be it further
> 
> RESOLVED, that the persons listed immediately below be and hereby are
> appointed to serve as the initial members of the Apache SDAP Project:
> 
> - Edward M Armstrong 
> - Nga Thien Chung 
> - Thomas Cram 
> - Frank Greguska 
> - Thomas Huang 
> - Julian Hyde 
> - Joseph C. Jacob 
> - Jason Kang 
> - Riley Kuttruff 
> - Thomas G Loubrieu 
> - Kevin Marlis 
> - Stepheny Perez 
> - Wai Linn Phyo 
> 
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Nga Thien Chung 
> be appointed to the office of Vice President, Apache SDAP, to serve in 
> accordance with and subject to the direction of the Board of Directors 
> and the Bylaws of the Foundation until death, resignation, retirement, 
> removal or disqualification, or until a successor is appointed; and be it 
> further
> 
> RESOLVED, that the Apache SDAP Project be and hereby is tasked with
> the migration and rationalization of the Apache Incubator SDAP
> podling; and be it further
> 
> RESOLVED, that all responsibilities pertaining to the Apache Incubator
> SDAP podling encumbered upon the Apache Incubator PMC are hereafter
> discharged.
> 
> [1] https://lists.apache.org/thread/vjwjmp0h2f22dv423h262cvdg5x7jl03
> [2] https://lists.apache.org/thread/m9vqwv23jdsofwgmhgxg25f5l1v2j7nz
> [3] https://lists.apache.org/thread/4o1qjsk2cly2ppxcsmm2swzd6pcg3lxj
> [4] https://lists.apache.org/thread/qtxlxl4gj6n33wvm164vdxxnwdlppttl
> [5] https://lists.apache.org/thread/rr74c35fojc7ythmcgnoplyjllhbslj4
> [6] https://github.com/apac

Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread Jean-Baptiste Onofré
Hi Justin

Even though it looks interesting, I'm not sure I understand exactly the
purpose of the proposal.

What does "lakehouse management system" mean exactly? Is it an abstraction
layer on top of Iceberg and Paimon, plus a query engine powered by Flink,
Spark, or Trino?

Please let me know if you want an additional mentor; I would be happy to help.

Thanks!
Regards
JB

On Fri, Feb 23, 2024 at 9:44 AM Justin Mclean  wrote:
>
> Hi,
>
> I would like to propose a new project to the ASF incubator - Apache Amoro. 
> I’m one of the mentors, but there are a lot of other people involved who have 
> done all of the hard work.
>
> Amoro is a Lakehouse management system built on open data lake formats like
> Apache Iceberg and Apache Paimon (Incubating). Working with compute engines
> including Apache Flink, Apache Spark, and Trino, Amoro brings pluggable and
> self-managed features for the Lakehouse to provide an out-of-the-box data
> warehouse experience, and helps data platforms or products easily build
> infra-decoupled, stream-and-batch-fused, and lake-native architectures. You
> can find the proposal here. [1]
>
> We are looking forward to anyone's feedback or questions.
>
> Thanks,
> Justin
>
> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/AmoroProposal



Re: [DISCUSS] Apache Amoro proposal

2024-02-23 Thread Ayush Saxena
+1,
I remember exploring this while looking for a way to compact Iceberg
tables for a Hive use case. I got some good pointers for cleaning up orphan
files. I think it was using a pretty old version of Hive (3.1.1, I
believe), so I couldn't pull it in as a dependency in the Hive master
branch itself, which was my initial plan.

But overall, it was some good code.

Good Luck!!!

-Ayush

On Fri, 23 Feb 2024 at 14:15, Justin Mclean 
wrote:

> Hi,
>
> I would like to propose a new project to the ASF incubator - Apache Amoro.
> I’m one of the mentors, but there are a lot of other people involved who
> have done all of the hard work.
>
> Amoro is a Lakehouse management system built on open data lake formats
> like Apache Iceberg and Apache Paimon (Incubating). Working with compute
> engines including Apache Flink, Apache Spark, and Trino, Amoro brings
> pluggable and self-managed features for the Lakehouse to provide an
> out-of-the-box data warehouse experience, and helps data platforms or
> products easily build infra-decoupled, stream-and-batch-fused, and
> lake-native architectures. You can find the proposal here. [1]
>
> We are looking forward to anyone's feedback or questions.
>
> Thanks,
> Justin
>
> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/AmoroProposal


[DISCUSS] Apache Amoro proposal

2024-02-23 Thread Justin Mclean
Hi,

I would like to propose a new project to the ASF incubator - Apache Amoro. I’m 
one of the mentors, but there are a lot of other people involved who have done 
all of the hard work.

Amoro is a Lakehouse management system built on open data lake formats like
Apache Iceberg and Apache Paimon (Incubating). Working with compute engines
including Apache Flink, Apache Spark, and Trino, Amoro brings pluggable and
self-managed features for the Lakehouse to provide an out-of-the-box data
warehouse experience, and helps data platforms or products easily build
infra-decoupled, stream-and-batch-fused, and lake-native architectures. You
can find the proposal here. [1]

We are looking forward to anyone's feedback or questions.

Thanks,
Justin

[1] https://cwiki.apache.org/confluence/display/INCUBATOR/AmoroProposal