Re: [DISCUSSION] Incubating Proposal of Firestorm

2022-05-23 Thread Daniel B. Widdis
This conjures up a mental image of a unicorn with the Apache feather
tickling its nose... I like it.

On Mon, May 23, 2022 at 8:36 AM Jerry Shao wrote:

> Hi team,
>
> After discussing with the team, we came up with a new name, "Uniffle"
> (Unified/Universal Shuffle). We did a name search, and there seems to be no
> conflict with this new name.
>
> So we'd go with "Uniffle" as the new project name. What do you think?
>
> Best regards,
> Jerry
>
> -
> Uniffle Naming Search
> Github:
> Search for Uniffle returns 0 results
> https://github.com/search?q=uniffle
>
> SF.net:
> Search for Uniffle returns 0 results
>
> https://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=Uniffle
>
> openhub.net
> Search for Uniffle returns 0 results
> https://www.openhub.net/p?ref=homepage&query=Uniffle
>
> Google code:
> Search for Uniffle returns 0 results
> https://opensource.google/s/results?q=Uniffle
>
> USPTO:
> I used the `Word and/or Design Mark Search (Free Form)` option of TESS and
> entered the query '(uniffle)[BI,TI] and (software or computer)[GS] and (live)[LD]'.
> There are 0 results.
> I also searched
>
> https://search.uspto.gov/search?affiliate=web-sdmg-uspto.gov&sort_by=&query=uniffle
> There are 0 results, too.
>
>
> Trademarkia:
> Search for Uniffle returns 0 results
> https://www.trademarkia.com/trademarks-search.aspx?tn=uniffle
>
> EU Organization for Harmonization
> Search for Uniffle returns 0 results
> https://euipo.europa.eu/eSearch/#basic/1+1+1+1/uniffle
>
> Google:
> Search for Uniffle returns 0 results
> https://www.google.com/search?q=uniffle
>
> Bing:
> Search for Uniffle returns 0 results
> https://www.bing.com/search?q=uniffle
>
> Yahoo:
> Search for Uniffle returns 0 results
> https://search.yahoo.com/search?p=uniffle&guccounter=1
>
> Stackoverflow:
> Search for Uniffle returns 0 results
> https://stackoverflow.com/search?q=uniffle
>
> On Mon, May 23, 2022 at 8:50 PM Saisai Shao wrote:
>
> > Thanks Justin for the explanation.
> >
> > We discussed internally and think that a new name would be better to avoid
> > potential issues.
> >
> > We will figure out a new name.
> >
> > Best regards,
> > Jerry
> >
> > On Mon, May 23, 2022 at 8:29 PM Justin Mclean wrote:
> >
> >> Hi,
> >>
> >> > There’s a typo in this paragraph that makes it impossible to
> >> > understand/changes the original meaning.
> >>
> >> Apologies, I meant to say "I don’t think that is the case." From what I
> >> can see, trademarks have not approved FireStorm as a name. If the project
> >> wants to enter the incubator with that name, and understands the risks that
> >> involves, then that is OK. Just be aware there is a risk that this may stop
> >> the project from graduating from the Incubator under that name.
> >>
> >> Kind Regards,
> >> Justin
> >
> >
>


-- 
Dan Widdis


Re: [DISCUSSION] Incubating Proposal of Firestorm

2022-05-16 Thread Daniel B. Widdis
+0.9 (non-binding).

I think the project is a great idea.  I think the name is going to run into
a lot of issues with trademarks and pre-existing software products and
would recommend the project consider a new name before incubating.

On Mon, May 16, 2022 at 6:44 AM Jerry Shao wrote:

> Hi all,
>
> We would like to propose Firestorm[1] as a new Apache incubator project;
> you can find the proposal here [2] for more details.
>
> Firestorm is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> Apache Flink and so on. We are aiming to make Firestorm a
> universal shuffle service for distributed compute engines.
>
> Shuffle is the key mechanism a distributed compute engine uses to exchange
> data between distributed tasks, so the performance and stability of shuffle
> directly affect the whole job. The current "local file, pull-based shuffle"
> style has several limitations:
>
>1. It struggles to support very large workloads, especially in a high-load
>environment; the major problem is IO (random disk IO, network congestion
>and timeouts).
>2. It is hard to deploy in environments with disaggregated compute and
>storage, as disk capacity on compute nodes is quite limited.
>3. The constraint of storing shuffle data locally makes it hard to scale
>elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve the above problems.
>
> Firestorm, our implementation of a Remote Shuffle Service, is heavily used
> at Tencent and has shown its advantages in production. Other enterprises
> have also adopted, or are preparing to adopt, Firestorm in their
> environments.
>
> Firestorm’s key idea comes from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
> It has several key design goals:
>
>1. High performance. For small workloads, Firestorm’s performance is close
>to that of local-file-based shuffle. For large workloads, it is far better
>than the current shuffle style.
>2. Fault tolerance. Firestorm provides high availability for Coordinator
>nodes, and failover for Shuffle nodes.
>3. Pluggable. Firestorm is highly pluggable, so it can be adapted to
>different compute engines, backend storage systems, and wire protocols.
>
> We believe the Firestorm project will provide great value to the
> community if it is accepted into the Apache Incubator.
>
> I will help this project as champion, and many thanks to the three mentors:
>
>- Junping Du (junping...@apache.org)
>- Xun Liu (liu...@apache.org)
>- Zhankun Tang (zt...@apache.org)
>
>
> [1] https://github.com/Tencent/Firestorm
> [2]
> https://cwiki.apache.org/confluence/display/INCUBATOR/FirestormProposal
>
> Best regards,
> Jerry
>


-- 
Dan Widdis


Re: [VOTE] Accept SeaTunnel into the Apache Incubator

2021-12-03 Thread Daniel B. Widdis
+1 (non-binding)

On Fri, Dec 3, 2021 at 5:57 AM Willem Jiang wrote:

> Hi all,
>
> Following up on the [DISCUSS] thread on SeaTunnel [1], I would like to call a
> VOTE to accept SeaTunnel into the Apache Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring SeaTunnel into the Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring SeaTunnel into the Incubator, because...
>
> The vote will be open for at least 72 hours and only votes from the
> Incubator PMC are binding, but votes from everyone are welcome.
>
> Please check out the SeaTunnel Proposal from the incubator wiki[2].
>
> [1] https://lists.apache.org/thread/nvp0sxnl0b69wgylxsvmdshfd70om1gk
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/SeaTunnelProposal
>
> Regards,
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>

-- 
Dan Widdis


Re: IP clearance officer for accepting Terraform

2021-06-25 Thread Daniel B. Widdis
> I would expect any one making total commits of 100s, 1,000s or more lines
> of code major/significant.

That assumes it's their own code.  There are three contributors in the
1000s of lines range, and while these contributors have done a great
service to the project, the vast majority of those thousands of lines of
code do not represent their own intellectual property.

The top contributor appears to have added 352,925 lines of code, but most
of that is code transferred from other sources.

The 16 commits [1] include:
 - transfer of the CloudStack provider code from another source (4 commits:
1,118 lines, 12,193 lines, 124,297 lines and 188,280 lines)
 - transfer of website docs from another source and a few commits to fix
various links
 - a makefile from another source
 - .gitignore list (hard to argue it's copyrightable)
 - An issue template (creative, but not code)
 - Typo fixes

The vast majority of the 3rd top contributor's 79,555 lines of code come
from a single commit transferring 233,853 lines of code from another source
[2], much of which was later deleted as obsolete.

The 6th top contributor's 19,413 lines are mostly from a single 19,378-line
commit of code from another source [3].
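To make the proportions concrete, here is a small Java sketch that redoes the arithmetic using only the figures above. It treats the individual commit sizes and the GitHub contributor totals as directly comparable, which is an assumption (the contributor graph may count additions and deletions differently); the class and method names are illustrative only.

```java
// Rough check: what fraction of a contributor's GitHub line total came from
// bulk transfers of code originating elsewhere. Figures are those quoted in
// this message; treating them as comparable is an assumption.
public class TransferredShare {

    // Sum of lines added by the listed transfer commits.
    static long total(long[] commitLines) {
        long sum = 0;
        for (long lines : commitLines) {
            sum += lines;
        }
        return sum;
    }

    // Fraction of the contributor's total accounted for by transfers.
    static double share(long transferred, long contributorTotal) {
        return (double) transferred / contributorTotal;
    }

    public static void main(String[] args) {
        // Top contributor: the four CloudStack provider transfer commits.
        long top = total(new long[] {1_118, 12_193, 124_297, 188_280});
        System.out.printf("Top contributor: %,d of %,d lines (%.1f%%)%n",
                top, 352_925, 100 * share(top, 352_925));

        // 6th contributor: one 19,378-line transfer commit.
        System.out.printf("6th contributor: %,d of %,d lines (%.1f%%)%n",
                19_378, 19_413, 100 * share(19_378, 19_413));
    }
}
```

If those figures are comparable, the four transfer commits alone account for roughly 92% of the top contributor's total, and the single transfer commit for about 99.8% of the 6th contributor's.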

I do not mean to belittle the herculean effort by these individuals.  This
is a lot of work.  The project in its current form wouldn't exist without
the effort they put in.

I would be hard pressed to argue that they have 352,925, 79,555, and 19,413
lines of IP, respectively, for which they have legal standing to discuss
licensing, as opposed to the original license of the code they transferred. I
would wager that the individual with the most IP in the current iteration
of the project is someone other than these three.

> I’m not sure that due diligence has been done here, please provide
> documentation that shows otherwise.

Certainly this is a reasonable request, but due diligence should likely be
a review of every commit to the project, not just lines above some
arbitrary threshold, determining the license of code transferred from
elsewhere (and not the committer's opinion), eliminating non-copyrightable
contributions (whitespace, typo fixes, lists of files, URL changes), and
obtaining concurrence from anyone who has committed remaining code,
regardless of how few lines they committed.

1. Commits · xanzy/terraform-provider-cloudstack (github.com)

2. Merge pull request #64 from terraform-providers/svh/f-cs-4.12 ·
xanzy/terraform-provider-cloudstack@07febb7 (github.com)

3. vendor: github.com/hashicorp/terraform/...@v0.10.0 ·
xanzy/terraform-provider-cloudstack@29c9bb4

4. IP Clearance for Terraform Provider and Go SDK · Issue #5159 ·
apache/cloudstack (github.com)



On Wed, Jun 23, 2021 at 2:53 AM Justin Mclean wrote:

> Hi,
>
> > I've been following this thread and continue to see phrases such as
> > "major contributors" and "significant contributions”.
>
> That may be a bit nebulous depending on the exact contributions involved,
> but I would expect any one making total commits of 100s, 1,000s or more
> lines of code major/significant. You can see the stats for this repo here
> [1]
>
> Thanks,
> Justin
>
> 1.
> https://github.com/xanzy/terraform-provider-cloudstack/graphs/contributors
>
>
>
>

-- 
Dan Widdis


Re: JPMS Projects?

2021-03-17 Thread Daniel B. Widdis
Can you clarify what you mean by an "Apache Java project"?
 - A TLP?
 - An incubating project?
 - A project anywhere that is released under the Apache license?

In many cases there's actually no need to "migrate code"; you just add some
files. Is there a particular use case you are interested in?
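For instance, a library whose only dependencies are JDK modules can often become a named module by adding a single `module-info.java` descriptor. A minimal sketch, where the module name `com.example.mylib`, the exported package `com.example.mylib.api`, and the `requires java.logging` line are hypothetical placeholders:

```java
// module-info.java, placed at the root of the source tree (e.g. src/main/java).
// All names below are hypothetical placeholders.
module com.example.mylib {
    // Modules this library depends on; java.base is implied and need not be listed.
    requires java.logging;

    // Only exported packages are accessible to code in other modules.
    exports com.example.mylib.api;
}
```

An even lighter option is an `Automatic-Module-Name` entry in the jar's `META-INF/MANIFEST.MF`, which gives the jar a stable module name without a descriptor. Projects that must still build for Java 8 commonly compile `module-info.java` in a separate step targeting Java 9 or later, since the Java 8 compiler cannot process it.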

On Wed, Mar 17, 2021 at 3:11 PM leerho wrote:

> Folks,
> Is anyone aware of an Apache Java project that has actually migrated their
> code from Java 8 to the Java Platform Module System (JPMS)?
>
> Thanks,
> Lee.
>


-- 
Dan Widdis


Re: [VOTE] Release Apache TubeMQ (Incubating) 0.8.0-incubating RC2

2021-02-18 Thread Daniel B. Widdis
Craig,

As Justin pointed out, while the "Java Edition" includes the AL 2.0
license, there are additional restrictions on the website [1]:

"Our open source license permits you to use Berkeley DB, Berkeley DB
Java Edition or Berkeley DB XML at no charge under the condition that if
you use the software in an application you redistribute, the complete
source code for your application must be available and freely
redistributable under reasonable conditions. "

[1] - Oracle Berkeley DB Licensing Information
<https://www.oracle.com/database/technologies/related/berkeleydb/berkeleydb-licensing.html>


On Thu, Feb 18, 2021 at 4:00 PM Craig Russell wrote:

> Which Oracle product is TubeMQ dependent on?
>
> From what I see, there are two distinct Oracle products, with different
> licenses:
>
> Berkeley DB uses a Category X non-commercial license, which Justin points
> out is not an open source license.
>
> Berkeley DB Java Edition uses a standard Apache License, v2.0 with an
> additional license that appears to be the 3-clause BSD license for an
> embedded third-party component; both of which are Category A licenses.
>
> So if the project uses only Berkeley DB Java Edition, all is well. If it
> uses Berkeley DB then we have a problem.
>
> Regards,
> Craig
>
> > On Feb 18, 2021, at 1:46 PM, Daniel B. Widdis wrote:
> >
> > Thank you for the clarification, Justin!  I missed that non-commercial
> > limitation.
> >
> > I agree that it would be better to replace that dependency,
> >
> > On Thu, Feb 18, 2021 at 1:41 PM Justin Mclean wrote:
> >
> >> Hi,
> >>
> >> My concern is that while it claims it is under the Apache license, that
> >> is only for non-commercial use. Having a restriction like that means it’s
> >> not really under the Apache license.
> >>
> >> It states on that page [1]
> >> "Our open source license is OSI-certified and permits use of Berkeley DB
> >> in open source projects or in applications that are not distributed to
> >> third parties.”
> >> "Our commercial license permits closed-source distribution of an
> >> application to third parties and provides business assurance"
> >>
> >> ASF releases need to be distributed to third parties, ASF releases can’t
> >> have non-commercial restrictions.
> >>
> >> IMO it would be best if the project replaced this dependency with
> >> something else.
> >>
> >> Thanks,
> >> Justin
> >>
> >> 1.
> >> https://www.oracle.com/database/technologies/related/berkeleydb-downloads.html
> >>
> >>
> >
> > --
> > Dan Widdis
>
> Craig L Russell
> c...@apache.org
>
>
>
>

-- 
Dan Widdis


Re: [VOTE] Release Apache TubeMQ (Incubating) 0.8.0-incubating RC2

2021-02-18 Thread Daniel B. Widdis
Thank you for the clarification, Justin!  I missed that non-commercial
limitation.

I agree that it would be better to replace that dependency.

On Thu, Feb 18, 2021 at 1:41 PM Justin Mclean wrote:

> Hi,
>
> My concern is that while it claims it is under the Apache license, that is
> only for non-commercial use. Having a restriction like that means it’s not
> really under the Apache license.
>
> It states on that page [1]
> "Our open source license is OSI-certified and permits use of Berkeley DB
> in open source projects or in applications that are not distributed to
> third parties.”
> "Our commercial license permits closed-source distribution of an
> application to third parties and provides business assurance"
>
> ASF releases need to be distributed to third parties, ASF releases can’t
> have non-commercial restrictions.
>
> IMO it would be best if the project replaced this dependency with
> something else.
>
> Thanks,
> Justin
>
> 1.
> https://www.oracle.com/database/technologies/related/berkeleydb-downloads.html
>
>

-- 
Dan Widdis


Re: [VOTE] Release Apache TubeMQ (Incubating) 0.8.0-incubating RC2

2021-02-18 Thread Daniel B. Widdis
Can you be clear on what you mean by "has no standard license file" and "no
open source code channel"?

I started at this download page [1].

On that page near the top is a link to "more details" on licensing, which
points to [2].

The link there for "Open source license for Berkeley DB Java Edition"
points to [3] which is the Apache License, 2.0.

From [1], I downloaded the file Berkeley DB Java Edition je-7.5.11.zip and
extracted it.  It contains a LICENSE file exactly matching the license
shown in [3], the Apache License, 2.0.

The extracted zip file also contains a complete copy of the source code.

You are correct that it is not published on Maven, but is that a
requirement for a dependency?  The only "obstacle" to downloading is that
one has to create a (free) Oracle account to download it.  Is this a
disqualifying factor?

I am unclear on why this is not permitted as a dependency for an Apache
project.



[1] -
https://www.oracle.com/database/technologies/related/berkeleydb-downloads.html

[2] -
https://www.oracle.com/database/technologies/related/berkeleydb/berkeleydb-licensing.html
[3] - https://www.oracle.com/downloads/licenses/berkeleydb-jeoslicense.html



On Thu, Feb 18, 2021 at 3:31 AM Goson zhang wrote:

> Hi all:
>
> We carefully analyzed the dependency package berkeleydb-je. I think what
> @Justin Mclean said is correct: it has no standard license file, no open
> source code channel, and cannot be downloaded from the central repository.
> All in all, it does have a problem.
>
> I will restore the WIP label until it is replaced with the new scheme.
>
> Thanks all!
>
>
>
> On Thu, Feb 11, 2021 at 9:07 PM Goson zhang wrote:
>
> > Hi all:
> >
> > I made the following adjustments according to your suggestions:
> >
> > 1. Sorted out and supplemented the LICENSE of each binary dependency
> > package.
> >
> > 2. Deleted the content related to the tubemq-manager and tubemq-web modules:
> > While reviewing, we found that tubemq-manager depends on a package licensed
> > under the GNU GPL v2. This version removes the module; it will be merged
> > back into the official version after the relevant dependency checks are
> > finished.
> > Since more than 60 files in the tubemq-web module lack proper LICENSE
> > authorization, they will be merged into the mainline version after the
> > LICENSE checks of the code and its dependencies are finished.
> >
> > 3. The compilation problem:
> > When compiling with the mvn compile command, the tubemq-docker pom needs to
> > fetch the tubemq dependency packages from the repository instead of using
> > the local build, so a compilation error occurs. This appears to be a pom
> > problem and will be fixed when the next version is released (it does not
> > affect the main line).
> >
> > 4. LICENSE problem of bdb:
> > Following everyone's discussion and confirmation, version 7.3.7 of
> > berkeleydb-je is licensed under the Apache License 2.0, which is explained
> > in detail in the LICENSE file of TubeMQ; as the project evolves, we will
> > consider gradually removing this component.
> >
> > 5. Updated the CHANGES.md file to add this change.
> >
> > Please check whether there are any other problems. If everything is OK, we
> > will start a new round of the release.
> >
> > Thanks!
> >
> >
> > On Thu, Feb 11, 2021 at 2:55 PM Goson zhang wrote:
> >
> >> Ok, thanks Daniel!
> >>
> >>
> >>
> >> On Thu, Feb 11, 2021 at 2:10 PM Daniel Widdis wrote:
> >>
> >>> To continue to provide clarity:
> >>>
> >>> The current version (7.5.11) still has AL2.0 licensing; I just
> >>> downloaded it to confirm.  Any version from 7.3.7 and newer (at this
> >>> point in time) is an acceptable dependency.
> >>>
> >>> If Oracle chooses to change the license again for future releases, that
> >>> could pose a problem, but personally I don't think that's likely.
> >>>
> >>>
> >>> On 2/10/21, 9:58 PM, "Goson zhang" wrote:
> >>>
> >>> Yes, pinning its version number in the project is still a relatively
> >>> passive solution: if we want to upgrade the version, but the license for
> >>> that version has changed, our project is still restricted.
> >>>
> >>> The biggest obstacle to replacing this component is the active/standby
> >>> switchover function, which depends on it. We are not considering
> >>> expanding its scope of use; we are analyzing a new active/standby
> >>> switchover scheme and want to keep the existing method until that work
> >>> is complete and real-time active/standby switchover is provided.
> >>>
> >>> Until the solution is adjusted to solve the problem completely, I plan to
> >>> explain it in detail in the supplementary LICENSE for the binary
> >>> dependency packages.
> >>>
> >>> Is this OK?
> >>>
> >>>
> >>> On Thu, Feb 11, 2021 at 1:46 PM Justin Mclean wrote:
> >>>
> >>> > Hi,
> >>> >
> >

Re: [VOTE] Graduate Apache Daffodil to a top-level project

2021-02-09 Thread Daniel B. Widdis
+1 non-binding

On Tue, Feb 9, 2021 at 2:03 PM Mike Beckerle wrote:

> Folks,
>
> I would now call for people to please officially vote on this proposal.
>
> This vote thread will be open for 72 hours - ending on Friday Feb 12,
> 5:15pm ET.US (UTC-5)
>
> The discussion thread already contained votes (eleven +1, including 4
> binding, and no 0 or -1), and no negative discussion points, so I am
> incorporating that thread by reference here.
>
> Discussion thread:
>
> https://lists.apache.org/thread.html/r1340f079a36ccc7498231eebbe9a5a0e69570f436d213fa665ca48c0%40%3Cgeneral.incubator.apache.org%3E
>
>
> The draft board proposal is below for your review. As you all know this is
> business boilerplate except for our project name, charter statement, the
> original PMC slate, and VP/Chairperson put forward.
>
>
> Regards,
> Mike Beckerle
> Daffodil PMC Chair Elect, Committer
>
> Establish the Apache Daffodil Project
>
> WHEREAS, the Board of Directors deems it to be in the best interests of the
> Foundation and consistent with the Foundation's purpose to establish a
> Project Management Committee charged with the creation and maintenance of
> open-source software, for distribution at no charge to the public, related
> to an implementation of the Data Format Description Language (DFDL) used to
> convert between fixed format data and more readily processed forms such as
> XML or JSON.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC),
> to be known as the "Apache Daffodil Project", be and hereby is established
> pursuant to Bylaws of the Foundation; and be it further
>
> RESOLVED, that the Apache Daffodil Project be and hereby is responsible for
> the creation and maintenance of software related to an implementation of
> the Data Format Description Language (DFDL) used to convert between fixed
> format data and more readily processed forms such as XML or JSON; and be it
> further
>
> RESOLVED, that the office of "Vice President, Apache Daffodil" be and hereby
> is created, the person holding such office to serve at the direction of the
> Board of Directors as the chair of the Apache Daffodil Project, and to have
> primary responsibility for management of the projects within the scope of
> responsibility of the Apache Daffodil Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and hereby are
> appointed to serve as the initial members of the Apache Daffodil Project:
>
>  * Brandon Sloane
>  * Christofer Dutz
>  * Dave Fisher
>  * Dave Thompson
>  * John Interrante
>  * John Wass
>  * Joshua Adams
>  * Kevin Ratnasekera
>  * Mike Beckerle
>  * Steve Lawrence
>  * Taylor Wise
>  * Olabusayo Kilo
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Mike Beckerle be appointed to
> the office of Vice President, Apache Daffodil, to serve in accordance with
> and subject to the direction of the Board of Directors and the Bylaws of the
> Foundation until death, resignation, retirement, removal or
> disqualification, or until a successor is appointed; and be it further
>
> RESOLVED, that the Apache Daffodil Project be and hereby is tasked with the
> migration and rationalization of the Apache Incubator Daffodil podling; and
> be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache Incubator
> Daffodil podling encumbered upon the Apache Incubator PMC are hereafter
> discharged.
>
>


-- 
Dan Widdis


Re: [VOTE] Accept Wayang into the Apache Incubator

2020-12-11 Thread Daniel B. Widdis
+1 (non-binding).  I'm interested in getting involved in this project!

On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz wrote:

> Hi all,
>
> Following up on the [DISCUSS] thread on Wayang (
> https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E),
> I would like to call a VOTE to accept Wayang (aka Rheem) into the Apache
> Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring Wayang into the Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring Wayang into the Incubator, because...
>
> The vote will be open for at least 72 hours and only votes from the Incubator
> PMC are binding, but votes from everyone are welcome.
>
> Chris
>
> -
>
> Wayang Proposal (
> https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
>
> == Abstract ==
>
> Wayang is a cross-platform data processing system that aims at decoupling
> the business logic of data analytics applications from concrete data
> processing platforms, such as Apache Flink or Apache Spark. Hence, it tames
> the complexity that arises from the "Cambrian explosion" of novel data
> processing platforms that we currently witness.
>
> Note that the Wayang project is the Rheem project; we have renamed the
> project because of trademark issues.
>
> You can find the project web page at: https://rheem-ecosystem.github.io/
>
> == Proposal ==
>
> Wayang is a cross-platform system that provides an abstraction over data
> processing platforms to free users from the burdens of (i) performing
> tedious and costly data migration and integration tasks to run their
> applications, and (ii) choosing the right data processing platforms for
> their applications. To achieve this, Wayang: (1) provides an abstraction on
> top of existing data processing platforms that allows users to specify
> their data analytics tasks in the form of a DAG of operators; (2) comes with
> a cross-platform optimizer for automating the selection of
> suitable/efficient platforms; and (3) finally takes care of executing
> the optimized plan, including communication across platforms. In summary,
> Wayang has the following salient features:
>
> - Flexible Data Model - It considers a flexible and simple data model
> based on data quanta. A data quantum is an atomic processing unit in the
> system, which can represent a large spectrum of data formats, such as data
> points for a machine learning application, tuples for a database
> application, or RDF triples. Hence, Wayang is able to express a wide range
> of data analytics tasks.
> - Platform independence - It provides a simple interface (currently Java
> and Scala) that is inspired by established programming models, such as that
> of Apache Spark and Apache Flink. Users represent their data analytic tasks
> as a DAG (Wayang plan), where vertices correspond to Wayang operators and
> edges represent data flows (data quanta flowing) among these operators. A
> Wayang operator defines a particular kind of data transformation over an
> input data quantum, ranging from basic functionality (e.g.,
> transformations, filters, joins) to complex, extensible tasks (e.g.,
> PageRank).
> - Cross-platform execution - Besides running a data analytic task on any
> data processing platform, it also comes with an optimizer that can decide
> to execute a single data analytic task using multiple data processing
> platforms. This allows for exploiting the capabilities of different data
> processing platforms to perform complex data analytic tasks more
> efficiently.
> - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully
> based on UDFs. This not only enables Wayang to learn the cost functions of
> newly added data processing platforms, but also allows developers to tune
> the optimizer at will.
> - Extensibility - It treats data processing platforms as plugins to allow
> users (developers) to easily incorporate new data processing platforms into
> the system. This is achieved by exposing the functionalities of data
> processing platforms as operators (execution operators). The same approach
> is followed at the Wayang interface, where users can also extend Wayang
> capabilities, i.e., the operators, easily.
>
> We plan to work on the stability of all these features as well as
> extending Wayang with more advanced features. Furthermore, Wayang currently
> supports Apache Spark, Standalone Java, GraphChi, and relational databases (via
> JDBC). We plan to incorporate more data processing platforms, such as
> Apache Flink and Apache Hive.
>
> === Background ===
>
> Many organizations and companies collect or produce a large variety of data
> to apply data analytics over it. This is because insights from data
> rapidly allow them to make better decisions. Thus, the pursuit of
> efficient and scalable data analytics as well as the
> one-size-does-not-fit-all philosophy has given rise to a plethora of data
> processing platforms. Examples of these specialized 

Re: [incubator-pinot] Migrate LinkedIn internal CI pipeline from Travis CI to Github Actions

2020-12-07 Thread Daniel B. Widdis
Someone needs to have "Admin" access to the repository.  Most committers
just have "Write" access.

I would suggest filing an Issue at the repository stating these
requirements.

On Mon, Dec 7, 2020 at 12:22 PM Jialiang Li wrote:

> Hi community,
>
> This is Jack from LinkedIn Pinot team. We’re currently thinking of
> migrating our existing internal publication pipeline from Travis CI to
> GitHub Actions, but it seems none of our committers has the permission to
> add the secret key to the public repo. Could anyone help guide us on how to
> do that?
>
>
> https://docs.github.com/en/free-pro-team@latest/actions/reference/encrypted-secrets#creating-encrypted-secrets-for-a-repository
>
>
> Best Regards,
> Jack
>
>
>

-- 
Dan Widdis


Re: Where to get feedback on a project before formulating a proposal

2020-11-24 Thread Daniel B. Widdis
On Tue, Nov 24, 2020 at 4:39 AM Nick Kew wrote:

> Whose initiative is this?  If it's you acting as dictator, the first thing
> to do is discuss
> it within your own community.  If it's coming from them and they're
> pressing you
> with positive reasons to move, that's a good start.
>

The idea was first brought up over a year ago by a member of the community,
although it's been me dragging my feet. They are actually an ASF member
(VP of a TLP), and it seems they could potentially serve as a sponsor;
however, I also wanted to get feedback from others. Admittedly, however, the
community is still small, and part of the goal here is to help
transition from me doing most of the PRs to a much more sustainable,
community-led project team.

> One gotcha to be aware of is that Apache expects project work to happen both
> openly and on-record.


We actually do already do many of the things that Apache projects do,
including:
 - open/on-record conversations for all but a few things (email invites to
committer status, for example)
 - meritocracy invites (a quality PR or two gets that invite)
 - care with licensing of dependencies
 - attempts to build community involvement by encouraging newcomers to open
source to submit PRs.


> Does it relate to one or more existing Apache project?  If so, that would
> be a good
> place to look for folks who'll take an interest.
>

Good suggestion, thanks. We are a dependency of at least 4 Apache projects
that I'm aware of (two TLPs, two incubating), and I may ask on those
lists about interest.


-- 
Dan Widdis