Re: [VOTE] Release Apache Hivemall (Incubating) v0.5.0-RC2

2018-01-30 Thread Makoto Yui
Hi,

>> IMO, adding license information along with copyrights in NOTICE sounds
>> reasonable because comparing LICENSE with NOTICE is hard when divided
>> while it may be redundant.
>
> In general only LICENSE should contain license information [1] as the NOTICE
> file is informational only [2] (see d. "The contents of the NOTICE file are
> for informational purposes only and do not modify the License."). It should
> also be kept as short as possible [3] as it has an impact on downstream ASF
> projects.

Thank you for pointing out ASF policy. I'll remove licensing
information from NOTICE.

>> BTW, can we remove "rcX" from "x.y.z-rcX" on releasing "x.y.z" without
>> voting when IPMC vote passed?
>
> You can name release artefacts however you want as long as the name has
> “incubating” in it. Changing the name of the release (i.e. dropping the RC
> bit) doesn’t affect the signature or change the file contents so that’s fine.
> Best to do this via an “svn move” from the dist/dev area to the /dist area.

When artifacts are published to Maven, the pom versions are rewritten, so
I think the signatures actually do change in other releases.
The signature of XXX.src.zip can remain unchanged, though.

Thanks,
Makoto

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

2018-01-30 Thread Jean-Baptiste Onofré
Hi,

Coral is a good name!

Does the code belong to Seoul National University? In that case, in addition to
your ICLAs, we would need an SGA (it's not a blocker for the project bootstrapping
or code donation, but we will need it later, at least for graduation). On the
other hand, if the committers are all part of the university, you can also sign
a CCLA.

Happy to be a mentor on the project if you want me! ;)

Thanks,
Regards
JB

On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> Thanks for the comments, JB!
> My replies are inlined below.
> 
> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré 
> wrote:
> 
>> Hi,
>>
>> sorry to be a little bit late on this.
>>
>> It's a very interesting proposal. It sounds pretty close to the portability
>> layer we want to add in Apache Beam. I would love to see interaction
>> between the two communities.
>>
>> I have two minor questions:
>>
>> 1. about the name: Onyx sounds very generic and the name is used in other
>> technologies. Maybe another unique name would be more accurate.
>>
> 
> We proposed Coral instead. How does this sound?
> 
> 
>> 2. the Onyx code is on github right now, under the Apache 2.0 license.
>> Does this code have any affiliation with companies? Meaning that we
>> would need an SGA for the code donation.
>>
> It does not. The developers are affiliated with Seoul National University.
> In this case, do we still need an SGA?
> 
> 
>> If you need any help for the incubation, I would be more than happy to
>> help !
>>
>>
> Thanks for the offer. Would you be interested in being a mentor of the
> project?
> 
> Thanks.
> -Gon
> 
> 
> 
>> Regards
>> JB
>>
>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>> Dear Apache Incubator Community,
>>>
>>> Please accept the following proposal for presentation and discussion:
>>> https://wiki.apache.org/incubator/OnyxProposal
>>>
>>> Onyx is a data processing system that aims to flexibly control the runtime
>>> behaviors of a job to adapt to varying deployment characteristics (e.g.,
>>> harnessing transient resources in datacenters, cross-datacenter deployment,
>>> changing runtime based on job characteristics, etc.). Onyx provides ways to
>>> extend the system’s capabilities and incorporate the extensions to the
>>> flexible job execution.
>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>> based on a deployment policy.
>>>
>>> I've attached the proposal below.
>>>
>>> Best regards,
>>> Byung-Gon Chun
>>>
>>> = OnyxProposal =
>>>
>>> == Abstract ==
>>> Onyx is a data processing system for flexible employment with
>>> different execution scenarios for various deployment characteristics
>>> on clusters.
>>>
>>> == Proposal ==
>>> Today, there is a wide variety of data processing systems with
>>> different designs for better performance and datacenter efficiency.
>>> They include processing data on specific resource environments and
>>> running jobs with specific attributes. Although each system
>>> successfully solves the problems it targets, most systems are designed
>>> in the way that runtime behaviors are built tightly inside the system
>>> core to hide the complexity of distributed computing. This makes it
>>> hard for a single system to support different deployment
>>> characteristics with different runtime behaviors without substantial
>>> effort.
>>>
>>> Onyx is a data processing system that aims to flexibly control the
>>> runtime behaviors of a job to adapt to varying deployment
>>> characteristics. Moreover, it provides a means of extending the
>>> system’s capabilities and incorporating the extensions to the flexible
>>> job execution.
>>>
>>> In order to be able to easily modify runtime behaviors to adapt to
>>> varying deployment characteristics, Onyx exposes runtime behaviors to
>>> be flexibly configured and modified at both compile-time and runtime
>>> through a set of high-level graph pass interfaces.
>>>
>>> We hope to contribute to the big data processing community by enabling
>>> more flexibility and extensibility in job executions. Furthermore, we
>>> can benefit more together as a community when we work together as a
>>> community to mature the system with more use cases and understanding
>>> of diverse deployment characteristics. The Apache Software Foundation
>>> is the perfect place to achieve these aspirations.
>>>
>>> == Background ==
>>> Many data processing systems have distinctive runtime behaviors
>>> optimized and configured for specific deployment characteristics like
>>> different resource environments and for handling special job
>>> attributes.
>>>
>>> For example, much research has been conducted to overcome the
>>> challenge of running data processing jobs on cheap, unreliable
>>> transient resources. Likewise, techniques for disaggregating different
>>> types of resources, like memory, CPU and GPU, are being actively
>>> developed to use 

Re: [VOTE] Release Apache Livy 0.5.0-incubating (RC2)

2018-01-30 Thread Justin Mclean
Hi,

+1 binding

I checked:
- incubating in name
- DISCLAIMER exists
- LICENSE is OK but missing a license [1][2]
- please update year in NOTICE
- all source files have ASF headers
- no unexpected binary files in release
- Can compile from source

I did get some failing tests, by the way; I ran the build again skipping the
tests and it compiled fine.

Thanks,
Justin

1. livy-0.5.0-incubating-src/docs/assets/themes/apache/bootstrap/fonts/glyphicons-halflings-regular/*
2. livy-0.5.0-incubating-src/server/src/main/resources/org/apache/livy/server/ui/static/fonts/*






Incubator PMC Board Report Timeline - February 2018

2018-01-30 Thread John D. Ament
February 2018 Incubator report timeline:

https://wiki.apache.org/incubator/February2018

Wed February 07 -- Podling reports due by end of day
Sun February 11 -- Shepherd reviews due by end of day
Sun February 11 -- Summary due by end of day
Tue February 13 -- Mentor signoff due by end of day
Wed February 14 -- Report submitted to Board
Wed February 21 -- Board meeting


Also, apologies to anyone who received a Feb 2017 reminder; that was a typo on my part.


Re: [VOTE] Release Apache Hivemall (Incubating) v0.5.0-RC2

2018-01-30 Thread Justin Mclean
Hi,

> IMO, adding license information along with copyrights in NOTICE sounds
> reasonable because comparing LICENSE with NOTICE is hard when divided
> while it may be redundant.

In general only LICENSE should contain license information [1] as the NOTICE
file is informational only [2] (see d. "The contents of the NOTICE file are
for informational purposes only and do not modify the License."). It should also
be kept as short as possible [3] as it has an impact on downstream ASF projects.

> BTW, can we remove "rcX" from "x.y.z-rcX" on releasing "x.y.z" without
> voting when IPMC vote passed?

You can name release artefacts however you want as long as the name has
“incubating” in it. Changing the name of the release (i.e. dropping the RC bit)
doesn’t affect the signature or change the file contents so that’s fine. Best to
do this via an “svn move” from the dist/dev area to the /dist area.
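
As a minimal sketch of why the rename is safe, assuming the checksum (and
likewise a detached signature) is computed over the file contents only: the
file names and payload below are hypothetical placeholders, and a SHA-512
digest stands in for the signature check.

```python
import hashlib
import os
import tempfile


def sha512_of(path):
    """Return the hex SHA-512 digest of the file's contents."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_before_and_after_rename(old_name, new_name, payload):
    """Write payload to old_name, rename it to new_name, and return both digests."""
    with tempfile.TemporaryDirectory() as d:
        old_path = os.path.join(d, old_name)
        new_path = os.path.join(d, new_name)
        with open(old_path, "wb") as f:
            f.write(payload)
        before = sha512_of(old_path)
        os.rename(old_path, new_path)  # only the name changes, not the bytes
        after = sha512_of(new_path)
        return before, after


# Hypothetical artefact names: the RC suffix is dropped, "incubating" is kept.
before, after = digest_before_and_after_rename(
    "example-x.y.z-incubating-rc2-src.zip",
    "example-x.y.z-incubating-src.zip",
    b"release contents",
)
print(before == after)  # prints True
```

The digest depends only on the bytes in the file, so it is the same under
either name. A rebuilt artefact with a different embedded version would not
have this property.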

Thanks,
Justin

1. http://www.apache.org/dev/licensing-howto.html#overview-of-files
2. http://www.apache.org/licenses/LICENSE-2.0.html#redistribution (see d. )
3. http://www.apache.org/dev/licensing-howto.html#mod-notice



Re: [VOTE] Release Apache Hivemall (Incubating) v0.5.0-RC2

2018-01-30 Thread sebb
On 30 January 2018 at 03:02, Makoto Yui  wrote:
> Justin,
>
> 2018-01-30 11:33 GMT+09:00 Justin Mclean :
>> You should be careful following other TLPs as examples and follow the
>> instructions here. [1] If you want some good examples, the HTTP project
>> or Tomcat are, I believe, good ones to follow. That, or perhaps more
>> recently graduated projects.
>>
>>> https://github.com/apache/hadoop/blob/trunk/NOTICE.txt
>>> https://github.com/apache/spark/blob/master/NOTICE#L2
>
> I'll take a look at Tomcat's one.
>
>>> In your opinion, Hadoop/Spark's NOTICE file is wrong as well.
>>
>> It may be due to historical reasons, or they may have included issues due to
>> malformed upstream projects' NOTICE files. That looks to be the case here
>> for Hadoop, with the license information being in the NOTICE file; not so
>> sure with Spark. IMO they could do with some improvement, but that’s up to
>> the PMC of those projects.
>
> IMO, adding license information along with copyrights in NOTICE sounds
> reasonable because comparing LICENSE with NOTICE is hard when divided
> while it may be redundant.

The NOTICE file is like a poem - it is only complete when nothing more
can be taken out.

Nothing must be added to NOTICE unless it is definitely required.

>>> I'll cancel this release and do release process again but wait for
>>> other IPMC's comment for a while to find other glitches.
>>
>> That's a good idea. You might want to get your mentors to double check the
>> release as well. Did any of them vote on this release candidate? (From a
>> quick look I couldn’t see any mentor votes.)
>
> Not yet. I'm asking our mentors to join this vote, expecting some of them
> to join.
>
> BTW, can we remove "rcX" from "x.y.z-rcX" on releasing "x.y.z" without
> voting when IPMC vote passed?
> Some other incubator projects do so, but I'm not confident
> whether it's okay or not.
>
> Thanks,
> Makoto
>
> --
> Makoto YUI 
> Research Engineer, Treasure Data, Inc.
> http://myui.github.io/
>




Re: [PROPOSAL] Onyx - proposal for Apache Incubation

2018-01-30 Thread Byung-Gon Chun
If Coral is fine as our project name, I will start the vote in a couple of
days.
Let me know if you have any concerns.

Thanks.
-Gon

On Tue, Jan 30, 2018 at 6:17 PM, Byung-Gon Chun  wrote:

> Thanks for the comments, JB!
> My replies are inlined below.
>
> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi,
>>
>> sorry to be a little bit late on this.
>>
>> It's a very interesting proposal. It sounds pretty close to the portability
>> layer we want to add in Apache Beam. I would love to see interaction
>> between the two communities.
>>
>> I have two minor questions:
>>
>> 1. about the name: Onyx sounds very generic and the name is used in other
>> technologies. Maybe another unique name would be more accurate.
>>
>
> We proposed Coral instead. How does this sound?
>
>
>> 2. the Onyx code is on github right now, under the Apache 2.0 license.
>> Does this code have any affiliation with companies? Meaning that we
>> would need an SGA for the code donation.
>>
> It does not. The developers are affiliated with Seoul National University.
> In this case, do we still need an SGA?
>
>
>> If you need any help for the incubation, I would be more than happy to
>> help !
>>
>>
> Thanks for the offer. Would you be interested in being a mentor of the
> project?
>
> Thanks.
> -Gon
>
>
>
>> Regards
>> JB
>>
>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>> > Dear Apache Incubator Community,
>> >
>> > Please accept the following proposal for presentation and discussion:
>> > https://wiki.apache.org/incubator/OnyxProposal
>> >
>> > Onyx is a data processing system that aims to flexibly control the runtime
>> > behaviors of a job to adapt to varying deployment characteristics (e.g.,
>> > harnessing transient resources in datacenters, cross-datacenter deployment,
>> > changing runtime based on job characteristics, etc.). Onyx provides ways to
>> > extend the system’s capabilities and incorporate the extensions to the
>> > flexible job execution.
>> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>> > based on a deployment policy.
>> >
>> > I've attached the proposal below.
>> >
>> > Best regards,
>> > Byung-Gon Chun
>> >
>> > = OnyxProposal =
>> >
>> > == Abstract ==
>> > Onyx is a data processing system for flexible employment with
>> > different execution scenarios for various deployment characteristics
>> > on clusters.
>> >
>> > == Proposal ==
>> > Today, there is a wide variety of data processing systems with
>> > different designs for better performance and datacenter efficiency.
>> > They include processing data on specific resource environments and
>> > running jobs with specific attributes. Although each system
>> > successfully solves the problems it targets, most systems are designed
>> > in the way that runtime behaviors are built tightly inside the system
>> > core to hide the complexity of distributed computing. This makes it
>> > hard for a single system to support different deployment
>> > characteristics with different runtime behaviors without substantial
>> > effort.
>> >
>> > Onyx is a data processing system that aims to flexibly control the
>> > runtime behaviors of a job to adapt to varying deployment
>> > characteristics. Moreover, it provides a means of extending the
>> > system’s capabilities and incorporating the extensions to the flexible
>> > job execution.
>> >
>> > In order to be able to easily modify runtime behaviors to adapt to
>> > varying deployment characteristics, Onyx exposes runtime behaviors to
>> > be flexibly configured and modified at both compile-time and runtime
>> > through a set of high-level graph pass interfaces.
>> >
>> > We hope to contribute to the big data processing community by enabling
>> > more flexibility and extensibility in job executions. Furthermore, we
>> > can benefit more together as a community when we work together as a
>> > community to mature the system with more use cases and understanding
>> > of diverse deployment characteristics. The Apache Software Foundation
>> > is the perfect place to achieve these aspirations.
>> >
>> > == Background ==
>> > Many data processing systems have distinctive runtime behaviors
>> > optimized and configured for specific deployment characteristics like
>> > different resource environments and for handling special job
>> > attributes.
>> >
>> > For example, much research has been conducted to overcome the
>> > challenge of running data processing jobs on cheap, unreliable
>> > transient resources. Likewise, techniques for disaggregating different
>> > types of resources, like memory, CPU and GPU, are being actively
>> > developed to use datacenter resources more efficiently. Many
>> > researchers are also working to run data processing jobs in even more
>> > diverse environments, such as across distant datacenters. Similarly,
>> > for special 

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

2018-01-30 Thread Byung-Gon Chun
Thanks for the comments, JB!
My replies are inlined below.

On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> sorry to be a little bit late on this.
>
> It's a very interesting proposal. It sounds pretty close to the portability
> layer we want to add in Apache Beam. I would love to see interaction
> between the two communities.
>
> I have two minor questions:
>
> 1. about the name: Onyx sounds very generic and the name is used in other
> technologies. Maybe another unique name would be more accurate.
>

We proposed Coral instead. How does this sound?


> 2. the Onyx code is on github right now, under the Apache 2.0 license.
> Does this code have any affiliation with companies? Meaning that we
> would need an SGA for the code donation.
>

It does not. The developers are affiliated with Seoul National University.
In this case, do we still need an SGA?


> If you need any help for the incubation, I would be more than happy to
> help !
>
>
Thanks for the offer. Would you be interested in being a mentor of the
project?

Thanks.
-Gon



> Regards
> JB
>
> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> > Dear Apache Incubator Community,
> >
> > Please accept the following proposal for presentation and discussion:
> > https://wiki.apache.org/incubator/OnyxProposal
> >
> > Onyx is a data processing system that aims to flexibly control the runtime
> > behaviors of a job to adapt to varying deployment characteristics (e.g.,
> > harnessing transient resources in datacenters, cross-datacenter deployment,
> > changing runtime based on job characteristics, etc.). Onyx provides ways to
> > extend the system’s capabilities and incorporate the extensions to the
> > flexible job execution.
> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> > based on a deployment policy.
> >
> > I've attached the proposal below.
> >
> > Best regards,
> > Byung-Gon Chun
> >
> > = OnyxProposal =
> >
> > == Abstract ==
> > Onyx is a data processing system for flexible employment with
> > different execution scenarios for various deployment characteristics
> > on clusters.
> >
> > == Proposal ==
> > Today, there is a wide variety of data processing systems with
> > different designs for better performance and datacenter efficiency.
> > They include processing data on specific resource environments and
> > running jobs with specific attributes. Although each system
> > successfully solves the problems it targets, most systems are designed
> > in the way that runtime behaviors are built tightly inside the system
> > core to hide the complexity of distributed computing. This makes it
> > hard for a single system to support different deployment
> > characteristics with different runtime behaviors without substantial
> > effort.
> >
> > Onyx is a data processing system that aims to flexibly control the
> > runtime behaviors of a job to adapt to varying deployment
> > characteristics. Moreover, it provides a means of extending the
> > system’s capabilities and incorporating the extensions to the flexible
> > job execution.
> >
> > In order to be able to easily modify runtime behaviors to adapt to
> > varying deployment characteristics, Onyx exposes runtime behaviors to
> > be flexibly configured and modified at both compile-time and runtime
> > through a set of high-level graph pass interfaces.
> >
> > We hope to contribute to the big data processing community by enabling
> > more flexibility and extensibility in job executions. Furthermore, we
> > can benefit more together as a community when we work together as a
> > community to mature the system with more use cases and understanding
> > of diverse deployment characteristics. The Apache Software Foundation
> > is the perfect place to achieve these aspirations.
> >
> > == Background ==
> > Many data processing systems have distinctive runtime behaviors
> > optimized and configured for specific deployment characteristics like
> > different resource environments and for handling special job
> > attributes.
> >
> > For example, much research has been conducted to overcome the
> > challenge of running data processing jobs on cheap, unreliable
> > transient resources. Likewise, techniques for disaggregating different
> > types of resources, like memory, CPU and GPU, are being actively
> > developed to use datacenter resources more efficiently. Many
> > researchers are also working to run data processing jobs in even more
> > diverse environments, such as across distant datacenters. Similarly,
> > for special job attributes, many works take different approaches, such
> > as runtime optimization, to solve problems like data skew, and to
> > optimize systems for data processing jobs with small-scale input data.
> >
> > Although each of the systems performs well with the jobs and in the
> > environments they target, they perform poorly with 

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

2018-01-30 Thread Jean-Baptiste Onofré
Hi,

sorry to be a little bit late on this.

It's a very interesting proposal. It sounds pretty close to the portability
layer we want to add in Apache Beam. I would love to see interaction between the
two communities.

I have two minor questions:

1. about the name: Onyx sounds very generic and the name is used in other
technologies. Maybe another unique name would be more accurate.
2. the Onyx code is on github right now, under the Apache 2.0 license. Does this
code have any affiliation with companies? Meaning that we would need an SGA for
the code donation.

If you need any help with the incubation, I would be more than happy to help!

Regards
JB

On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> Dear Apache Incubator Community,
> 
> Please accept the following proposal for presentation and discussion:
> https://wiki.apache.org/incubator/OnyxProposal
> 
> Onyx is a data processing system that aims to flexibly control the runtime
> behaviors of a job to adapt to varying deployment characteristics (e.g.,
> harnessing transient resources in datacenters, cross-datacenter deployment,
> changing runtime based on job characteristics, etc.). Onyx provides ways to
> extend the system’s capabilities and incorporate the extensions to the
> flexible job execution.
> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> based on a deployment policy.
> 
> I've attached the proposal below.
> 
> Best regards,
> Byung-Gon Chun
> 
> = OnyxProposal =
> 
> == Abstract ==
> Onyx is a data processing system for flexible employment with
> different execution scenarios for various deployment characteristics
> on clusters.
> 
> == Proposal ==
> Today, there is a wide variety of data processing systems with
> different designs for better performance and datacenter efficiency.
> They include processing data on specific resource environments and
> running jobs with specific attributes. Although each system
> successfully solves the problems it targets, most systems are designed
> in the way that runtime behaviors are built tightly inside the system
> core to hide the complexity of distributed computing. This makes it
> hard for a single system to support different deployment
> characteristics with different runtime behaviors without substantial
> effort.
> 
> Onyx is a data processing system that aims to flexibly control the
> runtime behaviors of a job to adapt to varying deployment
> characteristics. Moreover, it provides a means of extending the
> system’s capabilities and incorporating the extensions to the flexible
> job execution.
> 
> In order to be able to easily modify runtime behaviors to adapt to
> varying deployment characteristics, Onyx exposes runtime behaviors to
> be flexibly configured and modified at both compile-time and runtime
> through a set of high-level graph pass interfaces.
> 
> We hope to contribute to the big data processing community by enabling
> more flexibility and extensibility in job executions. Furthermore, we
> can benefit more together as a community when we work together as a
> community to mature the system with more use cases and understanding
> of diverse deployment characteristics. The Apache Software Foundation
> is the perfect place to achieve these aspirations.
> 
> == Background ==
> Many data processing systems have distinctive runtime behaviors
> optimized and configured for specific deployment characteristics like
> different resource environments and for handling special job
> attributes.
> 
> For example, much research has been conducted to overcome the
> challenge of running data processing jobs on cheap, unreliable
> transient resources. Likewise, techniques for disaggregating different
> types of resources, like memory, CPU and GPU, are being actively
> developed to use datacenter resources more efficiently. Many
> researchers are also working to run data processing jobs in even more
> diverse environments, such as across distant datacenters. Similarly,
> for special job attributes, many works take different approaches, such
> as runtime optimization, to solve problems like data skew, and to
> optimize systems for data processing jobs with small-scale input data.
> 
> Although each of the systems performs well with the jobs and in the
> environments they target, they perform poorly with unconsidered cases,
> and do not consider supporting multiple deployment characteristics on
> a single system in their designs.
> 
> For an application writer to optimize an application to perform well
> on a certain system engraved with its underlying behaviors, it
> requires a deep understanding of the system itself, which is an
> overhead that often requires a lot of time and effort. Moreover, for a
> developer to modify such system behaviors, it requires modifications
> of the system core, which requires an even deeper understanding of the
> system itself.
> 
> With this background, Onyx is designed