Re: [PROPOSAL] sparklyr
A big thanks to everyone who left feedback! After much deliberation, we have decided to withdraw this proposal for the time being. The questions around licenses are delicate, and we are not currently ready to navigate them.

Cheers,
Kevin

On Mon, Oct 21, 2019 at 11:52 PM 申远 wrote:

> You can also read the documentation [1] on which licenses are allowed
> in ASF projects.
>
> [1] https://apache.org/legal/resolved.html#category-a
>
> Best Regards,
> YorkShen
>
> 申远
Re: [PROPOSAL] sparklyr
You can also read the documentation [1] on which licenses are allowed in ASF projects.

[1] https://apache.org/legal/resolved.html#category-a

Best Regards,
YorkShen

申远

On Tue, Oct 22, 2019 at 2:49 PM 申远 wrote:

> Based on my experience (wearing my Apache Weex hat), GPL/LGPL dependencies
> are not compatible with ASF policy, and you may want to fix the license
> problem at the beginning, even before entering the Incubator. Otherwise,
> GPL/LGPL dependencies will give you a lot more pain than you'd ever expect.
>
> Best Regards,
> YorkShen
>
> 申远
Re: [PROPOSAL] sparklyr
Based on my experience (wearing my Apache Weex hat), GPL/LGPL dependencies are not compatible with ASF policy, and you may want to fix the license problem at the beginning, even before entering the Incubator. Otherwise, GPL/LGPL dependencies will give you a lot more pain than you'd ever expect.

Best Regards,
YorkShen

申远

On Tue, Oct 22, 2019 at 2:55 AM Javier Luraschi wrote:

> Regarding licenses, dplyr is under MIT, see:
> https://github.com/tidyverse/dplyr/blob/master/LICENSE.md. However, other
> packages are under GPL2.
Re: [PROPOSAL] sparklyr
Regarding licenses, dplyr is under MIT, see: https://github.com/tidyverse/dplyr/blob/master/LICENSE.md. However, other packages are under GPL2.

Here are all the packages that sparklyr currently depends on and their associated licenses (retrieved from https://CRAN.R-project.org/package=, since CRAN, the R package repository, requires each package's license to be clearly defined).

assertthat: GPL-3
base64enc: GPL-2 | GPL-3
config: GPL-3
DBI: LGPL-2 | LGPL-2.1 | LGPL-3
dplyr: MIT
dbplyr: MIT
digest: GPL-2 | GPL-3
forge: Apache
generics: GPL-2
httr: MIT
jsonlite: MIT
openssl: MIT
purrr: GPL-3
r2d3: BSD-3
rappdirs: MIT
rlang: GPL-3
rprojroot: GPL-3
rstudioapi: MIT
tibble: MIT
tidyr: MIT
withr: GPL-2 | GPL-3
xml2: GPL-2 | GPL-3
ellipsis: GPL-3

On Mon, Oct 21, 2019 at 1:12 AM Justin Mclean wrote:

> I don’t know if this is an issue or not, but I bring it up just in case you
> are not aware. I can see that some of the tidyverse packages are under GPL2;
> the GPL license is not compatible with the ALv2. I’m not 100% sure what
> license dplyr is under. I can see that sparklyr depends on several (10+)
> GPL-licensed pieces of software. Do you see this causing any issue, as GPL
> code can’t be included in an Apache source release and can’t be a
> non-optional dependency of an ASF project? Have you discussed this with your
> champion or proposed mentors, and have they flagged this as a possible issue?
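For readers who want to check the count, here is a rough sketch (not part of the original message) that sorts the dependency list above into ASF license categories. The license strings are copied from the list as posted. The prefix-based mapping is a simplifying assumption: MIT, BSD and Apache are treated as Category A, and GPL and LGPL as Category X, following https://apache.org/legal/resolved.html. The names `classify` and `summarize` are made up for this illustration.

```python
# Licenses exactly as listed in the message above ("|" separates alternatives).
DEPENDENCY_LICENSES = {
    "assertthat": "GPL-3",
    "base64enc": "GPL-2 | GPL-3",
    "config": "GPL-3",
    "DBI": "LGPL-2 | LGPL-2.1 | LGPL-3",
    "dplyr": "MIT",
    "dbplyr": "MIT",
    "digest": "GPL-2 | GPL-3",
    "forge": "Apache",
    "generics": "GPL-2",
    "httr": "MIT",
    "jsonlite": "MIT",
    "openssl": "MIT",
    "purrr": "GPL-3",
    "r2d3": "BSD-3",
    "rappdirs": "MIT",
    "rlang": "GPL-3",
    "rprojroot": "GPL-3",
    "rstudioapi": "MIT",
    "tibble": "MIT",
    "tidyr": "MIT",
    "withr": "GPL-2 | GPL-3",
    "xml2": "GPL-2 | GPL-3",
    "ellipsis": "GPL-3",
}

# Assumption: a simplified, prefix-based view of the ASF categories.
CATEGORY_A_PREFIXES = ("MIT", "BSD", "Apache")
CATEGORY_X_PREFIXES = ("GPL", "LGPL")


def classify(license_field):
    """Return "A" if any alternative is Category A, "X" if all are Category X."""
    alternatives = [alt.strip() for alt in license_field.split("|")]
    if any(alt.startswith(CATEGORY_A_PREFIXES) for alt in alternatives):
        return "A"
    if all(alt.startswith(CATEGORY_X_PREFIXES) for alt in alternatives):
        return "X"
    return "unknown"


def summarize(licenses):
    """Group package names by category, sorted by package name."""
    result = {"A": [], "X": [], "unknown": []}
    for name, field in sorted(licenses.items()):
        result[classify(field)].append(name)
    return result


if __name__ == "__main__":
    for category, packages in summarize(DEPENDENCY_LICENSES).items():
        print(category, len(packages), ", ".join(packages))
```

With the list as posted, this puts 12 of the 23 packages in Category X, which is in line with the "several (10+)" estimate quoted above. It only restates the list; it is not a legal assessment.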
Re: [PROPOSAL] sparklyr
This looks interesting to me. I would be willing to contribute, if you would like to add me to the initial list of committers.

On Mon, Oct 21, 2019 at 10:50 AM Matt Sicker wrote:

> A lot of core R libraries seem to be under GPL. If we build more R
> projects at Apache, it seems like we may need more Apache-licensed (or
> compatible) libraries in R.
Re: [PROPOSAL] sparklyr
A lot of core R libraries seem to be under GPL. If we build more R projects at Apache, it seems like we may need more Apache-licensed (or compatible) libraries in R.

On Mon, 21 Oct 2019 at 03:12, Justin Mclean wrote:

> I don’t know if this is an issue or not, but I bring it up just in case you
> are not aware. I can see that some of the tidyverse packages are under GPL2;
> the GPL license is not compatible with the ALv2. I’m not 100% sure what
> license dplyr is under. I can see that sparklyr depends on several (10+)
> GPL-licensed pieces of software. Do you see this causing any issue, as GPL
> code can’t be included in an Apache source release and can’t be a
> non-optional dependency of an ASF project? Have you discussed this with your
> champion or proposed mentors, and have they flagged this as a possible issue?

--
Matt Sicker

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
Re: [PROPOSAL] sparklyr
Hi,

I am also concerned that the initial committer list only contains 3 committers. Why have you not included others in the community who have made contributions?

I don’t know if this is an issue or not, but I bring it up just in case you are not aware. I can see that some of the tidyverse packages are under GPL2; the GPL license is not compatible with the ALv2. I’m not 100% sure what license dplyr is under. I can see that sparklyr depends on several (10+) GPL-licensed pieces of software. Do you see this causing any issue, as GPL code can’t be included in an Apache source release and can’t be a non-optional dependency of an ASF project? Have you discussed this with your champion or proposed mentors, and have they flagged this as a possible issue?

I can see that one of the proposed mentors is not an IPMC member (which is required) and another seems not very active in signing off reports or voting on releases. Do you think the existing mentors will provide your project with enough support?

Thanks,
Justin

1. https://github.com/tidyverse/dplyr/blob/master/LICENSE
Re: [PROPOSAL] sparklyr
Hi,

It's an interesting proposal.

I guess that one of your challenges during incubation will be to grow the community (only 3 initial committers is very low) and to increase its diversity (only two company affiliations).

Regards
JB

On 19/10/2019 17:53, Kevin Kuo wrote:
> Greetings!
>
> We are proposing to enter sparklyr (https://spark.rstudio.com/), an open
> source R package for interfacing with Apache Spark, into incubation. Please
> see the proposal below.
Re: [PROPOSAL] sparklyr
Hi -

An interesting proposal.

I am concerned about the very small size of the Initial Committer list: 3 individuals, one of whom I only see small contributions from on https://github.com/rstudio/sparklyr/graphs/contributors

Do the Mentors intend to be active participants in the community? Also, Sean will need to join the IPMC, which is easy for him to request.

Regards,
Dave

> On Oct 19, 2019, at 8:53 AM, Kevin Kuo wrote:
>
> Greetings!
>
> We are proposing to enter sparklyr (https://spark.rstudio.com/), an open
> source R package for interfacing with Apache Spark, into incubation. Please
> see the proposal below.
[PROPOSAL] sparklyr
Greetings!

We are proposing to enter sparklyr (https://spark.rstudio.com/), an open source R package for interfacing with Apache Spark, into incubation. Please see the proposal below.

==

= Abstract =

sparklyr is an open source R package providing an interface to Apache Spark, a system for large-scale data analysis on clusters. It provides a dplyr interface for manipulating Spark DataFrames, supports the Spark ML and Structured Streaming components, and offers a developer API to create extensions.

= Proposal =

The sparklyr project, along with the ecosystem of extensions it supports, aims to democratize the capabilities of Apache Spark for R users, who represent a significant portion of data scientists today. The API is designed to reduce friction for users transitioning from local, “small data” workflows to computing on clusters, while preserving the flexibility of Apache Spark as much as possible. Some features include:

- It is compatible with the tidyverse ecosystem of packages, which is a popular collection of libraries for data science in R. Specifically, one can use `dplyr` verbs to manipulate Spark DataFrames. However, one can also use sparklyr without using tidyverse packages.
- It features an extensions API that allows users to easily wrap existing Spark packages written in Scala. This has enabled the development of sparkxgb (interface for xgboost4j), graphframes (interface for GraphFrames), mleap (interface for MLeap), and sparktf (interface for Spark TensorFlow connector), to name a few.

= Rationale =

By becoming an Apache project, sparklyr can better align with the Apache Spark project, and encourage stronger collaboration among users and contributors in the R and Apache communities. Culturally, sparklyr is also a good fit for ASF: the development of the project has adhered to the Apache way since inception, and the current contributors are committed to upholding those values.

= Initial Goals =

The initial goals will be to move the existing codebase to Apache and the documentation from the RStudio domain to Apache.

= Current Status =

== Meritocracy ==

The sparklyr project has operated on meritocratic principles since inception. We have accepted major patches from developers outside RStudio, and have operated with the implicit expectation that contributors to major features maintain those features.

== Community ==

The sparklyr project currently has 699 stars on GitHub, 52 direct contributors, ~1,400 issues (approximately 500 of those are open), and approximately 194,000 downloads from CRAN each month. The documentation website spark.rstudio.com achieves ~15k visitors per month. There are also more than 15 open source extensions written that implement features such as genomic analysis and interoperability with databases.

= Known Risks =

== Reliance on Salaried Developers ==

sparklyr is currently maintained by salaried developers at RStudio and receives some ongoing contributions from the community, although all committers are employed by RStudio. We hope that by becoming an Apache project, the project will garner additional developer interest and expand the diversity of committers.

= Documentation =

Documentation of the project can be found at https://spark.rstudio.com/ and https://cran.r-project.org/web/packages/sparklyr/sparklyr.pdf. There is also a free online book, available at https://therinspark.com/, that can be used as a reference.

= Initial Source =

The sparklyr codebase is currently hosted on GitHub: https://github.com/rstudio/sparklyr. sparklyr has been Apache 2.0 licensed since inception. RStudio currently maintains CLAs from all significant contributors. RStudio does not own the copyright of sparklyr and it is not a trademark.

= External Dependencies =

We remark that `sparklyr` imports some R packages that are not Apache-compatible licensed; however, these packages are not distributed with the project. Note, for example, R itself is GPLv2 licensed.

= Required Resources =

- Mailing lists: {users, dev, commits}@sparklyr.incubator.apache.org
- GitHub repo
- If possible, we would like to continue using GitHub for issue tracking, as it is much more familiar to the R community than JIRA.

= Project Name =

There is sufficient goodwill built around the package so we would like to keep the name. sparklyr is pronounced spark-lee-R, i.e. does not rhyme with the data manipulation package dplyr, and is never capitalized. Incorrect spellings include SparklyR and sparklyR.

= Initial Committers =

Javier Luraschi (RStudio)
Kevin Kuo (RStudio)
Hossein Falaki (Databricks)

= Sponsors =

== Champion ==

Xiangrui Meng

== Nominated Mentors ==

Xiangrui Meng
Felix Cheung
Sean R. Owen