Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Wenchen Fan
I'm also happy to see we have R support on k8s for Spark 2.4. I'll do the manual testing for it if we don't want to upgrade the OS now. If the Python support is also merged in this way, I think we can merge the R support PR too? On Thu, Aug 16, 2018 at 7:23 AM shane knapp wrote: > >> What is

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread shane knapp
> > > What is the current purpose of these builds? > > to be honest, i have absolutely no idea. :) these were set up a long time ago, in a galaxy far far away, by someone who is not me. > - spark-docs seems to be building the docs, is that the only place > where the docs build is tested? > > i

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Marcelo Vanzin
On Wed, Aug 15, 2018 at 1:35 PM, shane knapp wrote: > in fact, i don't see us getting rid of all of the centos machines until EOY > (see my above comment, re docs, release etc). these are the builds that > will remain on centos for the near future: >

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-08-15 Thread Ryan Blue
I think I found a good solution to the problem of using Expression in the TableCatalog API and in the DeleteSupport API. For DeleteSupport, there is already a stable and public subset of Expression named Filter that can be used to pass filters. The reason why DeleteSupport would use Expression is
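
A minimal sketch of that idea, assuming a hypothetical DeleteSupport mix-in (the trait and method names below are illustrative; only org.apache.spark.sql.sources.Filter and its subclasses such as EqualTo are the existing stable API):

    import org.apache.spark.sql.sources.{EqualTo, Filter}

    // Hypothetical sketch only -- not the actual proposed interface.
    trait DeleteSupport {
      // Delete every row matching all of the given filters (ANDed together).
      def deleteWhere(filters: Array[Filter]): Unit
    }

    // A caller could then express DELETE WHERE date = '2018-08-15' as:
    //   table.deleteWhere(Array(EqualTo("date", "2018-08-15")))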

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread shane knapp
On Wed, Aug 15, 2018 at 12:45 PM, Reynold Xin wrote: > What's the reason we don't want to do the OS updates right now? Is it due > to the unpredictability of potential issues that might happen and end up > delaying 2.4 release? > > that is exactly it... i haven't had a chance to test everything

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Ilan Filonenko
Correct, the OS change and updates would require more testing, from what Shane has told me, and could potentially surface issues that could delay a major release. So yes, the release manager would need to run the tests manually, and after the release we would switch to a fully integrated

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Reynold Xin
Personally I'd love for R support to be in 2.4, but I don't consider something "Done" unless tests are running ... Is the proposal: the release manager manually run the R tests when preparing the release, and switch over to fully integrated Jenkins after 2.4.0 is released? On Wed, Aug 15, 2018 at

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Reynold Xin
What's the reason we don't want to do the OS updates right now? Is it due to the unpredictability of potential issues that might happen and end up delaying 2.4 release? On Wed, Aug 15, 2018 at 2:33 PM Erik Erlandson wrote: > The SparkR support PR is finished, along with integration testing,

Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Ilan Filonenko
The SparkR support PR includes integration testing that can be run on a local Minikube instance by merely building the distribution with the appropriate flags (--r) and running the integration tests similarly to how you would for any k8s test. Maybe some others could locally test this, if there is any

[DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Erik Erlandson
The SparkR support PR is finished, along with integration testing; however, Shane has requested that the integration testing not be enabled until after the 2.4 release, because it requires the OS updates he wants to test *after* the release. The integration testing can be run locally, and so the

Re: Naming policy for packages

2018-08-15 Thread 880f0464
IANAL, however in case of any legal action the ASF might have a pretty weak case for at least two reasons: - Spark is a common word, and its usage in the names of software projects (in different forms) is widespread and predates the release of Apache Spark. For example, outside the data processing community

Re: Naming policy for packages

2018-08-15 Thread Mark Hamstra
While it is permissible to have a maven identity like "spark-foo" from "org.bar", I'll agree with Sean that avoiding that kind of name is often wiser. It is just too easy to slip into prohibited usage if the most popular, de facto identification turns out to become "spark-foo" instead of something
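
To make the permitted pattern concrete, a minimal sbt sketch; "org.bar" and "spark-foo" are the hypothetical coordinates from the message above, not a real project:

    // build.sbt -- hypothetical coordinates, following the pattern above
    organization := "org.bar"  // the groupId identifies the third party
    name := "spark-foo"        // only the artifactId mentions "spark"
    // The human-readable project title should still read
    // "Foo for Apache Spark" rather than "Spark Foo".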

Proposing an 18-month maintenance period for feature branches

2018-08-15 Thread Sean Owen
This was mentioned in another thread, but I wanted to highlight this proposed change to our release policy: https://github.com/apache/spark-website/pull/136 Right now we don't have any guidance about how long feature branches get maintenance releases. I'm proposing 18 months as a guide -- not a

Re: Naming policy for packages

2018-08-15 Thread Reynold Xin
craps? :( On Wed, Aug 15, 2018 at 11:47 AM Koert Kuipers wrote: > ok, it doesn't sound so bad if the maven identifier can have spark in it. no > big deal! > > otherwise i was going to suggest "kraps". like kraps-xml > > scala> "spark".reverse > res0: String = kraps > > > On Wed, Aug 15, 2018 at

Re: Naming policy for packages

2018-08-15 Thread Koert Kuipers
ok, it doesn't sound so bad if the maven identifier can have spark in it. no big deal! otherwise i was going to suggest "kraps". like kraps-xml scala> "spark".reverse res0: String = kraps On Wed, Aug 15, 2018 at 2:43 PM, Sean Owen wrote: > I'd refer you again to the trademark policy. In the

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
I'd refer you again to the trademark policy. In the first link I see projects whose software ID is like "spark-foo" but title/subtitle is like "Foo for Apache Spark". This is OK. 'sparklyr' is in a gray area we've talked about before; see https://www.apache.org/foundation/marks/ as well. I think

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-08-15 Thread Ryan Blue
I agree that it would be great to have a stable public expression API that corresponds to what is parsed, not the implementations. But I worry that it will get out of date, and a data source that needs to support a new expression has to wait up to 6 months for a public release
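
For illustration only, a stable "parsed expression" API of that kind might be a small algebraic data type mirroring parser output rather than Catalyst internals; every name in this sketch is hypothetical, not the design proposed in the SPIP:

    // Hypothetical sketch of a parser-shaped, implementation-free expression ADT.
    sealed trait ParsedExpression
    case class ColumnReference(nameParts: Seq[String]) extends ParsedExpression
    case class LiteralValue(value: Any, sqlTypeName: String) extends ParsedExpression
    case class FunctionCall(name: String, args: Seq[ParsedExpression]) extends ParsedExpression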

Re: Naming policy for packages

2018-08-15 Thread 0xF0F0F0
Does it mean that the majority of Spark-related projects, including top Databricks (https://github.com/databricks?utf8=%E2%9C%93=spark==) or RStudio (sparklyr) contributions, violate the trademark?

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-08-15 Thread Reynold Xin
Sorry I completely disagree with using Expression in critical public APIs that we expect a lot of developers to use. There's a huge difference between exposing InternalRow vs Expression. InternalRow is a relatively small surface (still quite large) that I can see ourselves within a version getting
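
As a rough illustration of the surface-area difference, a simplified sketch with stand-in traits (these are not the real Catalyst definitions, which live in org.apache.spark.sql.catalyst and are far larger):

    // InternalRow is essentially a typed data accessor:
    trait RowLike {
      def numFields: Int
      def isNullAt(ordinal: Int): Boolean
      def getInt(ordinal: Int): Int
      // ... one getter per supported data type
    }
    // Expression is a whole tree node carrying typing, evaluation and
    // codegen, with hundreds of concrete subclasses that evolve per release:
    trait ExpressionLike {
      def children: Seq[ExpressionLike]
      def dataType: String  // stands in for Catalyst's DataType
      def nullable: Boolean
      def eval(input: RowLike): Any
      // ... plus codegen hooks and the full tree-transformation API
    }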

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
You might be interested in the full policy: https://spark.apache.org/trademarks.html What it is trying to prevent is confusion. Is spark-xml from the Spark project? Sounds like it, but who knows? What if a vendor releases ASFSpark 3.0? Are people going to think this is an official real project

Re: Naming policy for packages

2018-08-15 Thread Koert Kuipers
mhhh, that's somewhat unfortunate? it's helpful to me that something is called, say, spark-xml; it tells me it's xml for spark! any other name would probably be less informative. or is this still allowed? On Wed, Aug 15, 2018 at 11:35 AM, Reynold Xin wrote: > Unfortunately that’s an Apache

Re: Naming policy for packages

2018-08-15 Thread Reynold Xin
Unfortunately that’s an Apache foundation policy and the Spark community has no power to change it. My understanding: the reason Spark can’t be in the name is that, if it is used frequently enough, the foundation would lose the Spark trademark. Cheers. On Wed, Aug 15, 2018 at 7:19 AM Simon

Re: Naming policy for packages

2018-08-15 Thread Simon Dirmeier
Hey, thanks for clearing that up. Imho this is somewhat unfortunate, because package names that contain "spark" somewhat promote and advertise Apache Spark, right? Best, Simon On 15.08.18 at 14:00, Sean Owen wrote: You raise a great point, and we were just discussing this. The page is

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
You raise a great point, and we were just discussing this. The page is old and contains many projects that were listed before the trademarks were being enforced. Some have renamed themselves. We will update the page and remove stale or noncompliant projects, and ask those that need to change to do

Naming policy for packages

2018-08-15 Thread Simon Dirmeier
Dear all, I am currently developing two OSS extension packages for Spark; one related to machine learning, one related to biological applications. According to the trademark guidelines (https://spark.apache.org/trademarks.html) I am not allowed to use Names derived from “Spark”, such as