Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-11 Thread Hyukjin Kwon
I made a PR to officially drop support for R versions prior to 3.4 (
https://github.com/apache/spark/pull/23012).
The tests will probably fail for now, since the change produces warnings when
running under R 3.1.x.
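
For reference, a minimal sketch of what such a version guard could look like
(hypothetical names, not the actual diff in the PR):

    # Sketch of a minimum-R-version guard; minRVersion is an illustrative name.
    minRVersion <- "3.4.0"
    if (getRversion() < minRVersion) {
      warning(sprintf("Support for R prior to version %s is deprecated; you are running %s",
                      minRVersion, getRversion()))
    }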

On Sun, Nov 11, 2018 at 3:00 AM, Felix Cheung wrote:

> It’s a great point about the minimum R version. From what I see, mostly
> because of fixes and package support, most users of R are fairly up to date?
> So perhaps 3.4 as the minimum version is reasonable, especially for Spark 3.
>
> Are we getting traction with the CRAN sysadmin? It seems like this has been
> broken a few times.
>
>
> --
> *From:* Liang-Chi Hsieh 
> *Sent:* Saturday, November 10, 2018 2:32 AM
> *To:* dev@spark.apache.org
> *Subject:* Re: [discuss] SparkR CRAN feasibility check server problem
>
>
> Yeah, thanks Hyukjin Kwon for bringing this up for discussion.
>
> I don't know how widely the higher versions of R are used across the R
> community. If R 3.1.x is not very commonly used, I think we can discuss
> upgrading the minimum R version in the next Spark release.
>
> If we end up not upgrading, we can discuss with the CRAN sysadmin fixing it
> automatically on the service side so that malformed R package info is
> prevented and we don't need to fix it manually every time.
>
>
>
> Hyukjin Kwon wrote:
> >> Can upgrading R fix the issue? Is this perhaps not necessarily malformed,
> >> but some new format for newer versions?
> > That's my guess. I am not totally sure about it, though.
> >
> >> Anyway, we should consider upgrading the R version if that fixes the
> >> problem.
> > Yea, we should. If we do, it should be R 3.4 or higher. Maybe it's a good
> > time to start talking about the minimum R version. 3.1.x is too old; it
> > was released 4.5 years ago. R 3.4.0 was released 1.5 years ago.
> > Considering the timing for Spark 3.0 and deprecating lower versions,
> > bumping R up to 3.4 might be a reasonable option.
> >
> > Adding Shane as well.
> >
> > If we end up not upgrading it, I will forward this email to the CRAN
> > sysadmin to discuss further anyway.
> >
> >
> >
> > On Fri, Nov 2, 2018 at 12:51 PM, Felix Cheung <felixcheung@> wrote:
> >
> >> Thanks for bringing this up, and much appreciated that you keep on top of
> >> this at all times.
> >>
> >> Can upgrading R fix the issue? Is this perhaps not necessarily malformed,
> >> but some new format for newer versions? Anyway, we should consider
> >> upgrading the R version if that fixes the problem.
> >>
> >> As an option, we could also disable the repo check in Jenkins, but I can
> >> see that could also be problematic.
> >>
> >>
> >> On Thu, Nov 1, 2018 at 7:35 PM, Hyukjin Kwon <gurwls223@> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I want to raise the CRAN failure issue because it has started to block
> >>> Spark PRs from time to time. Since the number of PRs in the Spark
> >>> community is growing quickly, it is critical not to block other PRs.
> >>>
> >>> There has been a problem at CRAN (see
> >>> https://github.com/apache/spark/pull/20005 for analysis).
> >>> In short, the root cause is malformed package info served from
> >>> https://cran.r-project.org/src/contrib/PACKAGES
> >>> on the server side, and this had to be fixed by requesting the CRAN
> >>> sysadmin's help.
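> >>>
> >>> A quick way to check whether a given R installation can parse that feed,
> >>> as a sketch (assumes network access to CRAN):
> >>>
> >>>     # Fetch and parse the PACKAGES index in DCF form, roughly what
> >>>     # available.packages() does; an old R (e.g. 3.1.x) fails here when
> >>>     # the file is malformed.
> >>>     pkgs <- read.dcf(url("https://cran.r-project.org/src/contrib/PACKAGES"))
> >>>     nrow(pkgs)  # number of package entries parsed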
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-24152 <- newly opened; I am
> >>> pretty sure it's the same issue
> >>> https://issues.apache.org/jira/browse/SPARK-25923 <- reopened/resolved 2
> >>> times
> >>> https://issues.apache.org/jira/browse/SPARK-22812
> >>>
> >>> This has happened 5 times in roughly 10 months, blocking almost all PRs
> >>> in Apache Spark. Historically, it once blocked all PRs for a few days,
> >>> and the whole Spark community had to stop working.
> >>>
> >>> I assume this has not been a big issue so far for other projects or
> >>> other people because, apparently, higher versions of R have some logic
> >>> to handle these malformed documents (at least I verified that R 3.4.0
> >>> works fine).
> >>>
> >>> On our side, Jenkins has a low R version (R 3.1.1, if that hasn't been
> >>> updated from what I saw before), which is unable to parse the server's
> >>> malformed response.
> >>>
> >>> So, I want to talk about how we are going to handle this. Possible
> >>> solutions are:
> >>>
> >>> 1. We start a talk with the CRAN sysadmin to permanently prevent this
> >>> issue
> >>> 2. We upgrade R to 3.4.0 in Jenkins (however, we will not be able to
> >>> test low R versions)
> >>> 3. ...
> >>>
> >>> If we are fine with it, I would like to suggest forwarding this email to
> >>> the CRAN sysadmin to discuss this further.
> >>>
> >>> Adding Liang-Chi, Felix, and Shivaram, with whom I have already talked
> >>> about this a few times before.
> >>>
> >>> Thanks all.


Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-11 Thread Felix Cheung
I opened a PR with the vignettes fix to skip evaluation.
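
For context, a minimal sketch of the idea, assuming a knitr-based vignette (the
actual PR may apply this conditionally rather than globally):

    # In the vignette's setup chunk: turn off evaluation of all subsequent
    # chunks, so the code blocks stay static and nothing runs at build time.
    knitr::opts_chunk$set(eval = FALSE)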



From: Shivaram Venkataraman 
Sent: Wednesday, November 7, 2018 7:26 AM
To: Felix Cheung
Cc: Sean Owen; Shivaram Venkataraman; Wenchen Fan; Matei Zaharia; dev
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Agree with the points Felix made.

One thing is that it looks like the only problem is the vignettes; the tests
are being skipped as designed. If you see
https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log
and
https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log,
the tests run in 1s.

On Tue, Nov 6, 2018 at 1:29 PM, Felix Cheung wrote:
>
> I’d rather not mess with 2.4.0 at this point. Being on CRAN is nice, but
> users can also install from an Apache mirror.
>
> Also, I had attempted and failed to get the vignettes not to build; it was
> non-trivial and I couldn't get it to work. But I have an idea.
>
> As for the tests, I don't know exactly why they are not skipped. I need to
> investigate, but worst case, test_package can run with 0 tests.
>
>
>
> 
> From: Sean Owen 
> Sent: Tuesday, November 6, 2018 10:51 AM
> To: Shivaram Venkataraman
> Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> I think the second option, to skip the tests, is best right now, if the
> alternative is to have no SparkR release at all!
> Can we monkey-patch the 2.4.0 release for SparkR in this way, bless it from
> the PMC, and release that? It's drastic, but so is not being able to
> release, I think.
> Right? Or is CRAN not actually an important distribution path for SparkR in
> particular?
>
> On Tue, Nov 6, 2018 at 12:49 PM, Shivaram Venkataraman wrote:
> >
> > Right - I think we should move on with 2.4.0.
> >
> > In terms of what can be done to avoid this error, there are two strategies:
> > - Felix had this other thread about JDK 11 that should at least let Spark
> > run on the CRAN instance. In general this strategy isn't foolproof,
> > because the JDK version and other dependencies on that machine keep
> > changing over time and we don't have much control over it.
> > - The other solution is to not run code to build the vignettes document,
> > and just have static code blocks there that have been pre-evaluated /
> > pre-populated. We can open a JIRA to discuss the pros/cons of this.
> >
> > Thanks
> > Shivaram
> >
> > On Tue, Nov 6, 2018 at 10:57 AM, Felix Cheung wrote:
> > >
> > > We have not been able to publish to CRAN for quite some time (since 2.3.0 
> > > was archived - the cause is Java 11)
> > >
> > > I think it’s ok to announce the release of 2.4.0
> > >
> > >
> > > 
> > > From: Wenchen Fan 
> > > Sent: Tuesday, November 6, 2018 8:51 AM
> > > To: Felix Cheung
> > > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> > > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> > >
> > > Do you mean we should have a 2.4.0 release without CRAN and then do a 
> > > 2.4.1 immediately?
> > >
> > > On Wed, Nov 7, 2018 at 12:34 AM, Felix Cheung wrote:
> > >>
> > >> Shivaram and I were discussing this.
> > >> Actually, we worked with them before. Another possible approach is to
> > >> remove the vignettes eval and all tests from the source package... in
> > >> the next release.
> > >>
> > >>
> > >> 
> > >> From: Matei Zaharia 
> > >> Sent: Tuesday, November 6, 2018 12:07 AM
> > >> To: Felix Cheung
> > >> Cc: Sean Owen; dev; Shivaram Venkataraman
> > >> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> > >>
> > >> Maybe it’s worth contacting the CRAN maintainers to ask for help?
> > >> Perhaps we aren't disabling it correctly, or perhaps they can ignore
> > >> this specific failure. +Shivaram, who might have some ideas.
> > >>
> > >> Matei
> > >>
> > >> > On Nov 5, 2018, at 9:09 PM, Felix Cheung wrote:
> > >> >
> > >> > I don't know what the cause is yet.
> > >> >
> > >> > The test should be skipped because of this check
> > >> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
> > >> >
> > >> > And this
> > >> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
> > >> >
> > >> > But it ran:
> > >> > callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
> > >> >  "fit", formula,
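> > >> >
> > >> > (For illustration, a minimal sketch of that kind of CRAN guard using
> > >> > testthat's built-in helper; this is not the exact code in
> > >> > test_basic.R:)
> > >> >
> > >> >     library(testthat)
> > >> >     test_that("basic SparkR functionality", {
> > >> >       # skip_on_cran() skips when NOT_CRAN is not set to "true",
> > >> >       # i.e. on CRAN check machines
> > >> >       skip_on_cran()
> > >> >       expect_true(TRUE)  # placeholder for the real assertions
> > >> >     })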
> > >> >
> > >> > The earlier release was archived because of Java 11+ too, so this
> > >> > unfortunately isn't new.
> > >> >
> > >> >
> > >> > From: Sean Owen 
> > >> > Sent: Monday, November 5, 2018 7:22 PM
> > >> > To: Felix Cheung
> > >> > Cc: dev
> > >> > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> > >> >
> > >> > What can we do to get the release through? Is there any way to

ASF board report for November

2018-11-11 Thread Matei Zaharia
It’s time to submit Spark's quarterly ASF board report on November 14th, so I 
wanted to run the text by everyone to make sure we’re not missing something. 
Let me know whether I missed anything:



Apache Spark is a fast and general engine for large-scale data processing. It 
offers high-level APIs in Java, Scala, Python and R as well as a rich set of 
libraries including stream processing, machine learning, and graph analytics. 

Project status:

- We released Apache Spark 2.4.0 on Nov 2nd, 2018 as our newest feature 
release. Spark 2.4’s features include a barrier execution mode for machine 
learning computations, higher-order functions in Spark SQL, pivot syntax in 
SQL, a built-in Apache Avro data source, Kubernetes improvements, and 
experimental support for Scala 2.12, as well as multiple smaller features and 
fixes. The release notes are available at 
http://spark.apache.org/releases/spark-release-2-4-0.html.

- We released Apache Spark 2.3.2 on Sept 24th, 2018 as a bug fix release for 
the 2.3 branch.

- Multiple dev discussions are under way about the next feature release, which 
is likely to be Spark 3.0, on our dev and user mailing lists. Some of the key 
questions are which JDK, Scala, Python, R, Hadoop and Hive versions to support, 
as well as whether to remove certain deprecated APIs. We encourage everyone in 
the community to give feedback on these discussions through the mailing lists 
and JIRA.

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

- Nov 2nd, 2018: Spark 2.4.0
- Sept 24th, 2018: Spark 2.3.2
- July 2nd, 2018: Spark 2.2.2

Committers and PMC:

- We added six new committers since the last report: Shane Knapp, Dongjoon 
Hyun, Kazuaki Ishizaki, Xingbo Jiang, Yinan Li, and Takeshi Yamamuro.
- The latest committer was added on September 18th, 2018 (Kazuaki Ishizaki).
- The latest PMC member was added on Jan 12th, 2018 (Xiao Li).

