Thanks for the specific mention of the new PySpark packaging, Shivaram.

If you're a *nix (Linux, Unix, OS X, etc.) Python user interested in helping
test the new artifacts, you can do the following:

Set up PySpark with pip as follows (the full sequence is sketched after the
list):

1. Download the artifact from
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/pyspark-2.1.0+hadoop2.7.tar.gz
2. (Optional) Create a virtual env (e.g. virtualenv /tmp/pysparktest;
source /tmp/pysparktest/bin/activate)
3. (Possibly required, depending on your pip version) Upgrade pip to a
recent version (e.g. pip install --upgrade pip)
4. Install the package with pip install pyspark-2.1.0+hadoop2.7.tar.gz
5. If you have SPARK_HOME set to any specific path, unset it to force the
pip-installed PySpark to run with its provided jars
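
Put together, a complete setup session might look like this (just a sketch
based on the steps above; swap wget for your preferred download tool and
adjust the virtualenv path to taste):

  # 1. download the RC artifact
  wget http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/pyspark-2.1.0+hadoop2.7.tar.gz
  # 2. optional: keep the test install isolated in a virtual env
  virtualenv /tmp/pysparktest
  source /tmp/pysparktest/bin/activate
  # 3. make sure pip is recent enough to handle the versioned package name
  pip install --upgrade pip
  # 4. install the RC package itself
  pip install pyspark-2.1.0+hadoop2.7.tar.gz
  # 5. make sure a pre-existing SPARK_HOME doesn't shadow the bundled jars
  unset SPARK_HOME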

In the future we hope to publish to PyPI, allowing you to skip the download
step, but there just wasn't a chance to get that part included for this
release. If everything goes smoothly hopefully we can add that soon (see
SPARK-18128 <https://issues.apache.org/jira/browse/SPARK-18128>) :)

Some things to verify (a sample session covering these follows the list):
1) Verify you can start the PySpark shell (e.g. run pyspark)
2) Verify you can start PySpark from python (e.g. run python, verify you
can import pyspark and construct a SparkContext)
3) Verify your PySpark programs work with pip-installed PySpark as well as
with a regular Spark download (e.g. spark-submit my-workload.py)
4) Have a different version of Spark downloaded locally as well? Verify
that it launches and runs correctly and that the pip-installed PySpark is
not taking precedence (make sure to use the fully qualified path when
executing).
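
A minimal smoke-test session for the four checks above might look like this
(a sketch; my-workload.py stands in for any Spark script of yours, and
/path/to/other/spark is a placeholder for wherever your separately
downloaded Spark lives):

  # 1) the shell should come up and report version 2.1.0
  pyspark
  # 2) plain python should be able to import pyspark and build a context
  #    (prints 45, the sum of 0..9)
  python -c 'import pyspark; sc = pyspark.SparkContext(appName="pip-test"); print(sc.parallelize(range(10)).sum()); sc.stop()'
  # 3) submit an existing workload via the pip-installed spark-submit
  spark-submit my-workload.py
  # 4) a separately downloaded Spark, invoked by its full path, should
  #    still run with its own jars rather than the pip-installed ones
  /path/to/other/spark/bin/spark-submit my-workload.py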

Some things that are explicitly not supported in pip-installed PySpark:
1) Starting a new standalone cluster with pip-installed PySpark (connecting
to an existing standalone cluster is expected to work)
2) non-Python Spark interfaces (e.g. don't pip install pyspark for SparkR;
use the SparkR packaging instead :))
3) PyPI - if things go well, this is coming in a future release (track the
progress on https://issues.apache.org/jira/browse/SPARK-18128)
4) Python versions prior to 2.7
5) Full Windows support - that's a later follow-up task (if you're
interested in this please chat with me or see
https://issues.apache.org/jira/browse/SPARK-18136)

Post-verification cleanup (see the commands after this list):
1. Uninstall the pip-installed PySpark, since it is just an RC and you
don't want it getting in the way later (e.g. pip uninstall pyspark)
2. (Optional) Deactivate your virtual env (e.g. deactivate)
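
Or, as a pair of commands (assuming the virtual env from the setup steps):

  # remove the RC package so it doesn't shadow a real release later
  pip uninstall pyspark
  # optional: step out of the test virtual env
  deactivate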

If anyone has any questions about the new PySpark packaging, I'm more than
happy to chat :)

Cheers,

Holden :)



On Thu, Dec 15, 2016 at 9:44 PM, Reynold Xin <r...@databricks.com> wrote:

> I'm going to start this with a +1!
>
>
> On Thu, Dec 15, 2016 at 9:42 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> In addition to usual binary artifacts, this is the first release where
>> we have installable packages for Python [1] and R [2] that are part of
>> the release.  I'm including instructions to test the R package below.
>> Holden / other Python developers can chime in if there are special
>> instructions to test the pip package.
>>
>> To test the R source package you can follow the following commands.
>> 1. Download the SparkR source package from
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/SparkR_2.1.0.tar.gz
>> 2. Install the source package with R CMD INSTALL SparkR_2.1.0.tar.gz
>> 3. As the SparkR package doesn't contain Spark JARs (this is due to
>> package size limits from CRAN), we'll need to run [3]
>> export SPARKR_RELEASE_DOWNLOAD_URL="http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/spark-2.1.0-bin-hadoop2.6.tgz"
>> 4. Launch R. You can now load SparkR with `library(SparkR)` and
>> test it with your applications.
>> 5. Note that the first time a SparkSession is created, the binary
>> artifacts will be downloaded.
>>
>> Thanks
>> Shivaram
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-18267
>> [2] https://issues.apache.org/jira/browse/SPARK-18590
>> [3] Note that this isn't required once 2.1.0 has been released as
>> SparkR can automatically resolve and download releases.
>>
>> On Thu, Dec 15, 2016 at 9:16 PM, Reynold Xin <r...@databricks.com> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and
>> passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.1.0
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.1.0-rc5
>> > (cd0a08361e2526519e7c131c42116bf56fa62c76)
>> >
>> > List of JIRA tickets resolved are:
>> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1223/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>> >
>> >
>> > FAQ
>> >
>> > How can I help test this release?
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > What should happen to JIRA tickets still targeting 2.1.0?
>> >
>> > Committers should look at those and triage. Extremely important bug
>> fixes,
>> > documentation, and API tweaks that impact compatibility should be
>> worked on
>> > immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>> >
>> > What happened to RC3/RC4?
>> >
>> > They had issues with the release packaging and as a result were skipped.
>> >
>>
>
>


-- 
Twitter: https://twitter.com/holdenkarau
