[VOTE] SPARK 2.3.2 (RC1)
Please vote on releasing the following candidate as Apache Spark version 2.3.2.

The vote is open until July 11th PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.2-rc1 (commit 4df06b45160241dbb331153efbb25703f913c192):
https://github.com/apache/spark/tree/v2.3.2-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1277/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/

The list of bug fixes going into 2.3.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12343289

PS. This is my first time doing a release, so please help check that everything has landed correctly. Thanks ^-^

FAQ

== How can I help test this release? ==

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).

== What should happen to JIRA tickets still targeting 2.3.2? ==

The current list of open tickets targeted at 2.3.2 can be found by going to https://issues.apache.org/jira/projects/SPARK and searching for "Target Version/s" = 2.3.2

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Please retarget everything else to an appropriate release.

== But my bug isn't fixed? ==

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
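For anyone checking the release files and digests mentioned in the vote email, the comparison step can be sketched in a few lines of Python. This is only a sketch under assumptions: it assumes the published digest is a SHA-512 hex string (possibly split by whitespace or upper-cased), and the helper names `file_digest_hex` and `digest_matches` are made up for illustration. It does not replace verifying the GPG signatures against the KEYS file.

```python
import hashlib


def file_digest_hex(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large release tarballs need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def digest_matches(path, expected_hex, algorithm="sha512"):
    """Compare a file's digest with a published hex digest.

    Whitespace and letter case in the published value are ignored, because
    digest files are sometimes wrapped across lines or written in upper case
    (an assumption about the file layout, not something the thread states).
    """
    normalized = "".join(expected_hex.split()).lower()
    return file_digest_hex(path, algorithm) == normalized
```

A caller would download one artifact and its digest file from the `v2.3.2-rc1-bin/` directory, extract the hex digits from the digest file, and pass both to `digest_matches`.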
[SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow
We've created a snippet that creates a Spark DF from an RDD of many pandas DFs in a distributed manner, so the driver does not need to collect the entire dataset.

Early tests show a 6x-10x performance improvement over the pandasDF -> Rows -> sparkDF path.

I've seen that there are some open pull requests that change the way Arrow serialization works. Should I open a pull request to add this functionality to SparkSession (`createFromPandasDataframesRDD`)?

https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5

Thanks,
Linar
Re: [SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow
Hi Linar,

This seems useful, but perhaps reusing the same function name is better?

http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame

Currently createDataFrame takes an RDD of any kind of SQL data representation (e.g. row, tuple, int, boolean, etc.), a list, or a pandas.DataFrame.

Perhaps we can support taking an RDD of *pandas.DataFrame* as the "data" argument too?

What do other people think?

Li

On Sun, Jul 8, 2018 at 1:13 PM, Linar Savion wrote:
> We've created a snippet that creates a Spark DF from an RDD of many pandas
> DFs in a distributed manner, so the driver does not need to collect the
> entire dataset.
>
> Early tests show a 6x-10x performance improvement over the
> pandasDF -> Rows -> sparkDF path.
>
> I've seen that there are some open pull requests that change the way Arrow
> serialization works. Should I open a pull request to add this functionality
> to SparkSession (`createFromPandasDataframesRDD`)?
>
> https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5
>
> Thanks,
> Linar
[DESIGN] Barrier Execution Mode
Hi All,

I would like to invite you to review the design document for Barrier Execution Mode:

https://docs.google.com/document/d/1GvcYR6ZFto3dOnjfLjZMtTezX0W5VYN9w1l4-tQXaZk/edit#

TL;DR: We announced Project Hydrogen at the recent Spark+AI Summit. A major part of the project involves significant changes to the execution mode of Spark. This design doc proposes new APIs as well as a new execution mode (known as barrier execution mode) to provide high-performance support for DL workloads.

Major changes include:

- Add RDDBarrier to support gang scheduling.
- Add BarrierTaskContext to support global sync of all tasks in a stage.
- A better fault-tolerance approach for barrier stages: if some tasks fail midway, retry all tasks in the same stage.
- Integrate barrier execution mode with the Standalone cluster manager.

Please feel free to review and discuss the design proposal.

Thanks,
Xingbo
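The three ideas in the list of major changes (all tasks launched together, a global sync point inside the stage, and retrying every task when any one fails) can be illustrated with plain Python threads. This is an analogy only, not the proposed Spark API: `run_barrier_stage` and `example_task` are hypothetical names, and `threading.Barrier` merely stands in for the kind of synchronization the design doc describes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_barrier_stage(task_fn, num_tasks, max_attempts=3):
    """Run num_tasks copies of task_fn concurrently; if any fails, rerun all."""
    for attempt in range(1, max_attempts + 1):
        # A fresh barrier per attempt: every task must reach it to proceed.
        barrier = threading.Barrier(num_tasks)

        def guarded(task_id):
            try:
                return task_fn(task_id, attempt, barrier)
            except Exception:
                # Break the barrier so peers waiting on it fail fast
                # instead of blocking forever on a task that died.
                barrier.abort()
                raise

        # One worker per task, so all tasks of the stage run at the same time.
        with ThreadPoolExecutor(max_workers=num_tasks) as pool:
            futures = [pool.submit(guarded, i) for i in range(num_tasks)]
        # Leaving the with-block waits for every task to finish.
        if all(f.exception() is None for f in futures):
            return [f.result() for f in futures], attempt
    raise RuntimeError("stage failed after %d attempts" % max_attempts)


def example_task(task_id, attempt, barrier):
    """Toy task: task 1 fails on the first attempt to simulate a fault."""
    if attempt == 1 and task_id == 1:
        raise ValueError("simulated task failure")
    barrier.wait(timeout=5)  # global sync point shared by all tasks
    return task_id * 10
```

Calling `run_barrier_stage(example_task, 3)` fails the whole first attempt because of task 1, then reruns all three tasks together on the second attempt.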
Re: [VOTE] SPARK 2.3.2 (RC1)
Just checking: is the doc issue in https://issues.apache.org/jira/browse/SPARK-24530 worked around in this release?

This was pointed out as an example of a broken doc:
https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression

Here it is in 2.3.2 RC1:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression

It wasn't immediately obvious to me whether this addressed the issue that was identified or not.

Otherwise nothing is open for 2.3.2, sigs and license look good, tests pass as last time, etc.

+1

On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.3.2.
>
> The vote is open until July 11th PST and passes if a majority of +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
Re: [DESIGN] Barrier Execution Mode
Xingbo,

Please reference the SPIP and JIRA ticket next time: [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark

On Sun, Jul 8, 2018 at 9:45 AM Xingbo Jiang wrote:
> Hi All,
>
> I would like to invite you to review the design document for Barrier
> Execution Mode:
>
> https://docs.google.com/document/d/1GvcYR6ZFto3dOnjfLjZMtTezX0W5VYN9w1l4-tQXaZk/edit#
>
> TL;DR: We announced Project Hydrogen at the recent Spark+AI Summit. A major
> part of the project involves significant changes to the execution mode of
> Spark. This design doc proposes new APIs as well as a new execution mode
> (known as barrier execution mode) to provide high-performance support for
> DL workloads.
>
> Major changes include:
>
> - Add RDDBarrier to support gang scheduling.
> - Add BarrierTaskContext to support global sync of all tasks in a stage.
> - A better fault-tolerance approach for barrier stages: if some tasks fail
>   midway, retry all tasks in the same stage.
> - Integrate barrier execution mode with the Standalone cluster manager.
>
> Please feel free to review and discuss the design proposal.
>
> Thanks,
> Xingbo
Re: [SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow
Yes, I would just reuse the same function.

On Sun, Jul 8, 2018 at 5:01 AM Li Jin wrote:
> Hi Linar,
>
> This seems useful, but perhaps reusing the same function name is better?
>
> http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame
>
> Currently createDataFrame takes an RDD of any kind of SQL data
> representation (e.g. row, tuple, int, boolean, etc.), a list, or a
> pandas.DataFrame.
>
> Perhaps we can support taking an RDD of *pandas.DataFrame* as the "data"
> argument too?
>
> What do other people think?
>
> Li
Re: [VOTE] SPARK 2.3.2 (RC1)
Three business days might be too short. Shall we keep the vote open until the end of this Friday (July 13th)?

Cheers,

Xiao

2018-07-08 10:15 GMT-07:00 Sean Owen:
> Just checking: is the doc issue in
> https://issues.apache.org/jira/browse/SPARK-24530 worked around in this
> release?
>
> This was pointed out as an example of a broken doc:
> https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>
> Here it is in 2.3.2 RC1:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>
> It wasn't immediately obvious to me whether this addressed the issue that
> was identified or not.
>
> Otherwise nothing is open for 2.3.2, sigs and license look good, tests
> pass as last time, etc.
>
> +1
Re: [VOTE] SPARK 2.3.2 (RC1)
Hi Sean,

SPARK-24530 is not included in this RC1 release. Actually, I'm not so familiar with this issue, so I'm still using Python 2 to generate the docs.

The JIRA mentions that Sphinx with Python 3 could work around this issue. @Hyukjin Kwon, would you please help to clarify?

Thanks
Saisai

On Mon, Jul 9, 2018 at 1:59 AM, Xiao Li wrote:
> Three business days might be too short. Shall we keep the vote open until
> the end of this Friday (July 13th)?
>
> Cheers,
>
> Xiao
Re: [VOTE] SPARK 2.3.2 (RC1)
It seems Sphinx under Python 2 was used (https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression), and the SPARK-24530 issue exists in the RC. From my few tries locally, it is kind of tricky to manually verify whether Python 3 is used.

I think the fix for SPARK-24530 is technically not merged yet; however, I don't think this blocks the release, as with the previous release. I think we could proceed in parallel. I will probably make progress on https://github.com/apache/spark/pull/21659, and fix the release doc too.

On Mon, Jul 9, 2018 at 8:25 AM, Saisai Shao wrote:
> Hi Sean,
>
> SPARK-24530 is not included in this RC1 release. Actually, I'm not so
> familiar with this issue, so I'm still using Python 2 to generate the docs.
>
> The JIRA mentions that Sphinx with Python 3 could work around this issue.
> @Hyukjin Kwon, would you please help to clarify?
>
> Thanks
> Saisai
Re: [VOTE] SPARK 2.3.2 (RC1)
Thanks @Hyukjin Kwon. Yes, I'm using Python 2 to build the docs; it looks like Sphinx with Python 2 has issues.

What is still pending for this PR (https://github.com/apache/spark/pull/21659)? I'm planning to cut RC2 once this is merged. Do you have an ETA for this PR?

On Mon, Jul 9, 2018 at 9:06 AM, Hyukjin Kwon wrote:
> It seems Sphinx under Python 2 was used
> (https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression),
> and the SPARK-24530 issue exists in the RC. From my few tries locally, it
> is kind of tricky to manually verify whether Python 3 is used.
>
> I think the fix for SPARK-24530 is technically not merged yet; however, I
> don't think this blocks the release, as with the previous release. I think
> we could proceed in parallel. I will probably make progress on
> https://github.com/apache/spark/pull/21659, and fix the release doc too.
Re: [VOTE] SPARK 2.3.2 (RC1)
The reason is that it is not 100% clear whether the root cause of the Sphinx bug is Python 2 and whether the workaround is to use Python 3. Xiangrui opened a bug against Sphinx: https://github.com/sphinx-doc/sphinx/issues/5142

Here is my observation:

- Sphinx seems to have a bug where the 'autodoc_docstring_signature' feature (which allows overriding the signature in the documentation manually) does not work in a few cases such as __init__, and fails to override the signature.
- In Python 2, functools's wraps does not copy the signature, so an __init__ wrapped by a wrapper (for example, 'keyword_only') appears to end up with the wrapper's signature (*args, **kwargs).
- In Python 3, functools's wraps copies the signature, so the documentation looks fine even though autodoc apparently did not work.

In short, I am waiting for responses at https://github.com/sphinx-doc/sphinx/issues/5142 to check and confirm that this is an issue in Sphinx and that the workaround is to use Python 3.

Given my observation, the workaround is to use Python 3. So, if the response from Sphinx is still pending, we could probably just merge it for now. Even if the bug is fixed in Sphinx, I think we will live with this bug for a long time anyway.

On Mon, Jul 9, 2018 at 9:28 AM, Saisai Shao wrote:
> Thanks @Hyukjin Kwon. Yes, I'm using Python 2 to build the docs; it looks
> like Sphinx with Python 2 has issues.
>
> What is still pending for this PR
> (https://github.com/apache/spark/pull/21659)? I'm planning to cut RC2 once
> this is merged. Do you have an ETA for this PR?
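The Python 3 half of the functools observation in this message can be reproduced with the standard library alone. The sketch below is an assumption-laden illustration: `keyword_only` here is a simplified stand-in rather than Spark's actual decorator, and `Estimator` is a made-up class. In Python 3, `functools.wraps` records the original function as `__wrapped__`, and `inspect.signature` follows that attribute by default, which is presumably why the generated docs show the real parameters. Python 2's `functools.wraps` does not set `__wrapped__`, so introspection there sees only the wrapper.

```python
import functools
import inspect


def keyword_only(func):
    """Simplified stand-in for a keyword-only decorator (not Spark's code)."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)
    return wrapper


class Estimator(object):
    """Hypothetical class whose __init__ is wrapped, as in the discussion."""

    @keyword_only
    def __init__(self, maxIter=100, regParam=0.0):
        self.maxIter = maxIter
        self.regParam = regParam


# Default behaviour in Python 3: follow __wrapped__ back to the original.
followed = str(inspect.signature(Estimator.__init__))

# Looking at the wrapper alone gives the generic signature, which is what
# a tool sees when nothing points back to the wrapped function.
raw = str(inspect.signature(Estimator.__init__, follow_wrapped=False))

print(followed)  # (self, maxIter=100, regParam=0.0)
print(raw)       # (self, *args, **kwargs)
```

The `follow_wrapped` argument requires Python 3.5 or later. Whether Sphinx autodoc takes exactly this path is not confirmed in the thread; the snippet only shows the language-level difference.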