Yes. Spark 3.0 RC2 works well.

I think the current behavior in Spark 2.4 hurts adoption, especially for
new users who want to try Spark in their local environment.

It affects all our built-in clients, like the Scala shell and PySpark.
Should we consider backporting the fix to 2.4?

Although this fixes the bug, it also introduces a behavior change. We
should document it publicly and mention it in the release notes. Let us
review it more carefully to understand the risk and impact.
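
For anyone who wants to sanity-check this locally, here is a minimal sketch.
The paths are placeholders, and the extra Hive-side conf is my assumption
about a possible workaround, not a confirmed fix:

```shell
# Sketch: launch PySpark with an explicit local warehouse dir. Passing the
# Hive-side conf as well is an assumption -- a possible workaround in case the
# metastore ignores spark.sql.warehouse.dir (the symptom tracked in SPARK-31170).
./bin/pyspark \
  --conf spark.sql.warehouse.dir=/tmp/spark-warehouse \
  --conf spark.hadoop.hive.metastore.warehouse.dir=/tmp/spark-warehouse
# In the shell, run: spark.sql("create table t1 (col1 int)")
# and check whether /tmp/spark-warehouse/t1 is created rather than
# file:/user/hive/warehouse/t1.
```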

Thanks,

Xiao

Nicholas Chammas <nicholas.cham...@gmail.com> wrote on Wed, Jun 3, 2020, 10:12 AM:

> I believe that was fixed in 3.0 and there was a decision not to backport
> the fix: SPARK-31170 <https://issues.apache.org/jira/browse/SPARK-31170>
>
> On Wed, Jun 3, 2020 at 1:04 PM Xiao Li <gatorsm...@gmail.com> wrote:
>
>> Just downloaded it on my local MacBook and tried to create a table using
>> the pre-built PySpark. It seems the conf "spark.sql.warehouse.dir" does
>> not take effect: it tries to create a directory at
>> "file:/user/hive/warehouse/t1". I have not done any investigation yet. Has
>> anyone hit the same issue?
>>
>> C02XT0U7JGH5:bin lixiao$ ./pyspark --conf spark.sql.warehouse.dir="/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6"
>> Python 2.7.16 (default, Jan 27 2020, 04:46:15)
>> [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> 20/06/03 09:56:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
>>       /_/
>>
>> Using Python version 2.7.16 (default, Jan 27 2020 04:46:15)
>> SparkSession available as 'spark'.
>> >>> spark.sql("set spark.sql.warehouse.dir").show(truncate=False)
>> +-----------------------+-------------------------------------------------+
>> |key                    |value                                            |
>> +-----------------------+-------------------------------------------------+
>> |spark.sql.warehouse.dir|/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6|
>> +-----------------------+-------------------------------------------------+
>> >>> spark.sql("create table t1 (col1 int)")
>> 20/06/03 09:56:29 WARN HiveMetaStore: Location: file:/user/hive/warehouse/t1 specified for non-external table:t1
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
>>     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>>   File "/Users/lixiao/Downloads/spark-2.4.6-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
>>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>> pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/t1 is not a directory or unable to create one);'
>>
>> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Wed, Jun 3, 2020, 9:18 AM:
>>
>>> +1
>>>
>>> Bests,
>>> Dongjoon
>>>
>>> On Wed, Jun 3, 2020 at 5:59 AM Tom Graves <tgraves...@yahoo.com.invalid>
>>> wrote:
>>>
>>>>  +1
>>>>
>>>> Tom
>>>>
>>>> On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau <
>>>> hol...@pigscanfly.ca> wrote:
>>>>
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.4.6.
>>>>
>>>> The vote is open until June 5th at 9 AM PST and passes if a majority of
>>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.4.6
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> There are currently no issues targeting 2.4.6 (try project = SPARK AND
>>>> "Target Version/s" = "2.4.6" AND status in (Open, Reopened, "In Progress"))
>>>>
>>>> The tag to be voted on is v2.4.6-rc8 (commit
>>>> 807e0a484d1de767d1f02bd8a622da6450bdf940):
>>>> https://github.com/apache/spark/tree/v2.4.6-rc8
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1349/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-docs/
>>>>
>>>> The list of bug fixes going into 2.4.6 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12346781
>>>>
>>>> This release is using the release script of the tag v2.4.6-rc8.
>>>>
>>>> FAQ
>>>>
>>>> =========================
>>>> What happened to the other RCs?
>>>> =========================
>>>>
>>>> The parallel Maven build caused some flakiness, so I wasn't comfortable
>>>> releasing the earlier RCs. I backported the fix from the 3.0 branch for
>>>> this release. I've got a proposed change to the build script so that,
>>>> going forward, we only push tags once the build succeeds, but it does
>>>> not block this release.
>>>>
>>>> =========================
>>>> How can I help test this release?
>>>> =========================
>>>>
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload, running it on this release candidate, and
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark, you can set up a virtual env, install
>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>> you can add the staging repository to your project's resolvers and test
>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>> you don't end up building with an out-of-date RC going forward).
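
The PySpark path described above can be sketched as follows. The exact
tarball name under the RC bin directory is my assumption; adjust it to
whatever is actually listed there:

```shell
# Sketch: install the 2.4.6 RC8 PySpark tarball into a fresh virtual env and
# smoke-test the import. The artifact filename is assumed, not confirmed --
# check the v2.4.6-rc8-bin directory listing for the real name.
python -m venv rc-test
. rc-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v2.4.6-rc8-bin/pyspark-2.4.6.tar.gz"
python -c "import pyspark; print(pyspark.__version__)"
```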
>>>>
>>>> ===========================================
>>>> What should happen to JIRA tickets still targeting 2.4.6?
>>>> ===========================================
>>>>
>>>> The current list of open tickets targeted at 2.4.6 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 2.4.6
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==================
>>>> But my bug isn't fixed?
>>>> ==================
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.
>>>>
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>
