Re: [VOTE] Apache Spark 2.1.1 (RC3)

Holden Karau Sun, 23 Apr 2017 23:45:06 -0700

What's the regression that this fixed in 2.1 from 2.0?

On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan <[email protected]> wrote:


> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will scan
> all table files only once, and write back the inferred schema to the
> metastore so that we don't need to do the schema inference again.
>
> So technically this will introduce a performance regression for the first
> query, but compared to branch-2.0, it's not a performance regression. And
> this patch fixed a regression in branch-2.1 for a case that works in branch-2.0.
> Personally, I think we should keep INFER_AND_SAVE as the default mode.
>
> + [Eric], what do you think?
>
> On Sat, Apr 22, 2017 at 1:37 AM, Michael Armbrust <[email protected]>
> wrote:
>
>> Thanks for pointing this out, Michael.  Based on the conversation on the
>> PR <https://github.com/apache/spark/pull/16944#issuecomment-285529275>
>> this seems like a risky change to include in a release branch with a
>> default other than NEVER_INFER.
>>
>> +Wenchen?  What do you think?
>>
>> On Thu, Apr 20, 2017 at 4:14 PM, Michael Allman <[email protected]>
>> wrote:
>>
>>> We've identified the cause of the change in behavior. It is related to
>>> the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key
>>> and its related functionality were absent from our previous build. The
>>> default setting in the current build was causing Spark to attempt to scan
>>> all table files during query analysis. Changing this setting to NEVER_INFER
>>> disabled this operation and resolved the issue we had.
>>>
>>> Michael
>>>
>>>
>>> On Apr 20, 2017, at 3:42 PM, Michael Allman <[email protected]>
>>> wrote:
>>>
>>> I want to caution that in testing a build from this morning's branch-2.1
>>> we found that Hive partition pruning was not working. We found that Spark
>>> SQL was fetching all Hive table partitions for a very simple query whereas
>>> in a build from several weeks ago it was fetching only the required
>>> partitions. I cannot currently think of a reason for the regression outside
>>> of some difference between branch-2.1 from our previous build and
>>> branch-2.1 from this morning.
>>>
>>> That's all I know right now. We are actively investigating to find the
>>> root cause of this problem, and specifically whether this is a problem in
>>> the Spark codebase or not. I will report back when I have an answer to that
>>> question.
>>>
>>> Michael
>>>
>>>
>>> On Apr 18, 2017, at 11:59 AM, Michael Armbrust <[email protected]>
>>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00
>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.1.1
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.1.1-rc3
>>> <https://github.com/apache/spark/tree/v2.1.1-rc3>
>>> (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>>>
>>> List of JIRA tickets resolved can be found with this filter
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>
>>> .
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1230/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running it on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.1.1?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.0.
>>>
>>> *What happened to RC1?*
>>>
>>> There were issues with the release packaging, and as a result it was skipped.
>>>
>>>
>>>
>>>
>>
>
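
For readers following the quoted discussion, here is a minimal toy model of the trade-off between the inference modes. It is not Spark's implementation: ToyMetastore, ToyTable and resolve_schema are hypothetical names invented for illustration. The mode names INFER_AND_SAVE and NEVER_INFER come from the thread, and INFER_ONLY is the third value the setting accepts. The lower-cased fallback under NEVER_INFER is a simplifying assumption about what the metastore returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Values accepted by spark.sql.hive.caseSensitiveInferenceMode.
MODES = ("INFER_AND_SAVE", "INFER_ONLY", "NEVER_INFER")


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the metastore: table name -> saved schema."""
    saved_schemas: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class ToyTable:
    """Hypothetical table whose data files carry case-sensitive column names."""
    name: str
    file_schema: List[str]
    scans: int = 0  # number of full file scans triggered by schema inference

    def infer_schema(self) -> List[str]:
        # Stands in for the expensive step: reading every table file.
        self.scans += 1
        return list(self.file_schema)


def resolve_schema(table: ToyTable, metastore: ToyMetastore, mode: str) -> List[str]:
    """Return the schema a query would see under the given mode (toy logic)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    # A previously saved case-sensitive schema is reused without any scan.
    saved = metastore.saved_schemas.get(table.name)
    if saved is not None:
        return saved

    if mode == "NEVER_INFER":
        # Assumption: fall back to lower-cased names, never touching the files.
        return [column.lower() for column in table.file_schema]

    inferred = table.infer_schema()
    if mode == "INFER_AND_SAVE":
        # Write the result back so later queries skip the scan.
        metastore.saved_schemas[table.name] = inferred
    return inferred
```

In this model, two queries against the same table under INFER_AND_SAVE cost one scan in total, INFER_ONLY costs one scan per query, and NEVER_INFER costs none. On a real deployment the workaround described in the thread would be applied by passing --conf spark.sql.hive.caseSensitiveInferenceMode=NEVER_INFER to spark-submit.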


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
