GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21190
[SPARK-22938][SQL][followup] Assert that SQLConf.get is accessed only on
the driver
## What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/20136 . #20136
didn't really work because in the test, we are using local backend, which
shares the driver side `SparkEnv`, so `SparkEnv.get.executorId ==
SparkContext.DRIVER_IDENTIFIER` doesn't work.
This PR changes the check to `TaskContext.get != null`, and move the check
to `SQLConf.get`, and fix all the places that violate this check:
* `InMemoryTableScanExec#createAndDecompressColumn` is executed inside
`rdd.map`, we can't access `conf.offHeapColumnVectorEnabled` there.
* `DataType#sameType` may be executed in executor side, for things like
json schema inference, so we can't call `conf.caseSensitiveAnalysis` there.
This contributes to most of the code changes, as we need to add `caseSensitive`
parameter to a lot of methods.
* `ParquetFilters` is used in the file scan function, which is executed in
executor side, so we can't can't call `conf.parquetFilterPushDownDate` there.
* `WindowExec#createBoundOrdering` is called on executor side, so we can't
use `conf.sessionLocalTimezone` there.
## How was this patch tested?
existing test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark minor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21190.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21190
commit fc679098d917d226a834a8ab6d08c23dbe5bf7db
Author: Wenchen Fan
Date: 2018-04-29T01:15:14Z
SQLConf should not be accessed in executor
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org