[ https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340740#comment-14340740 ]

Patrick Wendell commented on SPARK-6048:
----------------------------------------

Okay, I just talked to [~vanzin] offline. Basically, the crux of this issue is 
the following precedence question: if we have both the deprecated and the new 
version of a configuration key (let's say c and c') present, how does Spark 
resolve them relative to its normal precedence order for configs?

There are two different possibilities:
1. The newer config always takes precedence over any instance of the older 
config.
2. The newer and older configs are treated as identical (basically, aliases), 
and only the normal precedence order applies.
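
To make the question concrete, here is a minimal illustration (the key names 
are made up, not real Spark configs):
{code}
import org.apache.spark.SparkConf

// Both spellings of the same logical config are present:
val conf = new SparkConf(loadDefaults = false)
conf.set("spark.old.key", "from-old")   // deprecated key c
conf.set("spark.new.key", "from-new")   // newer key c'

// Under (1): conf.get("spark.new.key") is always "from-new"; the old
// spelling can never shadow the new one.
// Under (2): both keys name one logical config, and whichever source
// wins under the normal precedence rules supplies the value.
{code}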

We never publicly documented either approach; however, Spark has always done 
(1) in the past because we've used simple fallbacks on the read side.
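
A read-side fallback that yields behavior (1) looks roughly like this (again 
with hypothetical key names):
{code}
import org.apache.spark.SparkConf

// The new key, wherever it was set, always wins; the deprecated key is
// consulted only when the new one is absent.
def readWithFallback(conf: SparkConf): String =
  conf.getOption("spark.new.key")
    .orElse(conf.getOption("spark.old.key"))
    .getOrElse("default-value")
{code}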

Personally, my feeling is that we should clearly document one approach, make 
it backwards compatible, and apply the same approach to all deprecated 
configs. In the short term, the only way I see that happening is to roll back 
the existing translation on "set" (which is in some sense lossy, making it 
hard to support (1)) and stick to translation on "get". In the longer term, 
maybe we can come up with alternative ways to support these semantics.
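
As a rough sketch of what translate-on-get could look like (the table and the 
method name here are hypothetical, not the actual SparkConf internals):
{code}
import org.apache.spark.SparkConf

// Deprecation table, keyed by the new name; contents made up.
val deprecatedAliasOf = Map("spark.new.key" -> "spark.old.key")

// set/remove would store and delete keys exactly as the user wrote
// them, so nothing is lost; only reads consult the table.
def translatedGet(conf: SparkConf, key: String): Option[String] =
  conf.getOption(key).orElse {
    deprecatedAliasOf.get(key).flatMap { oldKey =>
      val value = conf.getOption(oldKey)
      // Warn only when the deprecated value is actually used,
      // consistent with how Spark warns elsewhere.
      value.foreach(_ =>
        Console.err.println(s"$oldKey is deprecated; use $key instead."))
      value
    }
  }
{code}
This preserves semantics (1), since the new key always shadows the old one, 
and it defers the deprecation warning to the point of actual use.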

> SparkConf.translateConfKey should translate on get, not set
> -----------------------------------------------------------
>
>                 Key: SPARK-6048
>                 URL: https://issues.apache.org/jira/browse/SPARK-6048
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>
> There are several issues with translating on set.
> (1) The most serious one is that if the user has both the deprecated and the 
> latest version of the same config set, the value picked up by SparkConf is 
> arbitrary. Why? Because during initialization of the conf we call `conf.set` 
> on each property in `sys.props`, in an order arbitrarily defined by Java. As 
> a result, the value of the more recent config may be overridden by that of 
> the deprecated one. Instead, we should always use the value of the most 
> recent config.
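> For instance, with hypothetical key names, and translation happening on set:
> {code}
> // sys.props iteration order is unspecified, so during initialization
> // either of these orders can occur:
> conf.set("spark.old.key", "a")  // translated: stored as "spark.new.key"
> conf.set("spark.new.key", "b")  // "spark.new.key" ends up "b" (correct)
> // ...or, just as likely:
> conf.set("spark.new.key", "b")
> conf.set("spark.old.key", "a")  // overwrites: "spark.new.key" is "a"
> {code}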
> (2) If we translate on set, then we must keep translating everywhere else. 
> In fact, the current code does not translate on remove, which means the 
> following won't work if X is deprecated:
> {code}
> conf.set(X, Y)    // translated on set: the value is stored under X'
> conf.remove(X)    // no-op: X itself was never in the conf
> {code}
> This requires us to also translate in remove and other places, as we already 
> do for contains, leading to more duplicate code.
> (3) Since we call `conf.set` on all configs when initializing the conf, we 
> print all deprecation warnings at the beginning. Elsewhere in Spark, 
> however, we warn the user when the deprecated config / option / env var is 
> actually being used.
> We should keep this consistent so the user won't expect to find all 
> deprecation messages at the beginning of their logs.


