[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198378267

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --

    Got it, I'll try to do this by wrapping the existing SQLConf.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198378065

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
         def _eager_eval(self):
    --- End diff --

    Yep, agree.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377709

    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
    +
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --

    Great, thanks for your guidance, I'll take a look at them in detail.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377607

    --- Diff: python/pyspark/sql/conf.py ---
    @@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                                 (identifier, obj, type(obj).__name__))
     
    +
    +class ConfigEntry(object):
    +    """An entry contains all meta information for a configuration"""
    +
    +    def __init__(self, confKey):
    +        """Create a new ConfigEntry with config key"""
    +        self.confKey = confKey
    +        self.converter = None
    +        self.default = _NoValue
    +
    +    def boolConf(self):
    +        """Designate current config entry is boolean config"""
    +        self.converter = lambda x: str(x).lower() == "true"
    +        return self
    +
    +    def intConf(self):
    +        """Designate current config entry is integer config"""
    +        self.converter = lambda x: int(x)
    +        return self
    +
    +    def stringConf(self):
    +        """Designate current config entry is string config"""
    +        self.converter = lambda x: str(x)
    +        return self
    +
    +    def withDefault(self, default):
    +        """Give a default value for current config entry, the default value will be set
    +        to _NoValue when its absent"""
    +        self.default = default
    +        return self
    +
    +    def read(self, ctx):
    +        """Read value from this config entry through sql context"""
    +        return self.converter(ctx.getConf(self.confKey, self.default))
    +
    +
    +class SQLConf(object):
    +    """A class that enables the getting of SQL config parameters in pyspark"""
    +
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
    +        .intConf()\
    +        .withDefault("20")
    +
    +    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
    +        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
    +        .boolConf()
    +
    +    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
    +        .stringConf()
    +
    +    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
    +        .boolConf()\
    +        .withDefault("false")
    +
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --

    Got it, thanks for your advice.
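To make the fluent API in the quoted hunk concrete, here is a trimmed, runnable sketch of the `ConfigEntry` pattern under discussion. Names mirror the diff, but `FakeContext` is a hypothetical stand-in for the real SQL context (whose `getConf` goes through Py4J), and only two of the converters are shown:

```python
_NoValue = object()  # sentinel standing in for pyspark's internal _NoValue


class ConfigEntry(object):
    """Holds the key, value converter, and default for one SQL config."""

    def __init__(self, confKey):
        self.confKey = confKey
        self.converter = None
        self.default = _NoValue

    def boolConf(self):
        # Spark configs come back as strings, so "true"/"false" are parsed.
        self.converter = lambda x: str(x).lower() == "true"
        return self

    def intConf(self):
        self.converter = lambda x: int(x)
        return self

    def withDefault(self, default):
        self.default = default
        return self

    def read(self, ctx):
        # Mirrors SQLContext.getConf(key, default) on the JVM side.
        return self.converter(ctx.getConf(self.confKey, self.default))


class FakeContext(object):
    """Hypothetical context that simply returns the supplied default."""

    def getConf(self, key, default):
        return default


max_rows = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows").intConf().withDefault("20")
eager = ConfigEntry("spark.sql.repl.eagerEval.enabled").boolConf().withDefault("false")
print(max_rows.read(FakeContext()))  # -> 20
print(eager.read(FakeContext()))     # -> False
```

The builder methods each return `self`, which is what allows the chained `ConfigEntry(...).intConf().withDefault(...)` style seen in the diff.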
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198376125

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --

    or [this hack](https://github.com/apache/spark/blob/173fe450df203b262b58f7e71c6b52a79db95ee0/python/pyspark/ml/image.py#L37-L234)
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375767

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --

    The current approach introduces another sizable chunk of code (roughly 100 added lines), and it doesn't look very extensible either. I suggested a few possible ideas below. If those turn out to be difficult or impossible, I wouldn't do this for now.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375556

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
         def _eager_eval(self):
    --- End diff --

    If this can be replaced by sql_conf access, we don't really need this private function.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375444

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --

    I'd make this "private": `self._sql_conf`
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375498

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
             # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
             # by __repr__ and _repr_html_ while eager evaluation opened.
             self._support_repr_html = False
    +        self.sql_conf = SQLConf(sql_ctx)
    --- End diff --

    BTW, does the Scala side have such an attribute? If not, I wouldn't do it this way.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375472

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --

    I just wanted to remove the hard-coding, as we discussed in https://github.com/apache/spark/pull/21370#discussion_r194276735. As for duplicating the Scala code, my current idea is to keep calling buildConf and doc on the Scala side to register each config and hold its documentation, while managing the name and default value in the Python SQLConf. May I ask for your suggestion? :) Thanks.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375388

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
    --- End diff --

    I would also look up the attributes via Py4J and dynamically define them here, since this is for internal purposes.
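One way to read the dynamic-definition suggestion is to attach the attributes to the class at import time instead of hard-coding each one. In the sketch below the `_jvm_conf_entries` dict is a hypothetical stand-in for what a Py4J lookup of the JVM-side SQLConf entries might return; the real lookup and the entry shape are assumptions, not the actual Spark API:

```python
# Hypothetical (name, (key, default)) pairs; in the real suggestion these
# would be discovered from org.apache.spark.sql.internal.SQLConf via Py4J.
_jvm_conf_entries = {
    "REPL_EAGER_EVAL_ENABLED": ("spark.sql.repl.eagerEval.enabled", "false"),
    "REPL_EAGER_EVAL_MAX_NUM_ROWS": ("spark.sql.repl.eagerEval.maxNumRows", "20"),
}


class SQLConf(object):
    """Config attributes are attached below, one per discovered entry."""


def _define_entries(cls, entries):
    # Dynamically define one class attribute per config entry, avoiding a
    # hand-written (and easily stale) list of constants in Python.
    for name, (key, default) in entries.items():
        setattr(cls, name, (key, default))


_define_entries(SQLConf, _jvm_conf_entries)

print(SQLConf.REPL_EAGER_EVAL_MAX_NUM_ROWS)  # -> ('spark.sql.repl.eagerEval.maxNumRows', '20')
```

The appeal is that the Python side never duplicates the keys or defaults; it only mirrors whatever the JVM reports at runtime.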
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375176

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
    --- End diff --

    Can we do this by wrapping the existing SQLConf? We could make these static properties by, for example, [this hack](https://github.com/graphframes/graphframes/pull/169/files#diff-e81e6b169c0aa35012a3263b2f31b330R381)
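The "static properties" hack referred to here can be done with a small descriptor that exposes a method as a read-only class-level attribute. This is a generic sketch of the technique, not the graphframes code itself; a real wrapper would resolve the value from the JVM-side SQLConf rather than returning a literal:

```python
class _classproperty(object):
    """Descriptor that makes a method readable as a class-level property."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        # Called for both instance and class access; we always pass the class.
        return self.fget(objtype)


class SQLConf(object):
    @_classproperty
    def REPL_EAGER_EVAL_ENABLED(cls):
        # Illustrative only: a real wrapper would fetch this lazily from the
        # JVM-side SQLConf instead of hard-coding the key.
        return "spark.sql.repl.eagerEval.enabled"


print(SQLConf.REPL_EAGER_EVAL_ENABLED)  # -> spark.sql.repl.eagerEval.enabled
```

Because the lookup happens inside `__get__`, the value can be computed lazily on each access, which matters when the JVM gateway is not yet available at import time.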
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198373530

    --- Diff: python/pyspark/sql/conf.py ---
    [... same ConfigEntry/SQLConf hunk as quoted in full above; comment anchored at: ...]
    +    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
    +        .boolConf()\
    +        .withDefault("true")
    --- End diff --

    This duplicates code on the Scala side. What do we gain by doing this?
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/21648

    [SPARK-24665][PySpark] Add SQLConf in PySpark to manage all sql configs

    ## What changes were proposed in this pull request?

    Add SQLConf and ConfigEntry for PySpark to manage all SQL configs, dropping all the hard-coded strings in config usage.

    ## How was this patch tested?

    Existing UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-24665

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21648.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21648

----
commit 6a77ed523a27b98cc2c2887ebb85703039c318a4
Author: Yuanjian Li
Date:   2018-06-26T15:12:17Z

    Move all core configs into a single class