[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377709

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
--- End diff --

Great, thanks for your guidance, I'll take a look at them in detail.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Merged build finished. Test PASSed.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198377607

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
+        .intConf()\
+        .withDefault("20")
+
+    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
+        .intConf()\
+        .withDefault("20")
+
+    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
+        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
+        .boolConf()
+
+    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
+        .stringConf()
+
+    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
+        .boolConf()\
+        .withDefault("true")
--- End diff --

Got it, thanks for your advice.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92363/ Test PASSed.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21389

**[Test build #92363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92363/testReport)** for PR 21389 at commit [`c306953`](https://github.com/apache/spark/commit/c30695302a2790b458c09c478b19c40ec53243c1).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21648 Merged build finished. Test FAILed.
[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92372/ Test FAILed.
[GitHub] spark issue #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to manage ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21648

**[Test build #92372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92372/testReport)** for PR 21648 at commit [`6a77ed5`](https://github.com/apache/spark/commit/6a77ed523a27b98cc2c2887ebb85703039c318a4).

 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ConfigEntry(object):`
   * `class SQLConf(object):`
[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18900 If Hive outputs createTime for partitions, we should do it too, right?
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198376125

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
--- End diff --

or [this hack](https://github.com/apache/spark/blob/173fe450df203b262b58f7e71c6b52a79db95ee0/python/pyspark/ml/image.py#L37-L234)
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/20345 Can we add an end-to-end test case for this join re-order? Other than that, LGTM.
[GitHub] spark issue #21553: [SPARK-24215][PySpark][Follow Up] Implement eager evalua...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21553 In the last commit I reverted the changes to SQLConf and created a new PR, #21648. Could this follow-up PR be merged first? Thanks.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375767

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
+        .intConf()\
+        .withDefault("20")
+
+    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
+        .intConf()\
+        .withDefault("20")
+
+    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
+        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
+        .boolConf()
+
+    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
+        .stringConf()
+
+    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
+        .boolConf()\
+        .withDefault("true")
--- End diff --

The current approach introduces another sizable chunk of code (about 100 added lines), and it doesn't look very extensible either. I suggested a few possible approaches below. If those are difficult or impossible, I wouldn't do this for now.
[GitHub] spark pull request #21553: [SPARK-24215][PySpark][Follow Up] Implement eager...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21553#discussion_r198375614

--- Diff: docs/configuration.md ---
@@ -456,33 +456,6 @@ Apart from these, the following properties are also available, and may be useful
     from JVM to Python worker for every task.
   </td>
 </tr>
-<tr>
-  <td><code>spark.sql.repl.eagerEval.enabled</code></td>
-  <td>false</td>
-  <td>
-    Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation,
-    Dataset will be ran automatically. The HTML table which generated by <code>_repl_html_</code>
-    called by notebooks like Jupyter will feedback the queries user have defined. For plain Python
-    REPL, the output will be shown like <code>dataframe.show()</code>
-    (see <a href="https://issues.apache.org/jira/browse/SPARK-24215">SPARK-24215</a> for more details).
-  </td>
-</tr>
-<tr>
-  <td><code>spark.sql.repl.eagerEval.maxNumRows</code></td>
-  <td>20</td>
-  <td>
-    Default number of rows in eager evaluation output HTML table generated by <code>_repr_html_</code> or plain text,
-    this only take effect when <code>spark.sql.repl.eagerEval.enabled</code> is set to true.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.sql.repl.eagerEval.truncate</code></td>
-  <td>20</td>
-  <td>
-    Default number of truncate in eager evaluation output HTML table generated by <code>_repr_html_</code> or
-    plain text, this only take effect when <code>spark.sql.repl.eagerEval.enabled</code> set to true.
-  </td>
-</tr>
--- End diff --

this should be in `sql-programming-guide.md` right?
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375556

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -358,22 +360,19 @@ def show(self, n=20, truncate=True, vertical=False):
     def _eager_eval(self):
--- End diff --

If this can be replaced by sql_conf access, we don't really need this private function.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375444

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
         # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
         # by __repr__ and _repr_html_ while eager evaluation opened.
         self._support_repr_html = False
+        self.sql_conf = SQLConf(sql_ctx)
--- End diff --

I'd make this "private": `self._sql_conf`
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375498

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -81,6 +82,7 @@ def __init__(self, jdf, sql_ctx):
         # Check whether _repr_html is supported or not, we use it to avoid calling _jdf twice
         # by __repr__ and _repr_html_ while eager evaluation opened.
         self._support_repr_html = False
+        self.sql_conf = SQLConf(sql_ctx)
--- End diff --

BTW, does the Scala side have such an attribute? If not, I wouldn't do it this way.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375472

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
+        .intConf()\
+        .withDefault("20")
+
+    REPL_EAGER_EVAL_TRUNCATE = ConfigEntry("spark.sql.repl.eagerEval.truncate")\
+        .intConf()\
+        .withDefault("20")
+
+    PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE = \
+        ConfigEntry("spark.sql.execution.pandas.respectSessionTimeZone")\
+        .boolConf()
+
+    SESSION_LOCAL_TIMEZONE = ConfigEntry("spark.sql.session.timeZone")\
+        .stringConf()
+
+    ARROW_EXECUTION_ENABLED = ConfigEntry("spark.sql.execution.arrow.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    ARROW_FALLBACK_ENABLED = ConfigEntry("spark.sql.execution.arrow.fallback.enabled")\
+        .boolConf()\
+        .withDefault("true")
--- End diff --

I just want to remove the hard-coding, as we discussed in https://github.com/apache/spark/pull/21370#discussion_r194276735. For the duplication with the Scala code, my current idea is to call buildConf and doc only on the Scala side, to register the config and keep its doc, and to manage the name and default value in the Python SQLConf. May I ask your suggestion? :) Thx.
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375388

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
+        .boolConf()\
+        .withDefault("false")
+
+    REPL_EAGER_EVAL_MAX_NUM_ROWS = ConfigEntry("spark.sql.repl.eagerEval.maxNumRows")\
--- End diff --

I would also look up the attributes via Py4J and dynamically define the attributes here. It's for internal purposes.
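[Editor's note] The suggestion above is to define the SQLConf attributes dynamically from a listing rather than hand-writing one entry per key. Here is a hedged, self-contained sketch of that idea: in the real PR the listing would be looked up from the JVM SQLConf via Py4J, but `_CONF_SPECS` below is hard-coded so the example runs on its own, and the tuple-valued attributes are an illustrative simplification, not the PR's API.

```python
# Hypothetical listing of (attribute name, conf key, value type).
# In the real code this would be fetched from the JVM side via Py4J.
_CONF_SPECS = [
    ("REPL_EAGER_EVAL_ENABLED", "spark.sql.repl.eagerEval.enabled", bool),
    ("REPL_EAGER_EVAL_MAX_NUM_ROWS", "spark.sql.repl.eagerEval.maxNumRows", int),
    ("SESSION_LOCAL_TIMEZONE", "spark.sql.session.timeZone", str),
]

# One converter per value type; Spark conf values arrive as strings.
_CONVERTERS = {
    bool: lambda x: str(x).lower() == "true",
    int: int,
    str: str,
}


class SQLConf(object):
    """Attributes are attached dynamically below instead of being hand-written."""


# Dynamically define each attribute as a (key, converter) pair.
for _name, _key, _typ in _CONF_SPECS:
    setattr(SQLConf, _name, (_key, _CONVERTERS[_typ]))

key, conv = SQLConf.REPL_EAGER_EVAL_ENABLED
print(key)           # spark.sql.repl.eagerEval.enabled
print(conv("True"))  # True
```

The appeal is that adding a new config on the Scala side would not require touching the Python file at all, which addresses the extensibility concern raised earlier in the thread.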
[GitHub] spark pull request #21648: [SPARK-24665][PySpark] Add SQLConf in PySpark to ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21648#discussion_r198375176

--- Diff: python/pyspark/sql/conf.py ---
@@ -64,6 +64,97 @@ def _checkType(self, obj, identifier):
                              (identifier, obj, type(obj).__name__))
 
 
+class ConfigEntry(object):
+    """An entry contains all meta information for a configuration"""
+
+    def __init__(self, confKey):
+        """Create a new ConfigEntry with config key"""
+        self.confKey = confKey
+        self.converter = None
+        self.default = _NoValue
+
+    def boolConf(self):
+        """Designate current config entry is boolean config"""
+        self.converter = lambda x: str(x).lower() == "true"
+        return self
+
+    def intConf(self):
+        """Designate current config entry is integer config"""
+        self.converter = lambda x: int(x)
+        return self
+
+    def stringConf(self):
+        """Designate current config entry is string config"""
+        self.converter = lambda x: str(x)
+        return self
+
+    def withDefault(self, default):
+        """Give a default value for current config entry, the default value will be set
+        to _NoValue when its absent"""
+        self.default = default
+        return self
+
+    def read(self, ctx):
+        """Read value from this config entry through sql context"""
+        return self.converter(ctx.getConf(self.confKey, self.default))
+
+
+class SQLConf(object):
+    """A class that enables the getting of SQL config parameters in pyspark"""
+
+    REPL_EAGER_EVAL_ENABLED = ConfigEntry("spark.sql.repl.eagerEval.enabled")\
--- End diff --

Can we do this by wrapping existing SQLConf? We can make them static properties by, for example, [this hack](https://github.com/graphframes/graphframes/pull/169/files#diff-e81e6b169c0aa35012a3263b2f31b330R381)
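[Editor's note] The "static property" hack referenced above can be sketched with a small descriptor that turns a method into a read-only class-level property, so each access recomputes the value (e.g. by reading through the active session). This is a hedged illustration of the general technique, not the linked graphframes code: the `classproperty` helper and the `_confs` dict stand-in are both illustrative.

```python
class classproperty(object):
    """Minimal read-only class-level property descriptor."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        # Invoked on class access (obj is None) as well as instance access;
        # either way the owning class is passed to the getter.
        return self.fget(objtype)


class SQLConf(object):
    # Stand-in config store; the real code would read through a SQLContext.
    _confs = {"spark.sql.repl.eagerEval.enabled": "true"}

    @classproperty
    def REPL_EAGER_EVAL_ENABLED(cls):
        raw = cls._confs.get("spark.sql.repl.eagerEval.enabled", "false")
        return str(raw).lower() == "true"


print(SQLConf.REPL_EAGER_EVAL_ENABLED)  # True, computed lazily on each access
```

Because the value is computed on every access rather than frozen at class-definition time, this style would let `SQLConf.REPL_EAGER_EVAL_ENABLED` track the live session configuration, which is presumably the point of the suggestion.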