[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194641579 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( + _numRows: Int, + truncate: Int, + vertical: Boolean): Array[Any] = { +EvaluatePython.registerPicklers() +val numRows = _numRows.max(0).min(Int.MaxValue - 1) +val rows = getRows(numRows, truncate, vertical).map(_.toArray).toArray +val toJava: (Any) => Any = EvaluatePython.toJava(_, ArrayType(ArrayType(StringType))) +val iter: Iterator[Array[Byte]] = new SerDeUtil.AutoBatchedPickler( + rows.iterator.map(toJava)) +PythonRDD.serveIterator(iter, "serve-GetRows") --- End diff -- I think we return `Array[Any]` for `PythonRDD.serveIterator` too. https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L400 Did I maybe miss something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194629747 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( + _numRows: Int, + truncate: Int, + vertical: Boolean): Array[Any] = { +EvaluatePython.registerPicklers() +val numRows = _numRows.max(0).min(Int.MaxValue - 1) +val rows = getRows(numRows, truncate, vertical).map(_.toArray).toArray +val toJava: (Any) => Any = EvaluatePython.toJava(_, ArrayType(ArrayType(StringType))) +val iter: Iterator[Array[Byte]] = new SerDeUtil.AutoBatchedPickler( + rows.iterator.map(toJava)) +PythonRDD.serveIterator(iter, "serve-GetRows") --- End diff -- `PythonRDD.serveIterator(iter, "serve-GetRows")` returns `Int`, but the return type of `getRowsToPython` is `Array[Any]`. How does it work? cc @xuanyuanking @HyukjinKwon
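The mechanism under discussion, serving pickled rows over a local socket and handing connection info back to Python, can be illustrated with a standalone toy in Python. This is a hedged sketch of the pattern only, not Spark's actual `PythonRDD.serveIterator` or `_load_from_socket` implementation (which also handles auth secrets and batching):

```python
# Toy illustration of the serveIterator pattern: one side serves pickled items
# over a one-shot local socket and returns connection info (here just the
# port); the other side connects and reads items until the stream closes.
import pickle
import socket
import threading

def serve_iterator(it):
    """Serve pickled items over an ephemeral local socket; return the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 -> OS picks a free port
    server.listen(1)
    port = server.getsockname()[1]

    def handler():
        conn, _ = server.accept()
        with conn, conn.makefile("wb") as f:
            for item in it:
                pickle.dump(item, f)
        server.close()

    threading.Thread(target=handler, daemon=True).start()
    return port  # Spark returns richer info here (port, auth secret, ...)

def load_from_socket(port):
    """Connect and unpickle items until the server closes the stream."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        with conn.makefile("rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

port = serve_iterator(iter([["key", "value"], ["1", "a"], ["2", "b"]]))
rows = list(load_from_socket(port))  # first item is the header row
```

The opaque return value is the point of gatorsmile's question: the Python caller only forwards whatever connection info the JVM hands back, so the declared Scala return type matters more for the compiler than for the wire protocol.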
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194292067 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): --- End diff -- This PR also changed `__repr__`. Thus, we need to update the PR title and description. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194287915 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- In the ongoing release, a nice-to-have refactoring is to move all the core confs into a single file, just like what we did for Spark SQL's conf: default values, boundary checking, types, and descriptions. Thus, in PySpark, it would be better to start doing that now.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194278100 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- Probably, we should access the SQLConf object. 1. I agree with not hardcoding it in general, but 2. IMHO I want to avoid Py4J JVM accesses in the test, because to my knowledge that tends to make the test flakier (unlike on the Scala or Java side). Maybe we should take a look at this hardcoding if we see more occurrences next time.
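A sketch of what centralizing these conf keys could look like on the Python side, in the spirit of the suggestion above. `ConfEntry`, `_to_bool`, and the constant names are hypothetical illustrations, not pyspark's actual API:

```python
# Hypothetical centralized conf definitions for PySpark, mirroring the
# "single file of confs with defaults, types, and descriptions" idea.

class ConfEntry:
    """A conf definition: key, string default, and a type converter."""
    def __init__(self, key, default, converter):
        self.key = key
        self.default = default
        self.converter = converter

    def read(self, get_conf):
        # get_conf mimics the signature of sql_ctx.getConf(key, default)
        return self.converter(get_conf(self.key, self.default))

def _to_bool(value):
    return str(value).lower() == "true"

# Definitions live in one place instead of being hardcoded at each call site.
EAGER_EVAL_ENABLED = ConfEntry("spark.sql.repl.eagerEval.enabled", "false", _to_bool)
EAGER_EVAL_MAX_NUM_ROWS = ConfEntry("spark.sql.repl.eagerEval.maxNumRows", "20", int)
EAGER_EVAL_TRUNCATE = ConfEntry("spark.sql.repl.eagerEval.truncate", "20", int)

# Stand-in for sql_ctx.getConf, backed by a plain dict for illustration,
# which also avoids the Py4J round-trips HyukjinKwon mentions for tests.
conf_store = {"spark.sql.repl.eagerEval.enabled": "true"}
get_conf = lambda key, default: conf_store.get(key, default)

enabled = EAGER_EVAL_ENABLED.read(get_conf)        # set explicitly above
max_rows = EAGER_EVAL_MAX_NUM_ROWS.read(get_conf)  # falls back to default
```

The dict-backed `get_conf` is what makes such definitions testable without a JVM.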
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194277542 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, --- End diff -- Just a question. When the REPL does not support eager evaluation, could we do anything better instead of silently ignoring the user inputs?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194277082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( --- End diff -- In DataFrameSuite, we have multiple test cases for `showString` but not for `getRows`, which is introduced in this PR. We also need unit test cases for `getRowsToPython`.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276795 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + --- End diff -- These confs are not part of `spark.sql("SET -v").show(numRows = 200, truncate = false)`.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276735 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" --- End diff -- Is that possible we can avoid hard-coding these conf key values? cc @ueshin @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276557 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + --- End diff -- All the SQL configurations should follow what we did in the section of `Spark SQL` https://spark.apache.org/docs/latest/configuration.html#spark-sql.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276329 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +vertical = False --- End diff -- Any discussion about this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276298 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3209,6 +3222,19 @@ class Dataset[T] private[sql]( } } + private[sql] def getRowsToPython( + _numRows: Int, + truncate: Int, + vertical: Boolean): Array[Any] = { +EvaluatePython.registerPicklers() +val numRows = _numRows.max(0).min(Int.MaxValue - 1) --- End diff -- This should also be part of the conf description.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194276179 --- Diff: python/pyspark/sql/tests.py --- @@ -3074,6 +3074,36 @@ def test_checking_csv_header(self): finally: shutil.rmtree(path) +def test_repr_html(self): --- End diff -- This function only covers the most basic positive case. We also need to add more test cases, for example, the results when `spark.sql.repl.eagerEval.enabled` is set to `false`.
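The negative case being requested, eager evaluation disabled, can be modeled without a real SparkSession. The sketch below is a standalone toy of the behavior under test (`FakeDataFrame` is hypothetical, not pyspark's test harness): when the conf is `false`, `_repr_html_` returns `None`, so notebooks fall back to `__repr__`.

```python
# Toy model of the DataFrame repr contract discussed in this review:
# _repr_html_ should return None when eager evaluation is disabled.

class FakeDataFrame:
    def __init__(self, conf):
        self._conf = conf  # plain dict standing in for sql_ctx.getConf

    @property
    def _eager_eval(self):
        value = self._conf.get("spark.sql.repl.eagerEval.enabled", "false")
        return value.lower() == "true"

    def _repr_html_(self):
        if not self._eager_eval:
            return None  # notebook falls back to __repr__
        return "<table>...</table>"  # placeholder for the rendered rows

    def __repr__(self):
        return "DataFrame[key: bigint, value: string]"

df_off = FakeDataFrame({"spark.sql.repl.eagerEval.enabled": "false"})
assert df_off._repr_html_() is None
assert repr(df_off).startswith("DataFrame[")

df_on = FakeDataFrame({"spark.sql.repl.eagerEval.enabled": "true"})
assert df_on._repr_html_() is not None
```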
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194275282 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, +Dataset will be ran automatically. The HTML table which generated by _repl_html_ +called by notebooks like Jupyter will feedback the queries user have defined. For plain Python +REPL, the output will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled is set to true. --- End diff -- take -> takes
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r194275288 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and the REPL you are using supports eager evaluation, +Dataset will be ran automatically. The HTML table which generated by _repl_html_ +called by notebooks like Jupyter will feedback the queries user have defined. For plain Python +REPL, the output will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled is set to true. + + + + spark.sql.repl.eagerEval.truncate + 20 + +Default number of truncate in eager evaluation output HTML table generated by _repr_html_ or +plain text, this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- take -> takes
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21370
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192772218 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" +# generate table rows +for row in row_data: +data = "" + "".join(map(lambda x: cgi.escape(x), row)) + \ +"\n" +html += data +html += "\n" +if has_more_data: +html += "only showing top %d %s\n" % ( --- End diff -- Maybe we need this? Just want to keep same with `df.show()`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192772009 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" +# generate table rows +for row in row_data: +data = "" + "".join(map(lambda x: cgi.escape(x), row)) + \ +"\n" --- End diff -- Thanks, more clearer. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192771951 --- Diff: python/pyspark/sql/tests.py --- @@ -3040,6 +3040,36 @@ def test_csv_sampling_ratio(self): .csv(rdd, samplingRatio=0.5).schema self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)])) +def test_repr_html(self): +import re +pattern = re.compile(r'^ *\|', re.MULTILINE) +df = self.spark.createDataFrame([(1, "1"), (2, "2")], ("key", "value")) +self.assertEquals(None, df._repr_html_()) +self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "true") --- End diff -- Thanks, done in next commit.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192771831 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" --- End diff -- Thanks, more clearer. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192771787 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] --- End diff -- Thanks, done in next commit. 
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192771103 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: --- End diff -- Thanks, delete it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610559 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" +# generate table rows +for row in row_data: +data = "" + "".join(map(lambda x: cgi.escape(x), row)) + \ +"\n" --- End diff -- ditto: ``` "%s\n" % "".join(map(lambda x: cgi.escape(x), row)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610390 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] --- End diff -- tiny nit: `row_data[:max_num_rows]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610512 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" --- End diff -- maybe: ``` "%s\n" % "".join(map(lambda x: cgi.escape(x), head)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610308 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: --- End diff -- `css` seems not used. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610839 --- Diff: python/pyspark/sql/tests.py --- @@ -3040,6 +3040,36 @@ def test_csv_sampling_ratio(self): .csv(rdd, samplingRatio=0.5).schema self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)])) +def test_repr_html(self): +import re +pattern = re.compile(r'^ *\|', re.MULTILINE) +df = self.spark.createDataFrame([(1, "1"), (2, "2")], ("key", "value")) +self.assertEquals(None, df._repr_html_()) +self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "true") --- End diff -- Can we use `with self.sql_conf(...)`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
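The `self.sql_conf(...)` helper the reviewer mentions wraps temporary conf changes in a context manager, so a test cannot leak settings into later tests even if it fails. A minimal sketch of what such a helper looks like — the `_DictConf` class below is only an in-memory stand-in for `SparkSession.conf`, not the real PySpark test utility:

```python
from contextlib import contextmanager

@contextmanager
def sql_conf(conf, pairs):
    """Temporarily set confs from `pairs`, restoring old values on exit.

    `conf` is assumed to expose get/set/unset; this only sketches the
    shape of the test-suite helper referenced above.
    """
    keys = list(pairs)
    old_values = [conf.get(k, None) for k in keys]
    for k, v in pairs.items():
        conf.set(k, v)
    try:
        yield
    finally:
        for k, old in zip(keys, old_values):
            if old is None:
                conf.unset(k)  # the key was unset before; unset it again
            else:
                conf.set(k, old)

class _DictConf(object):
    """In-memory stand-in for SparkSession.conf, used only for this demo."""
    def __init__(self):
        self._store = {}
    def get(self, key, default=None):
        return self._store.get(key, default)
    def set(self, key, value):
        self._store[key] = value
    def unset(self, key):
        self._store.pop(key, None)

conf = _DictConf()
with sql_conf(conf, {"spark.sql.repl.eagerEval.enabled": "true"}):
    enabled_inside = conf.get("spark.sql.repl.eagerEval.enabled")
enabled_after = conf.get("spark.sql.repl.eagerEval.enabled")
```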
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192610620 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. 
+""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = max(self._max_num_rows, 0) +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +max_num_rows, self._truncate, vertical) +rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer( +head = rows[0] +row_data = rows[1:] +has_more_data = len(row_data) > max_num_rows +row_data = row_data[0:max_num_rows] + +html = "\n" +# generate table head +html += "".join(map(lambda x: cgi.escape(x), head)) + "\n" +# generate table rows +for row in row_data: +data = "" + "".join(map(lambda x: cgi.escape(x), row)) + \ +"\n" +html += data +html += "\n" +if has_more_data: +html += "only showing top %d %s\n" % ( --- End diff -- I'd just way `row(s)`. Don't have to be super clever on this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548464 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if --- End diff -- The HTML table is generated by `_repr_html_`; it isn't a Jupyter-only term. `_repr_html_` is the rich-display support for IPython in notebooks and the Qt console. I think it can be used in other places, but currently I have only tested this in Jupyter. I rewrote the doc; please check whether it is appropriate, thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
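As the comment notes, `_repr_html_` is IPython's rich-display hook rather than anything Jupyter-specific: any object that defines it is rendered as HTML by frontends that support the protocol, while plain REPLs fall back to `__repr__`. A minimal illustration with a hypothetical class (HTML escaping omitted for brevity; the real implementation escapes every cell):

```python
class EagerFrame:
    """Hypothetical class showing IPython's rich-display hook."""
    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Plain REPLs (and frontends without HTML support) use this.
        return "EagerFrame(%d rows)" % len(self.rows)

    def _repr_html_(self):
        # HTML-capable frontends such as Jupyter or the Qt console call
        # this instead and render the returned markup.
        cells = "".join("<tr><td>%s</td></tr>" % r for r in self.rows)
        return "<table>%s</table>" % cells

frame = EagerFrame(["a", "b"])
```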
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548359 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output --- End diff -- Thanks, done in 5b36604. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548352 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- Thanks, done in 5b36604. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192548361 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, --- End diff -- Thanks, done in 5b36604. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192446664 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output --- End diff -- `output ` -> `the output` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192446542 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if +_repl_html_ called by notebooks like Jupyter, otherwise for plain Python REPL, output +will be shown like dataframe.show() +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in eager evaluation output HTML table generated by _repr_html_ or plain text, +this only take effect when spark.sql.repl.eagerEval.enabled set to true. --- End diff -- `set to` -> `is set to` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192446886 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, --- End diff -- `REPL` -> `the REPL` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192447943 --- Diff: docs/configuration.md --- @@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically. HTML table will feedback the queries user have defined if --- End diff -- `dataframe` -> `DataFrame/Dataset` What is `HTML table`? Is the term used in Jupyter only? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192349637 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- Just want to avoid calling `_jdf` twice here, because the second call, made by `__repr__`, is useless when `_repr_html_` is supported. The return string of `__repr__` will not end up being shown in the notebook. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
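The flag pattern under discussion can be sketched in isolation: once `_repr_html_` has run, the frontend is known to render HTML, so later `__repr__` calls can return the cheap schema string instead of triggering another JVM round-trip. All names below are illustrative stand-ins, not the actual PySpark implementation:

```python
class ReprFlagFrame:
    """Hypothetical sketch of the _support_repr_html handshake."""
    def __init__(self):
        self._support_repr_html = False
        self.jvm_calls = 0  # counts stand-ins for costly _jdf round-trips

    def _show_string(self):
        self.jvm_calls += 1  # stands in for _jdf.showString()
        return "rendered rows"

    def __repr__(self):
        if not self._support_repr_html:
            return self._show_string()  # plain-REPL path: render eagerly
        return "DataFrame[...]"         # cheap path once HTML repr is known

    def _repr_html_(self):
        self._support_repr_html = True  # remember the frontend renders HTML
        return "<table>%s</table>" % self._show_string()

frame = ReprFlagFrame()
html_out = frame._repr_html_()  # an HTML frontend displays this markup...
text_out = repr(frame)          # ...so this string stays cheap and unused
```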
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192349210 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = self._max_num_rows --- End diff -- Thanks, done in 7f43a8b. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192349023 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined --- End diff -- Sorry for this...again. 7f43a8b --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192349075 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +310,30 @@ class Dataset[T] private[sql]( } } + val paddedRows = rows.map { row => +row.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } + // Create SeparateLine val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + paddedRows.head.addString(sb, "|", "|", "|\n") sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") - } - + paddedRows.tail.foreach(_.addString(sb, "|", "|", "|\n")) sb.append(sep) } else { // Extended display mode enabled val fieldNames = rows.head val dataRows = rows.tail - --- End diff -- Thanks, done in 7f43a8b. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
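The Scala refactor in the diff above computes the padded cells once and reuses them for the header and the data rows instead of repeating the padding logic. A rough Python analogue of that shape — `StringUtils.leftPad` corresponds to `str.rjust` (pad on the left means right-align) — written only to illustrate the structure, not the actual `showString` code:

```python
def render_table(rows, truncate=20):
    # rows[0] is the header row; all rows are lists of cell strings.
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    # truncate > 0 right-aligns cells (leftPad), otherwise left-aligns.
    pad = str.rjust if truncate > 0 else str.ljust
    # Pad every cell once, then reuse for header and body alike.
    padded = [[pad(cell, w) for cell, w in zip(row, widths)] for row in rows]
    sep = "+" + "+".join("-" * w for w in widths) + "+"
    lines = [sep, "|" + "|".join(padded[0]) + "|", sep]
    lines += ["|" + "|".join(r) + "|" for r in padded[1:]]
    lines.append(sep)
    return "\n".join(lines)

table = render_table([["a", "bb"], ["cc", "d"]])
```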
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192349063 --- Diff: python/pyspark/sql/dataframe.py --- @@ -78,6 +78,7 @@ def __init__(self, jdf, sql_ctx): self.is_cached = False self._schema = None # initialized lazily self._lazy_rdd = None +self._support_repr_html = False --- End diff -- Got it, more comments in 7f43a8b. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192348972 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in HTML table. --- End diff -- Got it, more detailed description in 7f43a8b. Please check. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192292200 --- Diff: python/pyspark/sql/dataframe.py --- @@ -78,6 +78,7 @@ def __init__(self, jdf, sql_ctx): self.is_cached = False self._schema = None # initialized lazily self._lazy_rdd = None +self._support_repr_html = False --- End diff -- Shall we explain why we need this (as talked in https://github.com/apache/spark/pull/21370#discussion_r191591799)? It took me a while to understand too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192292453 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.maxNumRows + 20 + +Default number of rows in HTML table. --- End diff -- Shall we explain a bit more what the HTML table here is? For example, I think at least we should say it's `_repr_html_`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192292278 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and REPL you are using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined --- End diff -- html -> HTML --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192291854 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +310,30 @@ class Dataset[T] private[sql]( } } + val paddedRows = rows.map { row => +row.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } + // Create SeparateLine val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + paddedRows.head.addString(sb, "|", "|", "|\n") sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") - } - + paddedRows.tail.foreach(_.addString(sb, "|", "|", "|\n")) sb.append(sep) } else { // Extended display mode enabled val fieldNames = rows.head val dataRows = rows.tail - --- End diff -- Shall we revive this newline back? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192291498 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = self._max_num_rows --- End diff -- I see. I think it's okay with max(0) only. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192282041 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = self._max_num_rows --- End diff -- Yes, but I do this on the Scala side in `getRowsToPython`. Link here: https://github.com/apache/spark/pull/21370/files/9c6b3bbc430ffbcb752dc9870df877728f356cb8#diff-7a46f10c3cedbf013cf255564d9483cdR3229 This is because during my test, I found that Python's `sys.intmax` is actually a long, 2 ^ 63 - 1, while Scala's `Int.MaxValue` is 2 ^ 31 - 1. 
![image](https://user-images.githubusercontent.com/4833765/40816707-fb9f1eee-6580-11e8-9a24-9667aadc5177.png) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
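Since Python integers are unbounded (and the Python 2 platform maximum is typically 2^63 - 1 on 64-bit machines), a row count forwarded to a JVM `Int` parameter must be clamped, which is exactly what the Scala-side `val numRows = _numRows.max(0).min(Int.MaxValue - 1)` does. The same clamp expressed in Python, for illustration:

```python
JVM_INT_MAX = 2 ** 31 - 1  # Scala's Int.MaxValue

def clamp_num_rows(n):
    # Mirror of the Scala `_numRows.max(0).min(Int.MaxValue - 1)`:
    # negatives become 0; huge Python ints stay below Int.MaxValue.
    return max(0, min(n, JVM_INT_MAX - 1))
```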
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192209239 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- I see, thanks. I think it's okay, but I'm just curious why you want to restrict it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192207299 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def _eager_eval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def _max_num_rows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def _truncate(self): +"""Returns the truncate length for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", "20")) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +if not self._support_repr_html and self._eager_eval: +vertical = False +return self._jdf.showString( +self._max_num_rows, self._truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you are +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +if self._eager_eval: +max_num_rows = self._max_num_rows --- End diff -- We need to adjust `max_num_rows` as the same as Scala side like `val numRows = _numRows.max(0).min(Int.MaxValue - 1)`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192167547 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- Thanks, done in 9c6b3bb. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192167463 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def eagerEval(self): --- End diff -- Thanks, done in 9c6b3bb. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192150368 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- Yes that's right. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r192147588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- Oh, I see: the padded rows are only useful in console mode, so they are not needed in the HTML output. I'll do this ASAP. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
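The console-only alignment discussed here can be sketched in Python; `pad_cell` is a hypothetical helper mirroring the quoted `StringUtils.leftPad`/`rightPad` branch, not part of the PR:

```python
def pad_cell(cell, width, truncate):
    # Mirrors the quoted Scala branch: when truncating (truncate > 0) cells
    # are right-aligned (left-padded), otherwise left-aligned (right-padded).
    return cell.rjust(width) if truncate > 0 else cell.ljust(width)
```

This padding only matters for fixed-width console output; an HTML table aligns cells itself, which is why the PR drops it for `_repr_html_`.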
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191870129 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) --- End diff -- Oh, I see. Good to know. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191869090 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- So you want to restrict `__repr__` to always return the original string like `"DataFrame[key: bigint, value: string]"` after `_repr_html_` is called? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191854612 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def eagerEval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def maxNumRows(self): --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191854703 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def eagerEval(self): +"""Returns true if the eager evaluation enabled. +""" +return self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" + +@property +def maxNumRows(self): +"""Returns the max row number for eager evaluation. +""" +return int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", "20")) + +@property +def truncate(self): --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191854585 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def eagerEval(self): --- End diff -- Btw, we should use snake case, e.g. `_eager_eval`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191853613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- Seems like the truncation is already done when creating `rows` above? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191854114 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +@property +def eagerEval(self): --- End diff -- Maybe we need `_`, e.g. `_eagerEval`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191702754 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } +} +rows + } + + /** + * Compose the string representing rows for output + * + * @param _numRows Number of rows to show + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + * all cells will be aligned right. + * @param vertical If set to true, prints output rows vertically (one line per column value). + */ + private[sql] def showString( + _numRows: Int, + truncate: Int = 20, + vertical: Boolean = false): String = { +val numRows = _numRows.max(0).min(Int.MaxValue - 1) +// Get rows represented by Seq[Seq[String]], we may get one more line if it has more data.
+val rows = getRows(numRows, truncate, vertical) +val fieldNames = rows.head +val data = rows.tail + +val hasMoreData = data.length > numRows +val dataRows = data.take(numRows) + +val sb = new StringBuilder +if (!vertical) { // Create SeparateLine - val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + val sep: String = fieldNames.map(_.length).toArray +.map("-" * _).addString(sb, "+", "+", "+\n").toString() // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + fieldNames.addString(sb, "|", "|", "|\n") sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") + dataRows.foreach { +_.addString(sb, "|", "|", "|\n") --- End diff -- Thanks, done in d4bf01a --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
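The separator-line construction in the quoted `showString` hunk (`fieldNames.map(_.length).toArray.map("-" * _).addString(sb, "+", "+", "+\n")`) can be sketched in Python; `separator_line` is a hypothetical name used only for illustration:

```python
def separator_line(field_names):
    # Builds a "+--+----+\n" style line: one dashed segment per column,
    # sized to each (already padded) header's length.
    return "+" + "+".join("-" * len(name) for name in field_names) + "+\n"
```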
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191702826 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -231,16 +234,17 @@ class Dataset[T] private[sql]( } /** - * Compose the string representing rows for output + * Get rows represented in Sequence by specific truncate and vertical requirement. * - * @param _numRows Number of rows to show + * @param numRows Number of rows to return * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. - * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param vertical If set to true, the rows to return don't need truncate. --- End diff -- Yep, all abbreviations fixed in d4bf01a. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191702931 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) --- End diff -- Done in d4bf01a. Please check. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191702675 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => --- End diff -- Thanks, done in d4bf01a. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191696389 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- Doing this in getRows is to reuse the truncate logic. I think it's the same problem as the one we discussed here: ![image](https://user-images.githubusercontent.com/4833765/40711061-d0762762-642c-11e8-9249-2465ee3e2536.png) If we do not need truncation, we can move this logic and `minimumColWidth` into `showString`. I would like to hear your suggestions. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191694631 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you're --- End diff -- ditto for abbreviation --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191694501 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, --- End diff -- ditto for abbreviation `you're`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191694169 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -231,16 +234,17 @@ class Dataset[T] private[sql]( } /** - * Compose the string representing rows for output + * Get rows represented in Sequence by specific truncate and vertical requirement. * - * @param _numRows Number of rows to show + * @param numRows Number of rows to return * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. - * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param vertical If set to true, the rows to return don't need truncate. --- End diff -- I would avoid abbreviation in the documentation. `don't` -> `do not`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191693929 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } +} +rows + } + + /** + * Compose the string representing rows for output + * + * @param _numRows Number of rows to show + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + * all cells will be aligned right. + * @param vertical If set to true, prints output rows vertically (one line per column value). + */ + private[sql] def showString( + _numRows: Int, + truncate: Int = 20, + vertical: Boolean = false): String = { +val numRows = _numRows.max(0).min(Int.MaxValue - 1) +// Get rows represented by Seq[Seq[String]], we may get one more line if it has more data.
+val rows = getRows(numRows, truncate, vertical) +val fieldNames = rows.head +val data = rows.tail + +val hasMoreData = data.length > numRows +val dataRows = data.take(numRows) + +val sb = new StringBuilder +if (!vertical) { // Create SeparateLine - val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString() + val sep: String = fieldNames.map(_.length).toArray +.map("-" * _).addString(sb, "+", "+", "+\n").toString() // column names - rows.head.zipWithIndex.map { case (cell, i) => -if (truncate > 0) { - StringUtils.leftPad(cell, colWidths(i)) -} else { - StringUtils.rightPad(cell, colWidths(i)) -} - }.addString(sb, "|", "|", "|\n") - + fieldNames.addString(sb, "|", "|", "|\n") sb.append(sep) // data - rows.tail.foreach { -_.zipWithIndex.map { case (cell, i) => - if (truncate > 0) { -StringUtils.leftPad(cell.toString, colWidths(i)) - } else { -StringUtils.rightPad(cell.toString, colWidths(i)) - } -}.addString(sb, "|", "|", "|\n") + dataRows.foreach { +_.addString(sb, "|", "|", "|\n") --- End diff -- nit: we could just make it inlined --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191692934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => --- End diff -- nit: ``` rows.map { row => row.zipWithIndex... ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191687426 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -231,16 +234,17 @@ class Dataset[T] private[sql]( } /** - * Compose the string representing rows for output + * Get rows represented in Sequence by specific truncate and vertical requirement. * - * @param _numRows Number of rows to show + * @param numRows Number of rows to return * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. - * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param vertical If set to true, the rows to return don't need truncate. */ - private[sql] def showString( - _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = { -val numRows = _numRows.max(0).min(Int.MaxValue - 1) --- End diff -- Yep, thanks, my mistake here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191687183 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) --- End diff -- Actually I implemented it like this at first, but we would get a TypeError exception: ``` TypeError: __call__() got an unexpected keyword argument 'vertical' ``` Named arguments do not work when calling a `_jdf` method from Python. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191686126 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- As commented before, this is the flag to check whether `_repr_html_` has been called. ![image](https://user-images.githubusercontent.com/4833765/40709259-2cbf6ede-6428-11e8-8cbe-e14e1450ec31.png) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
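The flag behavior being discussed can be sketched with a minimal stand-in class (hypothetical names and placeholder strings; the real code reads the flag in `__repr__` and sets it in `_repr_html_`):

```python
class EagerReprSketch(object):
    # Simplified sketch of the _support_repr_html flag: once _repr_html_ has
    # been called (i.e. the frontend supports HTML), later __repr__ calls fall
    # back to the plain schema string instead of the eager table.
    def __init__(self):
        self._support_repr_html = False
        self._eager_eval = True  # stands in for the eagerEval.enabled conf

    def __repr__(self):
        if not self._support_repr_html and self._eager_eval:
            return "<eager table>"  # stands in for showString(...)
        return "DataFrame[...]"     # stands in for the schema string

    def _repr_html_(self):
        self._support_repr_html = True
        if self._eager_eval:
            return "<table>...</table>"  # stands in for the HTML rendering
        return None
```

In a plain REPL only `__repr__` runs, so the eager table is shown; in an HTML-capable notebook `_repr_html_` runs, flips the flag, and `__repr__` reverts to the original schema string.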
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191685525 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) --- End diff -- I just followed the doc in https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L134, but since we finally cast it to int, the unicode prefix is useless. I will remove it in the next commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
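A quick sanity check that dropping the `u` prefix changes nothing once the value is cast:

```python
# u"20" and "20" denote the same text, and int() accepts either,
# so the default can be written as a plain "20" string literal.
assert u"20" == "20"
assert int(u"20") == int("20") == 20
```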
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191685596 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) --- End diff -- OK. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191594326 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) --- End diff -- Do we need `u` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191594348 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191593987 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) +else: +return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) + +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you're +using support eager evaluation with HTML. +""" +import cgi +if not self._support_repr_html: +self._support_repr_html = True +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if eager_eval: +with SCCallSiteSync(self._sc) as css: +vertical = False +sock_info = self._jdf.getRowsToPython( +console_row, console_truncate, vertical) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591921 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) --- End diff -- How about declaring those as `@property`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591799 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- What's `_support_repr_html` for? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191593927 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) --- End diff -- I guess ```python return self._jdf.showString( console_row, console_truncate, vertical=False) ``` should work without `vertical` variable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591455 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- We should do this in `showString`? And we can move `minimumColWidth` into the `showString` in that case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191595442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -231,16 +234,17 @@ class Dataset[T] private[sql]( } /** - * Compose the string representing rows for output + * Get rows represented in Sequence by specific truncate and vertical requirement. * - * @param _numRows Number of rows to show + * @param numRows Number of rows to return * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. - * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param vertical If set to true, the rows to return don't need truncate. */ - private[sql] def showString( - _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = { -val numRows = _numRows.max(0).min(Int.MaxValue - 1) --- End diff -- Don't we need to check the `numRows` range when called from `getRowsToPython`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
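For reference, the range check in question clamps the requested row count into `[0, Int.MaxValue - 1]` on the Scala side (`_numRows.max(0).min(Int.MaxValue - 1)`). The same guard expressed in Python — illustrative only; `INT_MAX` mirrors the JVM's `Int.MaxValue`:

```python
INT_MAX = 2**31 - 1  # JVM Int.MaxValue


def clamp_num_rows(n):
    """Clamp a requested row count into [0, Int.MaxValue - 1],
    mirroring the guard in Dataset.showString."""
    return max(0, min(n, INT_MAX - 1))
```

Moving the clamp out of `showString` without re-adding it on the `getRowsToPython` path would let a negative or overflowing `n` reach `getRows`, which is exactly the hole this comment points at.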
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080316 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.showRows + 20 + +Default number of rows in HTML table. + + + + spark.sql.repl.eagerEval.truncate --- End diff -- Yep, I just wanted to keep the same behavior as `dataframe.show`. ``` That's useful for console output, but not so much for notebooks. ``` Notebooks aren't bothered by too many characters within a cell, so should I just delete this?
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -237,9 +238,13 @@ class Dataset[T] private[sql]( * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param html If set to true, return output as html table. --- End diff -- @viirya @gatorsmile @rdblue Sorry for the late commit; the refactor is done in 94f3414. I spent some time testing and implementing the transformation of rows between Python and Scala.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -358,6 +357,43 @@ class Dataset[T] private[sql]( sb.toString() } + /** + * Transform current row string and append to builder + * + * @param row Current row of string + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + *all cells will be aligned right. + * @param colWidths The width of each column + * @param html If set to true, return output as html table. + * @param head Set to true while current row is table head. + * @param sbStringBuilder for current row. + */ + private[sql] def appendRowString( + row: Seq[String], + truncate: Int, + colWidths: Array[Int], + html: Boolean, + head: Boolean, + sb: StringBuilder): Unit = { +val data = row.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} +(html, head) match { + case (true, true) => +data.map(StringEscapeUtils.escapeHtml).addString( + sb, "", "\n", "\n") --- End diff -- I changed the format in the Python \_repr\_html\_ in 94f3414.
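With the HTML formatting moved from Scala to the Python side, `_repr_html_` has to assemble the table itself from the row strings it gets back. A minimal sketch of that assembly — the helper name is mine, and it uses `html.escape` rather than the `cgi` module seen elsewhere in the PR (`cgi.escape` was deprecated and later removed from Python):

```python
from html import escape


def rows_to_html(rows):
    """Render rows (first entry is the header) as an HTML table,
    escaping every cell so data values cannot inject markup."""
    parts = ["<table border='1'>"]
    head, tail = rows[0], rows[1:]
    parts.append("<tr><th>%s</th></tr>" %
                 "</th><th>".join(escape(c) for c in head))
    for row in tail:
        parts.append("<tr><td>%s</td></tr>" %
                     "</td><td>".join(escape(c) for c in row))
    parts.append("</table>")
    return "\n".join(parts)
```

Escaping on the Python side keeps `showString` a pure plain-text concern, which is the point of this refactor.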
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080049 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) --- End diff -- Fixed in 94f3414.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080066 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by repr you're --- End diff -- Thanks, changed to REPL in 94f3414.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080057 --- Diff: python/pyspark/sql/tests.py --- @@ -3040,6 +3040,50 @@ def test_csv_sampling_ratio(self): .csv(rdd, samplingRatio=0.5).schema self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)])) +def _get_content(self, content): +""" +Strips leading spaces from content up to the first '|' in each line. +""" +import re +pattern = re.compile(r'^ *\|', re.MULTILINE) --- End diff -- Thanks! Fixed it in 94f3414.
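The pattern under review strips each line's leading spaces up to and including the first `|` — the usual margin-stripping idiom that lets an expected output block be indented inside a triple-quoted test string. A standalone sketch (the function name is mine; the regex is taken from the diff):

```python
import re


def strip_margin(content):
    """Remove leading spaces up to and including the first '|'
    on each line of a triple-quoted expected string."""
    pattern = re.compile(r'^ *\|', re.MULTILINE)
    return re.sub(pattern, '', content)


# An indented expected-output block, as it would appear in a test method.
expected = """    |+---+
    ||  1|
    |+---+"""
```

The `re.MULTILINE` flag makes `^` match at every line start, not just the start of the whole string, which is what makes per-line stripping work.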
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080044 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) --- End diff -- Thanks, fixed in 94f3414.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080037 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) --- End diff -- Thanks for your reply; this is implemented in 94f3414.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191080026 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.showRows --- End diff -- Thanks, changed it in 94f3414.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803873 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) --- End diff -- use named arguments for boolean flags --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803855 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) --- End diff -- use named arguments for boolean flags --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803772 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.showRows + 20 + +Default number of rows in HTML table. + + + + spark.sql.repl.eagerEval.truncate --- End diff -- maybe he wants to follow what dataframe.show does, which truncates num characters within a cell. That's useful for console output, but not so much for notebooks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
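For context, `dataframe.show` cuts each cell to `truncate` characters and marks the cut with `...` when the limit leaves room for it. A sketch of that rule in plain Python — written from the console semantics described here, believed to match `Dataset.showString`'s behavior but not copied from the Scala source:

```python
def truncate_cell(value, truncate=20):
    """Truncate a cell string to at most `truncate` characters,
    appending '...' when the limit allows (limits under 4 cut hard)."""
    if truncate <= 0 or len(value) <= truncate:
        return value          # truncation disabled, or value already fits
    if truncate < 4:
        return value[:truncate]
    return value[:truncate - 3] + "..."
```

This per-cell character limit is orthogonal to `maxNumRows`, which bounds how many rows are rendered at all — hence the question of whether a notebook-oriented repr needs the character limit.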
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803641 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.showRows --- End diff -- maxNumRows --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190683568 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) --- End diff -- I agree that it would be better to respect `spark.sql.repl.eagerEval.enabled` here as well.
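What respecting the flag in `__repr__` could look like, reduced to a toy model with the Spark plumbing stubbed out — the class and attribute names are illustrative, not the merged implementation:

```python
class EagerReprDataFrame(object):
    """Toy model of a __repr__ that honors the eager-eval flag."""

    def __init__(self, eager_eval, table_text, dtypes):
        self._eager_eval = eager_eval  # spark.sql.repl.eagerEval.enabled
        self._table_text = table_text  # what showString would return
        self._dtypes = dtypes          # list of (name, type) pairs

    def __repr__(self):
        if self._eager_eval:
            # Eager mode: show the evaluated rows themselves.
            return self._table_text
        # Default mode: just describe the schema, no job is run.
        return "DataFrame[%s]" % ", ".join(
            "%s: %s" % c for c in self._dtypes)
```

The key property is that the default branch stays lazy: only when the flag is on does repr'ing a DataFrame trigger evaluation.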
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190683035 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob """ if isinstance(truncate, bool) and truncate: -print(self._jdf.showString(n, 20, vertical)) +print(self._jdf.showString(n, 20, vertical, False)) else: -print(self._jdf.showString(n, int(truncate), vertical)) +print(self._jdf.showString(n, int(truncate), vertical, False)) def __repr__(self): return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +def _repr_html_(self): +"""Returns a dataframe with html code when you enabled eager evaluation +by 'spark.sql.repl.eagerEval.enabled', this only called by repr you're --- End diff -- I think it works either way. REPL is better in my opinion because these settings should (ideally) apply when using any REPL. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190682693 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.sql.repl.eagerEval.enabled + false + +Enable eager evaluation or not. If true and repl you're using supports eager evaluation, +dataframe will be ran automatically and html table will feedback the queries user have defined +(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for more details). + + + + spark.sql.repl.eagerEval.showRows + 20 + +Default number of rows in HTML table. + + + + spark.sql.repl.eagerEval.truncate --- End diff -- What is the difference between this and showRows? Why are there two properties? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org