Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]
MaxGekk commented on PR #43695: URL: https://github.com/apache/spark/pull/43695#issuecomment-1826715119

@cloud-fan Quite the same:

```scala
scala> val divCol = lit(1) / lit(0)
val divCol: org.apache.spark.sql.Column = `/`(1, 0)

scala> spark.range(1).select(divCol).collect()
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"div" was called from
(:1)
(:15)
.(:1)
```

but when I create it in an object:

```scala
scala> object Obj1 {
     |   val divCol = lit(1) / lit(0)
     | }
object Obj1

scala> spark.range(1).select(Obj1.divCol).collect()
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"div" was called from
Obj1$.(:2)
Obj1$lzycompute$1(:1)
Obj1(:1)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]
cloud-fan commented on PR #43695: URL: https://github.com/apache/spark/pull/43695#issuecomment-1826686951

@MaxGekk Not quite related to this PR, but what if the expression creation is different from the df creation? Like:

```scala
val divCol = lit(1) / lit(0)
spark.range(1).select(divCol).collect()
```
Re: [PR] [SPARK-46084][PS][FOLLOWUP] More refactoring by using `create_map` [spark]
itholic commented on PR #44015: URL: https://github.com/apache/spark/pull/44015#issuecomment-1826538856

I believe this is the last refactoring for the current code base of `CategoricalOps`.
[PR] [SPARK-46084][PS][FOLLOWUP] More refactoring by using `create_map` [spark]
itholic opened a new pull request, #44015: URL: https://github.com/apache/spark/pull/44015

### What changes were proposed in this pull request?
This PR follows up https://github.com/apache/spark/pull/43993 to do more refactoring for `CategoricalOps`.

### Why are the changes needed?
To optimize performance/debuggability/readability by using the official API.

### Does this PR introduce _any_ user-facing change?
No, it's internal refactoring.

### How was this patch tested?
The existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405234673

## python/docs/source/conf.py:

```diff
@@ -194,7 +194,11 @@
 # further. For a list of options available for each theme, see the
 # documentation.
 html_theme_options = {
-    "navbar_end": ["version-switcher"]
+    "navbar_end": ["version-switcher", "theme-switcher"],
+    "logo": {
+        "image_light": "_static/spark-logo-light.png",
+        "image_dark": "_static/spark-logo-dark.png",
```

Review Comment: FYI: The default mode for light/dark is `auto`, which chooses a theme based on the user's system settings, but we can specify `dark` or `light` as the default manually if we want.
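To make the comment above concrete, here is a sketch of what the resulting `conf.py` fragment could look like if a default mode were pinned manually. The `html_context` / `default_mode` knob is an assumption based on pydata_sphinx_theme conventions, not part of this PR's diff; verify it against the theme docs for the pinned version.

```python
# Sketch of python/docs/source/conf.py -- not the PR's actual diff.
# pydata_sphinx_theme is assumed to read the default color mode from
# html_context: "auto" follows the user's system setting, while
# "light" or "dark" forces a theme regardless of the system.
html_context = {
    "default_mode": "auto",  # change to "light" or "dark" to override
}

html_theme_options = {
    "navbar_end": ["version-switcher", "theme-switcher"],
    "logo": {
        "image_light": "_static/spark-logo-light.png",
        "image_dark": "_static/spark-logo-dark.png",
    },
}
```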
Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405239729

## dev/requirements.txt:

```diff
@@ -31,12 +31,12 @@
 pandas-stubs<1.2.0.54
 mkdocs
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
 ipython
 nbsphinx
 numpydoc
-jinja2<3.0.0
-sphinx<3.1.0
+jinja2
+sphinx==4.2.0
```

Review Comment: I see that other Sphinx versions do not generate documentation properly for some reason. I have tested as many combinations as possible with Jinja2 and pydata_sphinx_theme, and I have confirmed that Sphinx 4.2.0 currently renders the documents in the most optimal form. Will investigate further in the future to support the latest Sphinx if necessary.
Re: [PR] [SPARK-46100][CORE][SQL][CONNECT][AVRO] Replace the remaining (string|array).size with (string|array).length [spark]
LuciferYang commented on PR #44014: URL: https://github.com/apache/spark/pull/44014#issuecomment-1826508000

https://github.com/LuciferYang/spark/actions/runs/6993380440/job/19026004116
[PR] [SPARK-46100][CORE][SQL][CONNECT][AVRO] Replace the remaining (string|array).size with (string|array).length [spark]
LuciferYang opened a new pull request, #44014: URL: https://github.com/apache/spark/pull/44014

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on PR #44012: URL: https://github.com/apache/spark/pull/44012#issuecomment-1826501628

Documentation build passed: https://github.com/itholic/spark/actions/runs/6991575840/job/19023836477
Re: [PR] [SPARK-46096][BUILD] Upgrade `sbt` to 1.9.7 [spark]
LuciferYang commented on PR #44008: URL: https://github.com/apache/spark/pull/44008#issuecomment-1826480761

https://github.com/panbingkun/spark/actions/runs/6758634955/job/18370477191

![image](https://github.com/apache/spark/assets/1475305/938771ba-65b0-4d97-8e85-d249a600e6a6)

https://github.com/apache/spark/pull/43079

As I recall, there were some dependencies in the lint task that could not be downloaded, including the MIMA check and some subsequent subtasks, but @panbingkun would know more about the specific details.
Re: [PR] [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length [spark]
beliefer commented on PR #44011: URL: https://github.com/apache/spark/pull/44011#issuecomment-1826478614

@srowen @LuciferYang Thank you!
[PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]
ulysses-you opened a new pull request, #44013: URL: https://github.com/apache/spark/pull/44013

### What changes were proposed in this pull request?
This PR introduces `case class AdaptiveRuleContext(isSubquery: Boolean, isFinalStage: Boolean)`, which can be used inside adaptive SQL extension rules through a thread local, so that developers can modify the configs of the next plan fragment using `AdaptiveRuleContext.get()`.

To make it easy to modify the shuffle partitions config, this PR adds a new extension entrance, `postPlannerStrategyRules`, which is applied between `plannerStrategy` and `queryStagePrepRules`, so it can see the whole plan before exchanges are injected.

The plan fragment configs can be propagated through multiple phases, e.g., if a config is set in `postPlannerStrategyRules`, then it can be retrieved in `queryStagePrepRules`, `queryStageOptimizerRules` and `columnarRules`. The configs are cleaned up before execution, so in the next round the configs will be empty.

### Why are the changes needed?
To support modifying plan-fragment-level SQL configs through AQE rules.

### Does this PR introduce _any_ user-facing change?
No, only affects developers.

### How was this patch tested?
Added new tests.

### Was this patch authored or co-authored using generative AI tooling?
No.
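A minimal sketch of how a developer-side rule might consume this context. The PR description only states that `AdaptiveRuleContext.get()` exists and that fragment-level configs propagate to later phases; the `setConfig` method name and the `Rule[SparkPlan]` signature below are assumptions for illustration.

```scala
// Sketch only, assuming the thread-local AdaptiveRuleContext from this PR.
// A rule injected via the new postPlannerStrategyRules entrance that tunes
// shuffle partitions for the next plan fragment.
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

case class TuneShufflePartitions() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = {
    val ctx = AdaptiveRuleContext.get()  // context for the current fragment
    if (ctx.isFinalStage && !ctx.isSubquery) {
      // Hypothetical setter: a fragment-level config set here would be
      // visible in queryStagePrepRules, queryStageOptimizerRules and
      // columnarRules, then cleaned up before execution.
      ctx.setConfig("spark.sql.shuffle.partitions", "64")
    }
    plan
  }
}
```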
Re: [PR] [SPARK-46097][SQL] Push down limit 1 though Union and Aggregate [spark]
wangyum commented on code in PR #44009: URL: https://github.com/apache/spark/pull/44009#discussion_r140529

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:

```diff
@@ -771,6 +771,17 @@ object LimitPushDown extends Rule[LogicalPlan] {
       Limit(le, Project(a.aggregateExpressions, LocalLimit(le, a.child)))
     case Limit(le @ IntegerLiteral(1), p @ Project(_, a: Aggregate)) if a.groupOnly =>
       Limit(le, p.copy(child = Project(a.aggregateExpressions, LocalLimit(le, a.child))))
+    // Push down limit 1 though Union and Aggregate
+    case Limit(le @ IntegerLiteral(1), u: Union) =>
+      val newUnionChildren = u.children.map {
+        case a: Aggregate if a.groupOnly =>
```

Review Comment: We have supported this before: https://github.com/apache/spark/blob/ddb36f9e0151d3bff8b9c8a2e3bb14a6f6ece218/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L769-L773
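As a sketch of the query shape the new case targets (illustrative only: `t1`/`t2` are hypothetical tables, and the plan outline in the comments is my reading of the rule, not output copied from the PR):

```scala
// LIMIT 1 over a UNION ALL of group-only aggregates:
val q = spark.sql("""
  SELECT a FROM t1 GROUP BY a
  UNION ALL
  SELECT b FROM t2 GROUP BY b
  LIMIT 1
""")
// The existing case already turns Limit(1) over a group-only Aggregate into
// a LocalLimit(1) below the Aggregate: one input row yields at least one
// group, and only one output row is needed. The new case applies the same
// reasoning per Union child, producing roughly:
//   GlobalLimit 1
//   +- LocalLimit 1
//      +- Union
//         :- Aggregate [a]
//         :  +- LocalLimit 1
//         :     +- scan t1
//         +- Aggregate [b]
//            +- LocalLimit 1
//               +- scan t2
q.explain(true)
```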
Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]
HyukjinKwon commented on PR #44012: URL: https://github.com/apache/spark/pull/44012#issuecomment-1826463424

This is a nice improvement!
Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]
Hisoka-X commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1405287835

## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:

```scala
  test("SPARK-42746: listagg function") {
    withTempView("df", "df2") {
      Seq(("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), (null, null)).toDF("a", "b")
        .createOrReplaceTempView("df")

      checkAnswer(
        sql("select listagg(b) from df group by a"),
        Row("") :: Row("b,c") :: Row("c,d") :: Nil)

      checkAnswer(
        sql("select listagg(b) from df where 1 != 1"),
        Row("") :: Nil)

      checkAnswer(
        sql("select listagg(b, '|') from df group by a"),
        Row("b|c") :: Row("c|d") :: Row("") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) FROM df"),
        Row("a,a,b,b") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(DISTINCT a) FROM df"),
        Row("a,b") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) WITHIN GROUP (ORDER BY a) FROM df"),
        Row("a,a,b,b") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) WITHIN GROUP (ORDER BY a DESC) FROM df"),
        Row("b,b,a,a") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) WITHIN GROUP (ORDER BY a DESC) " +
          "OVER (PARTITION BY b) FROM df"),
        Row("a") :: Row("b,a") :: Row("b,a") :: Row("b") :: Row("") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) WITHIN GROUP (ORDER BY b) FROM df"),
        Row("a,a,b,b") :: Nil)

      checkAnswer(
        sql("SELECT LISTAGG(a) WITHIN GROUP (ORDER BY b DESC) FROM df"),
```

Review Comment: Thanks @hopefulnick, sorry for the late response, let me fix it today.
Re: [PR] [SPARK-44209] Expose amount of shuffle data available on the node [spark]
github-actions[bot] commented on PR #42071: URL: https://github.com/apache/spark/pull/42071#issuecomment-1826450649

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-44850][CONNECT] Heartbeat in scala's Spark Connect [spark]
github-actions[bot] commented on PR #42538: URL: https://github.com/apache/spark/pull/42538#issuecomment-1826450643

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405240579

## dev/requirements.txt:

```diff
@@ -31,12 +31,12 @@
 pandas-stubs<1.2.0.54
 mkdocs
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
```

Review Comment: I followed [the version used in Pandas](https://github.com/pandas-dev/pandas/blob/main/requirements-dev.txt#L64), and I believe this version renders the document in the most optimal form after testing several versions.
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405238199

## python/docs/source/reference/pyspark.pandas/frame.rst:

```diff
@@ -319,8 +320,8 @@ specific plotting methods of the form ``DataFrame.plot.``.
 .. autosummary::
     :toctree: api/
+    :template: autosummary/accessor_method.rst

-    DataFrame.plot
```

Review Comment: In newer versions of Sphinx, the build fails because `DataFrame.plot` and `Series.plot` are determined to be duplicates of the list of functions described below, such as `DataFrame.plot.area`, `DataFrame.plot.barh`, `DataFrame.plot.bar`, etc. In fact, this behavior seems reasonable since `.plot` is simply an accessor keyword and not a function, so I believe we can simply leave it out of the document.
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405236858

## python/docs/source/_templates/autosummary/class_with_docs.rst:

```diff
@@ -47,7 +47,9 @@
 .. autosummary::

 {% for item in attributes %}
-   ~{{ name }}.{{ item }}
+{% if not (item == 'uid') %}
```

Review Comment: We should manually exclude `uid` from the documentation because it is an internal property. We don't include it in our current documentation either, but for some reason newer Sphinx versions try to generate the internal property unexpectedly.
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405234673

## python/docs/source/conf.py:
```
@@ -194,7 +194,11 @@
 # further. For a list of options available for each theme, see the
 # documentation.
 html_theme_options = {
-    "navbar_end": ["version-switcher"]
+    "navbar_end": ["version-switcher", "theme-switcher"],
+    "logo": {
+        "image_light": "_static/spark-logo-light.png",
+        "image_dark": "_static/spark-logo-dark.png",
```
Review Comment: FYI: The default light/dark mode is `auto`, which chooses a theme based on the user's system settings, but we can manually set either `dark` or `light` as the default.
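Put together, the `conf.py` fragment under discussion looks roughly like the sketch below. The option names follow pydata-sphinx-theme conventions; pinning the default mode via `html_context["default_mode"]` is an assumption based on that theme's documented behavior, not something shown in the diff:

```python
# Sketch of the python/docs/source/conf.py theme options being reviewed.
# The logo paths are the project's own static assets.
html_theme_options = {
    "navbar_end": ["version-switcher", "theme-switcher"],
    "logo": {
        "image_light": "_static/spark-logo-light.png",
        "image_dark": "_static/spark-logo-dark.png",
    },
}

# The light/dark default is "auto" (follow the OS setting); it can be
# pinned to "light" or "dark" instead (assumed pydata-sphinx-theme option).
html_context = {"default_mode": "auto"}
```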
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405233405

## python/docs/source/reference/pyspark.pandas/frame.rst:
```
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.``.
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst
```
Review Comment: For example, previously the rst file was created as follows:
```rst
pyspark.sql.SparkSession.builder.appName

.. currentmodule:: pyspark.sql.SparkSession

.. automethod:: builder.appName
```
However, in newer Sphinx versions it is generated like this:
```rst
pyspark.sql.SparkSession.builder.appName

.. currentmodule:: pyspark.sql

.. automethod:: SparkSession.builder.appName
```
For functions exposed through internal classes or accessors like this, the package paths generated in the new way cause the Sphinx build to fail. That's why we should use the customized template to correct the module path. See also https://github.com/sphinx-doc/sphinx/issues/7551.
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095

## python/docs/source/reference/pyspark.pandas/frame.rst:
```
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.``.
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst
```
Review Comment: For example, previously the rst file was created as follows:
```rst
pyspark.sql.SparkSession.builder.appName

.. currentmodule:: pyspark.sql.SparkSession

.. automethod:: builder.appName
```
However, in newer Sphinx versions it is generated like this:
```rst
pyspark.sql.SparkSession.builder.appName

.. currentmodule:: pyspark.sql

.. automethod:: SparkSession.builder.appName
```
For functions exposed through internal classes or accessors like this, the package paths generated in the new way cause the Sphinx build to fail. See also https://github.com/sphinx-doc/sphinx/issues/7551.
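The difference between the two generated rst files boils down to where the dotted name is split. A rough Python sketch of the two splitting strategies follows; the split points are hardcoded for this one example, and the real logic lives in the Jinja templates, so treat the function names and slicing as illustrative only:

```python
# Two ways of splitting a dotted accessor path into
# (currentmodule, automethod target).
def split_default(fullname):
    # Newer Sphinx behavior: currentmodule keeps only the package part,
    # so `builder` must resolve as a module attribute -- which fails.
    parts = fullname.split(".")
    return ".".join(parts[:2]), ".".join(parts[2:])

def split_accessor(fullname):
    # Custom accessor template: keep the owning class in currentmodule,
    # so the accessor method resolves relative to the class.
    parts = fullname.split(".")
    return ".".join(parts[:3]), ".".join(parts[3:])

name = "pyspark.sql.SparkSession.builder.appName"
print(split_default(name))   # ('pyspark.sql', 'SparkSession.builder.appName')
print(split_accessor(name))  # ('pyspark.sql.SparkSession', 'builder.appName')
```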
Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]
itholic commented on code in PR #44012: URL: https://github.com/apache/spark/pull/44012#discussion_r1405231062

## python/docs/source/reference/pyspark.pandas/frame.rst:
```
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.``.
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst
```
Review Comment: With the new version of Sphinx, the package-name generation rules for the `rst` files that are automatically created during the documentation build have changed, so we must manually adjust the package path using these templates. This behavior is [used in the same way in Pandas](https://github.com/pandas-dev/pandas/tree/main/doc/_templates/autosummary), so I referred to it.
Re: [PR] [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length [spark]
srowen commented on PR #44011: URL: https://github.com/apache/spark/pull/44011#issuecomment-1826415496

Merged to master
Re: [PR] [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length [spark]
srowen closed pull request #44011: [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length URL: https://github.com/apache/spark/pull/44011
Re: [PR] [SPARK-46096][BUILD] Upgrade `sbt` to 1.9.7 [spark]
dongjoon-hyun commented on PR #44008: URL: https://github.com/apache/spark/pull/44008#issuecomment-1826410068

Is that a MIMA issue?

![screenshot](https://github.com/apache/spark/assets/9700541/ef55b97a-c13a-49eb-ad7b-987cd74e93b3)

From GitHub Log, only `mima` complains while the others are green.

![screenshot](https://github.com/apache/spark/assets/9700541/472800a2-f210-46a1-9cfb-581eec50f18c)
Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]
rangadi commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1405191450

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala:
```
@@ -47,9 +47,19 @@ case class SessionKey(userId: String, sessionId: String)
 case class SessionHolder(userId: String, sessionId: String, session: SparkSession)
   extends Logging {

-  @volatile private var lastRpcAccessTime: Option[Long] = None
+  // Time when the session was started.
+  private val startTime: Long = System.currentTimeMillis()
```
Review Comment: Convention is to use a unit suffix (`startTimeMs` for milliseconds here) when the type is long. `Time` is ok if this were a type like `Instant`.

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala:
```
@@ -95,47 +92,134 @@ class SparkConnectSessionManager extends Logging {
     Option(getSession(key, None))
   }

-  private def getSession(
-      key: SessionKey,
-      default: Option[Callable[SessionHolder]]): SessionHolder = {
-    val session = default match {
-      case Some(callable) => sessionStore.get(key, callable)
-      case None => sessionStore.getIfPresent(key)
-    }
-    // record access time before returning
-    session match {
-      case null =>
-        null
-      case s: SessionHolder =>
-        s.updateAccessTime()
-        s
+  private def getSession(key: SessionKey, default: Option[() => SessionHolder]): SessionHolder = {
+    sessionsLock.synchronized {
+      // try to get existing session from store
+      val sessionOpt = sessionStore.get(key)
+      // create using default if missing
+      val session = sessionOpt match {
+        case Some(s) => s
+        case None =>
+          default match {
+            case Some(callable) =>
+              val session = callable()
+              sessionStore.put(key, session)
+              session
+            case None =>
+              null
+          }
+      }
+      // record access time before returning
+      session match {
+        case null =>
+          null
+        case s: SessionHolder =>
+          s.updateAccessTime()
+          s
+      }
     }
   }

   def closeSession(key: SessionKey): Unit = {
-    // Invalidate will trigger RemoveSessionListener
-    sessionStore.invalidate(key)
-  }
-
-  private class RemoveSessionListener extends RemovalListener[SessionKey, SessionHolder] {
-    override def onRemoval(notification: RemovalNotification[SessionKey, SessionHolder]): Unit = {
-      val sessionHolder = notification.getValue
-      sessionsLock.synchronized {
+    var sessionHolder: Option[SessionHolder] = None
+    sessionsLock.synchronized {
+      sessionHolder = sessionStore.remove(key)
+      sessionHolder.foreach { s =>
         // First put into closedSessionsCache, so that it cannot get accidentally recreated by
         // getOrCreateIsolatedSession.
-        closedSessionsCache.put(sessionHolder.key, sessionHolder.getSessionHolderInfo)
+        closedSessionsCache.put(s.key, s.getSessionHolderInfo)
       }
-      // Rest of the cleanup outside sessionLock - the session cannot be accessed anymore by
-      // getOrCreateIsolatedSession.
-      sessionHolder.close()
+    }
+    // Rest of the cleanup outside sessionLock - the session cannot be accessed anymore by
+    // getOrCreateIsolatedSession.
+    sessionHolder.foreach { s =>
+      s.close()
+      // Update in closedSessionsCache: above it wasn't updated with closedTime etc. yet.
```
Review Comment: Any concerns if `close()` above throws? Btw, when do we clean up `closedSessionsCache`?

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala:
```
@@ -186,7 +196,13 @@ case class SessionHolder(userId: String, sessionId: String, session: SparkSession)
   def classloader: ClassLoader = artifactManager.classloader

   private[connect] def updateAccessTime(): Unit = {
-    lastRpcAccessTime = Some(System.currentTimeMillis())
+    lastAccessTime = System.currentTimeMillis()
+    logInfo(s"Session $key accessed, time $lastAccessTime.")
```
Review Comment: Temporary log?

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala:
```
@@ -205,14 +220,26 @@
    * Called only by SparkConnectSessionManager.
    */
   private[connect] def close(): Unit = {
+    // It is called only by SparkConnectSessionManager.closeSession, only after it's removed from
+    // the sessionStore there guarantees that it is called only once.
+    if (closedTime.isDefined) {
+      throw new IllegalStateException(s"Session $key is already closed.")
+    }
+
     logInfo(s"Closing session with userId: $userId and sessionId: $sessionId")
-    // After isClosing=true,
```
Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]
MaxGekk commented on PR #43695: URL: https://github.com/apache/spark/pull/43695#issuecomment-1826388775

> Can we show the impact to the real error message

@cloud-fan I added an example; please take a look at the PR.
Re: [PR] [SPARK-45943][SQL] Add a tag to mark resolved subquery plan for ResolveSubquery [spark]
cloud-fan commented on PR #43867: URL: https://github.com/apache/spark/pull/43867#issuecomment-1826351872

> This may also affect other Rules. Since HiveTableRelation is not resolved, the Project.projectList of the parent plan will not be resolved.

Good point. Why is `DetermineTableStats` a post-hoc resolution rule then...
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405122383

## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
```
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
     }
   }

+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+    resetStates(rows)
```
Review Comment: so the fix is to always call `resetStates`?
Re: [PR] [SPARK-45760][SQL][FOLLOWUP] Inline With inside conditional branches [spark]
cloud-fan commented on code in PR #43978: URL: https://github.com/apache/spark/pull/43978#discussion_r1405081903

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala:
```
@@ -35,56 +35,82 @@ object RewriteWithExpression extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = {
     plan.transformWithPruning(_.containsPattern(WITH_EXPRESSION)) {
       case p if p.expressions.exists(_.containsPattern(WITH_EXPRESSION)) =>
-        var newChildren = p.children
-        var newPlan: LogicalPlan = p.transformExpressionsUp {
-          case With(child, defs) =>
-            val refToExpr = mutable.HashMap.empty[Long, Expression]
-            val childProjections = Array.fill(newChildren.size)(mutable.ArrayBuffer.empty[Alias])
+        val inputPlans = p.children.toArray
+        var newPlan: LogicalPlan = p.mapExpressions { expr =>
+          rewriteWithExprAndInputPlans(expr, inputPlans)
+        }
+        newPlan = newPlan.withNewChildren(inputPlans.toIndexedSeq)
+        if (p.output == newPlan.output) {
+          newPlan
+        } else {
+          Project(p.output, newPlan)
+        }
+    }
+  }

-            defs.zipWithIndex.foreach { case (CommonExpressionDef(child, id), index) =>
-              if (CollapseProject.isCheap(child)) {
-                refToExpr(id) = child
-              } else {
-                val childProjectionIndex = newChildren.indexWhere(
-                  c => child.references.subsetOf(c.outputSet)
-                )
-                if (childProjectionIndex == -1) {
-                  // When we cannot rewrite the common expressions, force to inline them so that the
-                  // query can still run. This can happen if the join condition contains `With` and
-                  // the common expression references columns from both join sides.
-                  // TODO: things can go wrong if the common expression is nondeterministic. We
-                  //       don't fix it for now to match the old buggy behavior when certain
-                  //       `RuntimeReplaceable` did not use the `With` expression.
-                  // TODO: we should calculate the ref count and also inline the common expression
-                  //       if it's ref count is 1.
-                  refToExpr(id) = child
-                } else {
-                  val alias = Alias(child, s"_common_expr_$index")()
-                  childProjections(childProjectionIndex) += alias
-                  refToExpr(id) = alias.toAttribute
-                }
-              }
-            }
+  private def rewriteWithExprAndInputPlans(
+      e: Expression,
+      inputPlans: Array[LogicalPlan]): Expression = {
+    if (!e.containsPattern(WITH_EXPRESSION)) return e
+    e match {
+      case w: With =>
+        // Rewrite nested With expression in CommonExpressionDef first.
+        val defs = w.defs.map(rewriteWithExprAndInputPlans(_, inputPlans))
```
Review Comment: oh correlated nested `With`! I'm not sure if we want to support it or not... But at least we should fail if we don't want to support it.
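At its core, the rule in the diff decides per common-expression definition whether to inline it or hoist it into a child projection. A toy Python version of that decision logic follows; the cheapness flag and alias naming are simplifications of `CollapseProject.isCheap` and the `_common_expr_$index` convention, and the single `child_columns` set stands in for checking each child plan's output set:

```python
# Toy model of RewriteWithExpression's per-definition choice:
# inline cheap expressions, hoist expensive ones into a child
# projection when their references fit one child, else force-inline.
def rewrite_with(defs, child_columns):
    """defs: list of (expr_id, expr, referenced_columns, is_cheap).
    Returns (ref_to_expr map, extra child projection entries)."""
    ref_to_expr = {}
    child_projection = []
    for index, (expr_id, expr, refs, is_cheap) in enumerate(defs):
        if is_cheap:
            ref_to_expr[expr_id] = expr            # inline cheap exprs
        elif refs <= child_columns:
            alias = f"_common_expr_{index}"        # hoist into projection
            child_projection.append((alias, expr))
            ref_to_expr[expr_id] = alias
        else:
            ref_to_expr[expr_id] = expr            # forced inline fallback

    return ref_to_expr, child_projection

refs, proj = rewrite_with(
    [(1, "a + 1", {"a"}, True),
     (2, "expensive(a, b)", {"a", "b"}, False)],
    child_columns={"a", "b"})
print(refs)   # {1: 'a + 1', 2: '_common_expr_1'}
print(proj)   # [('_common_expr_1', 'expensive(a, b)')]
```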
Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]
cloud-fan commented on code in PR #44004: URL: https://github.com/apache/spark/pull/44004#discussion_r1405045604

## sql/core/src/test/resources/sql-tests/inputs/misc-functions.sql:
```
@@ -21,21 +21,30 @@ CREATE TEMPORARY VIEW tbl_misc AS SELECT * FROM (VALUES (1), (8), (2)) AS T(v);
 SELECT raise_error('error message');
 SELECT if(v > 5, raise_error('too big: ' || v), v + 1) FROM tbl_misc;

-SELECT raise_error('VIEW_NOT_FOUND', Map('relationName', '`v`'));
--- Error class is case insensitive
-SELECT raise_error('VIEW_NOT_FOund', Map('relationName', '`v`'));
--- parameters are case sensitive
-SELECT raise_error('VIEW_NOT_FOund', Map('relationNAME', '`v`'));
--- Too few parameters
-SELECT raise_error('VIEW_NOT_FOUND', Map());
 -- Too many parameters
-SELECT raise_error('VIEW_NOT_FOUND', Map('relationName', '`v`', 'totallymadeup', '5'));
+SELECT raise_error('error message', Map());
+
+-- Too many parameters
+SELECT raise_error('error message', 'some args');
+
+-- Too few parameters
+SELECT raise_error();
+
+-- Passing null as message
+SELECT raise_error(NULL);
+
+-- Passing non-string type
+SELECT raise_error(1);

--- Empty parameter list
-SELECT raise_error('ALL_PARTITION_COLUMNS_NOT_ALLOWED', Map());
-SELECT raise_error('ALL_PARTITION_COLUMNS_NOT_ALLOWED', NULL);
+-- Passing expression
+SELECT raise_error(1 + 1);

-SELECT raise_error(NULL, NULL);
+-- TODO: Need feedback on proper behaviour here:
```
Review Comment: The implicit cast does not support converting complex types to string type. Actually, we can skip these 3 tests, as implicit casting is orthogonal to the raise_error function.
Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]
cloud-fan commented on PR #44004: URL: https://github.com/apache/spark/pull/44004#issuecomment-1826332136

cc @srielau
Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]
cloud-fan commented on code in PR #44004: URL: https://github.com/apache/spark/pull/44004#discussion_r1405037748

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:
```
@@ -139,6 +128,31 @@ object RaiseError {
     new RaiseError(errorClass, parms)
   }

+/**
+ * Throw with the result of an expression (used for debugging).
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_( expr ) - Throws a USER_RAISED_EXCEPTION with `expr` as message.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_('custom error message');
+       [USER_RAISED_EXCEPTION] custom error message
+  """,
+  since = "3.1.0",
+  group = "misc_funcs")
+// scalastyle:on line.size.limit
+object RaiseErrorExpressionBuilder extends ExpressionBuilder {
+  override def build(funcName: String, expressions: Seq[Expression]): Expression = {
+    // for some reason pattern matching doesn't work here...
```
Review Comment: I think an explicit length check is better than a pattern match here. We don't need this comment.
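The arity handling under review — exactly one message argument, anything else is an error — is easy to express with the explicit length check the reviewer suggests. A toy Python equivalent of such a builder (the error-message shapes are made up for illustration, not Spark's actual error classes):

```python
# Sketch of an expression builder that validates arity with an explicit
# length check before constructing the raise_error callable.
def build_raise_error(func_name, args):
    if len(args) != 1:
        raise TypeError(
            f"{func_name} expects exactly 1 argument, got {len(args)}")
    (message,) = args

    def raise_error():
        # Mirrors the documented behavior: throw with `expr` as message.
        raise RuntimeError(f"[USER_RAISED_EXCEPTION] {message}")

    return raise_error

fn = build_raise_error("raise_error", ["custom error message"])
# Calling fn() raises: RuntimeError: [USER_RAISED_EXCEPTION] custom error message
```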
Re: [PR] [SPARK-46091][KUBERNETES] Respect the defined container SPARK_LOCAL_DIRS env [spark]
turboFei commented on PR #44005: URL: https://github.com/apache/spark/pull/44005#issuecomment-1826300772

gentle ping @dongjoon-hyun
Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]
beliefer commented on code in PR #44004: URL: https://github.com/apache/spark/pull/44004#discussion_r1404849612

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:
```
@@ -139,6 +128,31 @@ object RaiseError {
     new RaiseError(errorClass, parms)
   }

+/**
+ * Throw with the result of an expression (used for debugging).
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_( expr ) - Throws a USER_RAISED_EXCEPTION with `expr` as message.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_('custom error message');
+       [USER_RAISED_EXCEPTION] custom error message
+  """,
+  since = "3.1.0",
+  group = "misc_funcs")
+// scalastyle:on line.size.limit
+object RaiseErrorExpressionBuilder extends ExpressionBuilder {
```
Review Comment: `RaiseError` works well, so why did you create the builder?

## sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
```
@@ -3269,14 +3269,6 @@ object functions {
    */
   def raise_error(c: Column): Column = Column.fn("raise_error", c)

-  /**
-   * Throws an exception with the provided error class and parameter map.
-   *
-   * @group misc_funcs
-   * @since 4.0.0
-   */
-  def raise_error(c: Column, e: Column): Column = Column.fn("raise_error", c, e)
```
Review Comment: It seems this API was added not long ago.
Re: [PR] [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length [spark]
beliefer commented on PR #44011: URL: https://github.com/apache/spark/pull/44011#issuecomment-1826279786

> Is this all the similar cases in the project? I think we can fix all cases in one PR :)

It's very hard. Fixing the cases in the core module alone took me two hours.
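For context, the change under discussion is mechanical: on `String` and `Array`, `.size` is supplied by an implicit enrichment (`StringOps`/`ArrayOps`), while `.length` is a direct member, so `.length` avoids an extra layer on the call path. A minimal sketch of the before/after:

```scala
val s = "hello"
val a = Array(1, 2, 3)

// Before: `.size` goes through the implicit StringOps/ArrayOps enrichment.
val before = s.size + a.size

// After: `.length` calls the underlying member directly.
val after = s.length + a.length

// Behavior is identical; only the call path (and stack depth) changes.
assert(before == after && before == 8)
```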
Re: [PR] [SPARK-46097][SQL] Push down limit 1 though Union and Aggregate [spark]
beliefer commented on code in PR #44009: URL: https://github.com/apache/spark/pull/44009#discussion_r1404844214

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala:

```
@@ -352,4 +352,21 @@ class LimitPushdownSuite extends PlanTest {
     comparePlans(Optimize.execute(originalQuery2), originalQuery2)
   }
 }
+
+  test("SPARK-46097: Push down limit 1 though Union and Aggregate") {
+    val unionQuery = Union(
+      Union(
+        testRelation.groupBy($"a", $"b")($"a", $"b"),
+        testRelation2.groupBy($"d", $"e")($"d", $"e"),
+      ),
+      testRelation2.groupBy($"e", $"f")($"e", $"f")).limit(1)
+
+    val correctAnswer = Union(
+      Union(
+        LocalLimit(1, testRelation).select($"a", $"b"),
+        LocalLimit(1, testRelation2).select($"d", $"e")).limit(1),
+      LocalLimit(1, testRelation2).select($"e", $"f")).limit(1)
+
+    comparePlans(Optimize.execute(unionQuery.analyze), correctAnswer.analyze)
```

Review Comment: This test case only verifies the logical plan. Could you add a test case that compares the output data?

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:

```
@@ -771,6 +771,17 @@ object LimitPushDown extends Rule[LogicalPlan] {
       Limit(le, Project(a.aggregateExpressions, LocalLimit(le, a.child)))
     case Limit(le @ IntegerLiteral(1), p @ Project(_, a: Aggregate)) if a.groupOnly =>
       Limit(le, p.copy(child = Project(a.aggregateExpressions, LocalLimit(le, a.child))))
+    // Push down limit 1 though Union and Aggregate
+    case Limit(le @ IntegerLiteral(1), u: Union) =>
+      val newUnionChildren = u.children.map {
+        case a: Aggregate if a.groupOnly =>
```

Review Comment: I'm concerned the output could change: `limit 1 after shuffle` is not the same as `limit 1 before shuffle`.
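On the semantics question: a group-only aggregate (a `GROUP BY` that projects only its grouping keys) is equivalent to `DISTINCT`, and a bare `LIMIT 1` without `ORDER BY` may return any single row, so taking one row per partition before the shuffle can change *which* row comes back but should still yield a valid answer. A sketch of the observable behavior, assuming a running `SparkSession` named `spark`:

```scala
import spark.implicits._

val df = Seq((1, "a"), (1, "a"), (2, "b")).toDF("a", "b")

// distinct() compiles to a group-only Aggregate, i.e. the a.groupOnly case
// in the rule above. Whether LIMIT 1 runs before or after the shuffle, the
// result is one row that really occurs in the distinct set -- which one is
// unspecified, since there is no ORDER BY.
val rows = df.distinct().limit(1).collect()
assert(rows.length == 1)
```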
Re: [PR] [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length [spark]
LuciferYang commented on PR #44011: URL: https://github.com/apache/spark/pull/44011#issuecomment-1826278609

Is this all the similar cases in the project? I think we can fix all cases in one PR :)
Re: [PR] [SPARK-46096][BUILD] Upgrade `sbt` to 1.9.7 [spark]
LuciferYang commented on PR #44008: URL: https://github.com/apache/spark/pull/44008#issuecomment-1826278081

IIRC, didn't @panbingkun encounter some issues that have not been resolved when he tried to upgrade to this version before?
Re: [PR] [SPARK-45954][SQL] Remove redundant shuffles [spark]
beliefer commented on code in PR #43841: URL: https://github.com/apache/spark/pull/43841#discussion_r1404840863

## sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantShufflesSuite.scala:

```
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
+import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+abstract class RemoveRedundantShufflesSuiteBase
+    extends QueryTest
+    with SharedSparkSession
+    with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  private def getShuffleExchangeNum(df: DataFrame): Int = {
+    collect(df.queryExecution.executedPlan) {
+      case s: ShuffleExchangeExec => s
+    }.size
+  }
+
+  test("Remove redundant shuffle") {
+    withTempView("t1", "t2") {
+      spark.range(10).select($"id").createOrReplaceTempView("t1")
+      spark.range(20).select($"id").createOrReplaceTempView("t2")
+      Seq(-1, 100).foreach { threshold =>
+        withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> threshold.toString) {
+          val query = spark.table("t1").repartition(10).join(spark.table("t2"), "id")
+          query.collect()
+          val shuffleNum = getShuffleExchangeNum(query)
+          if (threshold > 0) {
```

Review Comment: The test condition doesn't look good. How about `threshold == 100`?
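The plan shape the suite above targets can be reproduced in a shell. This is a sketch assuming a local `SparkSession` named `spark`; which exchange actually survives depends on the join strategy the planner chooses.

```scala
// Disable broadcast joins so the join requires hash partitioning on "id".
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val t1 = spark.range(10).toDF("id")
val t2 = spark.range(20).toDF("id")

// The explicit repartition(10) introduces a shuffle whose partitioning is
// immediately superseded by the join's hash partitioning on "id", making
// it a candidate for removal by the rule under review.
val query = t1.repartition(10).join(t2, "id")
query.explain() // inspect how many ShuffleExchangeExec nodes remain
```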
[PR] [SPARK-46099][PS][DOCS] Refactor "Supported pandas API" generation script [spark]
itholic opened a new pull request, #44010: URL: https://github.com/apache/spark/pull/44010

### What changes were proposed in this pull request?

This PR proposes to refactor the script used to generate the [Supported pandas API](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html) documentation. The script has been restructured for better readability and maintainability. The refactoring includes:

- Simplifying complex functions and breaking them into smaller, more manageable pieces.
- Improving variable and function naming for clarity.
- Adding comprehensive docstrings in the NumPy docstyle.
- Streamlining the flow of the script to enhance logical coherence.

### Why are the changes needed?

The previous version of the script was hard to understand and maintain due to its complexity and lack of documentation. This refactoring makes the script more accessible to new contributors and easier to modify or extend in the future. It also ensures that the script adheres to best practices in Python coding, making it a more reliable tool for generating accurate and up-to-date documentation.

### Does this PR introduce _any_ user-facing change?

No user-facing changes. This PR only affects the internal documentation generation process.

### How was this patch tested?

Tested by generating the documentation manually and verifying that the output remains consistent with the previous version.

### Was this patch authored or co-authored using generative AI tooling?

No.