[jira] [Resolved] (SPARK-48627) Perf improvement for binary to HEX_DISCRETE strings
[ https://issues.apache.org/jira/browse/SPARK-48627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-48627.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46984
[https://github.com/apache/spark/pull/46984]

> Perf improvement for binary to HEX_DISCRETE strings
> ---------------------------------------------------
>
> Key: SPARK-48627
> URL: https://issues.apache.org/jira/browse/SPARK-48627
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code:java}
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 10:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Spark                      42210          43595        1207         0.0      422102.9       1.0X
> Java                         238            243           2         0.4        2381.9     177.2X
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48627) Perf improvement for binary to HEX_DISCRETE strings
[ https://issues.apache.org/jira/browse/SPARK-48627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-48627:
--------------------------------
Assignee: Kent Yao

> Perf improvement for binary to HEX_DISCRETE strings
> ---------------------------------------------------
>
> Key: SPARK-48627
> URL: https://issues.apache.org/jira/browse/SPARK-48627
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
>
> {code:java}
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 10:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Spark                      42210          43595        1207         0.0      422102.9       1.0X
> Java                         238            243           2         0.4        2381.9     177.2X
> {code}
[jira] [Resolved] (SPARK-48577) Replace invalid byte sequences in UTF8Strings
[ https://issues.apache.org/jira/browse/SPARK-48577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48577.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46899
[https://github.com/apache/spark/pull/46899]

> Replace invalid byte sequences in UTF8Strings
> ---------------------------------------------
>
> Key: SPARK-48577
> URL: https://issues.apache.org/jira/browse/SPARK-48577
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Assignee: Uroš Bojanić
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Resolved] (SPARK-48633) Upgrade scalacheck to 1.18.0
[ https://issues.apache.org/jira/browse/SPARK-48633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48633.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46992
[https://github.com/apache/spark/pull/46992]

> Upgrade scalacheck to 1.18.0
> ----------------------------
>
> Key: SPARK-48633
> URL: https://issues.apache.org/jira/browse/SPARK-48633
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Wei Guo
> Assignee: Wei Guo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Resolved] (SPARK-48587) Avoid storage amplification when accessing sub-Variant
[ https://issues.apache.org/jira/browse/SPARK-48587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48587.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46941
[https://github.com/apache/spark/pull/46941]

> Avoid storage amplification when accessing sub-Variant
> ------------------------------------------------------
>
> Key: SPARK-48587
> URL: https://issues.apache.org/jira/browse/SPARK-48587
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: David Cashman
> Assignee: David Cashman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> When a variant_get expression returns a Variant, or a nested type containing
> Variant, we just return the sub-slice of the Variant value along with the
> full metadata, even though most of the metadata is probably unnecessary to
> represent the value. This may be very inefficient if the value is then
> written to disk (e.g. a shuffle file or Parquet). We should instead rebuild
> the value with minimal metadata.
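The rebuild-with-minimal-metadata idea above can be illustrated with a plain-Python stand-in: a "variant" encoded as nested dicts whose keys are indexes into a shared string dictionary (`metadata`). This is not the actual Variant binary format, just a sketch of why slicing a sub-value should shrink the dictionary too:

```python
def collect_keys(node, acc):
    # Gather every dictionary index referenced anywhere in the (sub-)value.
    if isinstance(node, dict):
        for k, v in node.items():
            acc.add(k)
            collect_keys(v, acc)

def rebuild(metadata, sub_value):
    # Keep only the dictionary entries the sub-value references, remapping ids.
    used = set()
    collect_keys(sub_value, used)
    new_meta = [metadata[i] for i in sorted(used)]
    remap = {old: new for new, old in enumerate(sorted(used))}

    def rewrite(node):
        if isinstance(node, dict):
            return {remap[k]: rewrite(v) for k, v in node.items()}
        return node

    return new_meta, rewrite(sub_value)

metadata = ["a", "b", "c"]
value = {0: {1: 1, 2: 2}}      # stands for {"a": {"b": 1, "c": 2}}
sub = value[0]                 # extracting "$.a" yields {1: 1, 2: 2}
new_meta, new_val = rebuild(metadata, sub)
# The rebuilt metadata no longer carries the unused entry "a".
```

Returning `sub` with the original three-entry `metadata` is the storage amplification the ticket describes; rebuilding drops the unreferenced entries before the value is persisted.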
[jira] [Updated] (SPARK-48640) Perf improvement for format hex from byte array
[ https://issues.apache.org/jira/browse/SPARK-48640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48640:
-----------------------------
Summary: Perf improvement for format hex from byte array (was: Perf improvement for format hex)

> Perf improvement for format hex from byte array
> -----------------------------------------------
>
> Key: SPARK-48640
> URL: https://issues.apache.org/jira/browse/SPARK-48640
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
>
[jira] [Updated] (SPARK-48640) Perf improvement for format hex
[ https://issues.apache.org/jira/browse/SPARK-48640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48640:
-----------------------------
Priority: Minor (was: Critical)

> Perf improvement for format hex
> -------------------------------
>
> Key: SPARK-48640
> URL: https://issues.apache.org/jira/browse/SPARK-48640
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
>
[jira] [Updated] (SPARK-48640) Perf improvement for format hex
[ https://issues.apache.org/jira/browse/SPARK-48640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-48640:
-----------------------------
Parent: SPARK-48624
Issue Type: Sub-task (was: Improvement)

> Perf improvement for format hex
> -------------------------------
>
> Key: SPARK-48640
> URL: https://issues.apache.org/jira/browse/SPARK-48640
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
>
[jira] [Created] (SPARK-48640) Perf improvement for format hex
BingKun Pan created SPARK-48640:
-----------------------------------
Summary: Perf improvement for format hex
Key: SPARK-48640
URL: https://issues.apache.org/jira/browse/SPARK-48640
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan
[jira] [Assigned] (SPARK-48615) Perf improvement for parsing hex string
[ https://issues.apache.org/jira/browse/SPARK-48615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-48615:
--------------------------------
Assignee: Kent Yao

> Perf improvement for parsing hex string
> ---------------------------------------
>
> Key: SPARK-48615
> URL: https://issues.apache.org/jira/browse/SPARK-48615
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
>
> {code:java}
> ================================================================================
> Hex Comparison
> ================================================================================
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 100:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                      5050           5100          86         0.2        5050.1       1.0X
> Spark                       3822           3840          30         0.3        3821.6       1.3X
> Java                        2462           2522          87         0.4        2462.1       2.1X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 200:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     10020          10828        1154         0.2        5010.1       1.0X
> Spark                       6875           6966         144         0.3        3437.7       1.5X
> Java                        4999           5092          89         0.4        2499.3       2.0X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 400:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     20090          20433         433         0.2        5022.5       1.0X
> Spark                      13389          13620         229         0.3        3347.2       1.5X
> Java                       10023          10069          42         0.4        2505.6       2.0X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 800:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     40277          43453        2755         0.2        5034.7       1.0X
> Spark                      27145          27380         311         0.3        3393.1       1.5X
> Java                       19980          21198        1473         0.4        2497.5       2.0X
> {code}
[jira] [Resolved] (SPARK-48615) Perf improvement for parsing hex string
[ https://issues.apache.org/jira/browse/SPARK-48615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-48615.
------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46972
[https://github.com/apache/spark/pull/46972]

> Perf improvement for parsing hex string
> ---------------------------------------
>
> Key: SPARK-48615
> URL: https://issues.apache.org/jira/browse/SPARK-48615
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code:java}
> ================================================================================
> Hex Comparison
> ================================================================================
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 100:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                      5050           5100          86         0.2        5050.1       1.0X
> Spark                       3822           3840          30         0.3        3821.6       1.3X
> Java                        2462           2522          87         0.4        2462.1       2.1X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 200:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     10020          10828        1154         0.2        5010.1       1.0X
> Spark                       6875           6966         144         0.3        3437.7       1.5X
> Java                        4999           5092          89         0.4        2499.3       2.0X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 400:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     20090          20433         433         0.2        5022.5       1.0X
> Spark                      13389          13620         229         0.3        3347.2       1.5X
> Java                       10023          10069          42         0.4        2505.6       2.0X
>
> OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
> Apple M2 Max
> Cardinality 800:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -------------------------------------------------------------------------------------------------
> Apache                     40277          43453        2755         0.2        5034.7       1.0X
> Spark                      27145          27380         311         0.3        3393.1       1.5X
> Java                       19980          21198        1473         0.4        2497.5       2.0X
> {code}
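The parsing direction benefits from the same trick as formatting: a digit-value lookup table indexed by character code, instead of per-character `Character.digit`-style calls. A hedged Python sketch of that idea (illustrative only; the odd-length handling convention here is an assumption, not necessarily what Spark's `unhex` does):

```python
# Map the ASCII code of each hex digit to its numeric value; -1 marks invalid.
DIGITS = [-1] * 256
for i, c in enumerate("0123456789abcdef"):
    DIGITS[ord(c)] = i
    DIGITS[ord(c.upper())] = i

def parse_hex(s: str) -> bytes:
    # Pad odd-length input with a leading zero (one common convention).
    if len(s) % 2:
        s = "0" + s
    out = bytearray()
    for j in range(0, len(s), 2):
        hi, lo = DIGITS[ord(s[j])], DIGITS[ord(s[j + 1])]
        if hi < 0 or lo < 0:
            raise ValueError("invalid hex digit")
        out.append(hi << 4 | lo)
    return bytes(out)

print(parse_hex("1FA0"))  # b'\x1f\xa0'
```

Each input character costs one array index and a branch, which is the shape of the fast path the benchmark's "Java" row measures.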
[jira] [Created] (SPARK-48639) Add Origin to RelationCommon in protobuf definition
Hyukjin Kwon created SPARK-48639:
-----------------------------------
Summary: Add Origin to RelationCommon in protobuf definition
Key: SPARK-48639
URL: https://issues.apache.org/jira/browse/SPARK-48639
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

SPARK-48459 adds the new protobuf message for Origin. We should reuse the definition in `RelationCommon` as well.
[jira] [Assigned] (SPARK-48555) Support Column type for several SQL functions in scala and python
[ https://issues.apache.org/jira/browse/SPARK-48555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48555:
------------------------------------
Assignee: Ron Serruya

> Support Column type for several SQL functions in scala and python
> -----------------------------------------------------------------
>
> Key: SPARK-48555
> URL: https://issues.apache.org/jira/browse/SPARK-48555
> Project: Spark
> Issue Type: New Feature
> Components: Connect, PySpark, Spark Core
> Affects Versions: 3.5.1
> Reporter: Ron Serruya
> Assignee: Ron Serruya
> Priority: Major
> Labels: pull-request-available
>
> Currently, several SQL functions accept both native types and Columns, but
> only accept native types in their scala/python APIs:
> * array_remove (works in SQL, scala, not in python)
> * array_position (works in SQL, scala, not in python)
> * map_contains_key (works in SQL, scala, not in python)
> * substring (works only in SQL)
> For example, this is possible in SQL:
> {code:python}
> spark.sql("select array_remove(col1, col2) from values(array(1,2,3), 2)")
> {code}
> But not in python:
> {code:python}
> df.select(F.array_remove(F.col("col1"), F.col("col2")))
> {code}
[jira] [Resolved] (SPARK-48555) Support Column type for several SQL functions in scala and python
[ https://issues.apache.org/jira/browse/SPARK-48555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48555.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46901
[https://github.com/apache/spark/pull/46901]

> Support Column type for several SQL functions in scala and python
> -----------------------------------------------------------------
>
> Key: SPARK-48555
> URL: https://issues.apache.org/jira/browse/SPARK-48555
> Project: Spark
> Issue Type: New Feature
> Components: Connect, PySpark, Spark Core
> Affects Versions: 3.5.1
> Reporter: Ron Serruya
> Assignee: Ron Serruya
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Currently, several SQL functions accept both native types and Columns, but
> only accept native types in their scala/python APIs:
> * array_remove (works in SQL, scala, not in python)
> * array_position (works in SQL, scala, not in python)
> * map_contains_key (works in SQL, scala, not in python)
> * substring (works only in SQL)
> For example, this is possible in SQL:
> {code:python}
> spark.sql("select array_remove(col1, col2) from values(array(1,2,3), 2)")
> {code}
> But not in python:
> {code:python}
> df.select(F.array_remove(F.col("col1"), F.col("col2")))
> {code}
[jira] [Resolved] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47777.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46906
[https://github.com/apache/spark/pull/46906]

> Add spark connect test for python streaming data source
> -------------------------------------------------------
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
> Issue Type: Test
> Components: PySpark, SS, Tests
> Affects Versions: 3.5.1
> Reporter: Chaoqin Li
> Assignee: Chaoqin Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Make the python streaming data source pyspark test also run on spark connect.
[jira] [Assigned] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47777:
------------------------------------
Assignee: Chaoqin Li

> Add spark connect test for python streaming data source
> -------------------------------------------------------
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
> Issue Type: Test
> Components: PySpark, SS, Tests
> Affects Versions: 3.5.1
> Reporter: Chaoqin Li
> Assignee: Chaoqin Li
> Priority: Major
> Labels: pull-request-available
>
> Make the python streaming data source pyspark test also run on spark connect.
[jira] [Resolved] (SPARK-48597) Distinguish the streaming nodes from the text representation of logical plan
[ https://issues.apache.org/jira/browse/SPARK-48597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48597.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46953
[https://github.com/apache/spark/pull/46953]

> Distinguish the streaming nodes from the text representation of logical plan
> ----------------------------------------------------------------------------
>
> Key: SPARK-48597
> URL: https://issues.apache.org/jira/browse/SPARK-48597
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We had a hard time figuring out whether the nodes were streaming or not when
> we debugged https://issues.apache.org/jira/browse/SPARK-47305.
> The text representation of the logical plan does not show the isStreaming
> property, hence we had to infer the value from other context. In addition,
> even when the type of a leaf node is explicitly streaming, which lets us
> track down isStreaming for a certain subtree, the plan can be very long and
> tracing down to the leaf nodes is a non-trivial effort. Also, if the leaf
> nodes are omitted from the representation due to its size, there is no way
> to get the isStreaming information.
> We propose to introduce a streaming marker that will be shown in the text
> representation of the logical plan. There is no concept of "isStreaming" in
> the physical plan, so the change only needs to happen in the logical plan.
[jira] [Assigned] (SPARK-48597) Distinguish the streaming nodes from the text representation of logical plan
[ https://issues.apache.org/jira/browse/SPARK-48597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-48597:
-----------------------------------
Assignee: Jungtaek Lim

> Distinguish the streaming nodes from the text representation of logical plan
> ----------------------------------------------------------------------------
>
> Key: SPARK-48597
> URL: https://issues.apache.org/jira/browse/SPARK-48597
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Labels: pull-request-available
>
> We had a hard time figuring out whether the nodes were streaming or not when
> we debugged https://issues.apache.org/jira/browse/SPARK-47305.
> The text representation of the logical plan does not show the isStreaming
> property, hence we had to infer the value from other context. In addition,
> even when the type of a leaf node is explicitly streaming, which lets us
> track down isStreaming for a certain subtree, the plan can be very long and
> tracing down to the leaf nodes is a non-trivial effort. Also, if the leaf
> nodes are omitted from the representation due to its size, there is no way
> to get the isStreaming information.
> We propose to introduce a streaming marker that will be shown in the text
> representation of the logical plan. There is no concept of "isStreaming" in
> the physical plan, so the change only needs to happen in the logical plan.
[jira] [Updated] (SPARK-48574) Fix support for StructTypes with collations
[ https://issues.apache.org/jira/browse/SPARK-48574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48574:
-----------------------------------
Labels: pull-request-available (was: )

> Fix support for StructTypes with collations
> -------------------------------------------
>
> Key: SPARK-48574
> URL: https://issues.apache.org/jira/browse/SPARK-48574
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Mihailo Milosevic
> Priority: Major
> Labels: pull-request-available
>
> While adding the expression walker, it was noticed that StructType support
> is broken. One problem is that `CollationsTypeCasts` performs a cast in all
> BinaryExpressions, including ExtractValue. Consequently, we are unable to
> extract a value if we cast there, as ExtractValue only supports non-null
> literals as extraction keys.
[jira] [Commented] (SPARK-48638) Native QueryExecution information for the dataframe
[ https://issues.apache.org/jira/browse/SPARK-48638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855443#comment-17855443 ]

Sem Sinchenko commented on SPARK-48638:
---------------------------------------
I'm working on the implementation of that logic in PySpark Classic.

> Native QueryExecution information for the dataframe
> ---------------------------------------------------
>
> Key: SPARK-48638
> URL: https://issues.apache.org/jira/browse/SPARK-48638
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Martin Grund
> Priority: Major
> Labels: pull-request-available
>
> Adding a new property to `DataFrame` called `queryExecution` that returns a
> class that contains information about the query execution and its metrics.
[jira] [Updated] (SPARK-48638) Native QueryExecution information for the dataframe
[ https://issues.apache.org/jira/browse/SPARK-48638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48638:
-----------------------------------
Labels: pull-request-available (was: )

> Native QueryExecution information for the dataframe
> ---------------------------------------------------
>
> Key: SPARK-48638
> URL: https://issues.apache.org/jira/browse/SPARK-48638
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Martin Grund
> Priority: Major
> Labels: pull-request-available
>
> Adding a new property to `DataFrame` called `queryExecution` that returns a
> class that contains information about the query execution and its metrics.
[jira] [Created] (SPARK-48638) Native QueryExecution information for the dataframe
Martin Grund created SPARK-48638:
-----------------------------------
Summary: Native QueryExecution information for the dataframe
Key: SPARK-48638
URL: https://issues.apache.org/jira/browse/SPARK-48638
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Martin Grund

Adding a new property to `DataFrame` called `queryExecution` that returns a class that contains information about the query execution and its metrics.
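A plain-Python sketch of the property shape the ticket describes may help make it concrete. Everything below is hypothetical (the class names, the `metrics` dict, the stand-in `collect`); it shows only the pattern of exposing post-execution information through a lazily populated property, not the actual Spark Connect API:

```python
class QueryExecution:
    """Hypothetical container for post-execution information."""
    def __init__(self, plan, metrics):
        self.plan = plan          # textual plan the query ran with
        self.metrics = metrics    # e.g. {"numOutputRows": 3}

class DataFrame:
    def __init__(self, plan):
        self._plan = plan
        self._qe = None           # nothing to report before execution

    def collect(self):
        rows = [1, 2, 3]          # stand-in for real query execution
        self._qe = QueryExecution(self._plan, {"numOutputRows": len(rows)})
        return rows

    @property
    def queryExecution(self):
        # Populated once the query has actually run.
        return self._qe

df = DataFrame("Project -> Scan")
df.collect()
```

After `collect()`, `df.queryExecution.metrics` carries the execution metrics; before execution the property is simply empty, mirroring the fact that metrics only exist once a query ran.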
[jira] [Updated] (SPARK-48637) On-demand shuffle migration peer refresh during decommission
[ https://issues.apache.org/jira/browse/SPARK-48637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48637:
-----------------------------------
Labels: pull-request-available (was: )

> On-demand shuffle migration peer refresh during decommission
> ------------------------------------------------------------
>
> Key: SPARK-48637
> URL: https://issues.apache.org/jira/browse/SPARK-48637
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Reporter: wuyi
> Priority: Major
> Labels: pull-request-available
>
> Currently the shuffle migration peer list is refreshed every 30s by default.
> It could be more efficient if we refreshed the peers immediately once a peer
> is aborted. (Strictly speaking, we only wait 30s for a refresh when there
> are no queued peers (i.e., `ShuffleMigrationRunnable`) in the
> `shuffleMigrationPool`.)
[jira] [Created] (SPARK-48637) On-demand shuffle migration peer refresh during decommission
wuyi created SPARK-48637:
-----------------------------
Summary: On-demand shuffle migration peer refresh during decommission
Key: SPARK-48637
URL: https://issues.apache.org/jira/browse/SPARK-48637
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 3.4.3, 3.3.4, 3.5.1, 3.2.4, 3.1.3, 4.0.0
Reporter: wuyi

Currently the shuffle migration peer list is refreshed every 30s by default. It could be more efficient if we refreshed the peers immediately once a peer is aborted. (Strictly speaking, we only wait 30s for a refresh when there are no queued peers (i.e., `ShuffleMigrationRunnable`) in the `shuffleMigrationPool`.)
[jira] [Created] (SPARK-48636) Event driven block manager decommissioner
wuyi created SPARK-48636:
-----------------------------
Summary: Event driven block manager decommissioner
Key: SPARK-48636
URL: https://issues.apache.org/jira/browse/SPARK-48636
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.3.4, 3.5.1, 3.2.4, 3.1.3, 4.0.0
Reporter: wuyi

The current block manager decommissioner uses periodic threads to refresh blocks/peers and monitor progress, which can be inefficient. For example, in the worst case it takes up to 30s (by default) for an executor to exit even after all the blocks have been migrated, because the migration status is only checked every 30s (by default). So this ticket proposes that the block manager decommissioner use an event-driven approach to improve its efficiency.
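The latency argument above (up to 30 s of idle waiting under polling) can be sketched with a condition-signaling primitive: the migrating side signals completion the moment the last block lands, and the waiter wakes immediately instead of sleeping out a polling interval. A minimal illustration of the pattern, not Spark's decommissioner code:

```python
import threading
import time

done = threading.Event()

def migrate_blocks():
    # Stand-in for the final block migration finishing.
    time.sleep(0.05)
    done.set()  # signal completion immediately instead of waiting to be polled

t = threading.Thread(target=migrate_blocks)
start = time.monotonic()
t.start()
done.wait(timeout=30)  # returns as soon as set() fires, not after 30 s
elapsed = time.monotonic() - start
t.join()
```

Under a 30 s polling loop, `elapsed` would round up to the next poll tick; with the event it tracks the actual migration time.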
[jira] [Assigned] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
[ https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-48634:
--------------------------------------
Assignee: Apache Spark

> Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
> ---------------------------------------------------------------------------------
>
> Key: SPARK-48634
> URL: https://issues.apache.org/jira/browse/SPARK-48634
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
> Labels: pull-request-available
>
> Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool
> statically initialize a ThreadPool, which might be dragged in during pickling.
[jira] [Assigned] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
[ https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-48634:
--------------------------------------
Assignee: (was: Apache Spark)

> Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
> ---------------------------------------------------------------------------------
>
> Key: SPARK-48634
> URL: https://issues.apache.org/jira/browse/SPARK-48634
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool
> statically initialize a ThreadPool, which might be dragged in during pickling.
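The general pattern behind SPARK-48634, deferring thread-pool creation so that importing or serializing the class never touches a live pool, can be sketched in plain Python. The class name and pool size below are illustrative, not Spark's actual code:

```python
import pickle
from multiprocessing.pool import ThreadPool

class ReattachableIterator:
    # Writing `_pool = ThreadPool(8)` directly in the class body would spin up
    # threads at class-definition time and risk being dragged into pickling.
    _pool = None

    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def _get_pool(cls):
        # Lazily create the pool the first time it is actually needed.
        if cls._pool is None:
            cls._pool = ThreadPool(2)
        return cls._pool

# Instances pickle cleanly because no ThreadPool exists until _get_pool() runs.
restored = pickle.loads(pickle.dumps(ReattachableIterator([1, 2, 3])))
```

The lazy accessor keeps the expensive, unpicklable resource out of module import and out of any serialized state, which is the shape of fix the ticket describes.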