[jira] [Created] (SPARK-34923) Metadata output should not always be propagated
Karen Feng created SPARK-34923:
--

Summary: Metadata output should not always be propagated
Key: SPARK-34923
URL: https://issues.apache.org/jira/browse/SPARK-34923
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Karen Feng

Today, the vast majority of expressions uncritically propagate metadata output from their children. As a general rule, it seems reasonable that only expressions that propagate their children's output should also propagate their children's metadata output.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
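The rule proposed above can be sketched with a toy plan model (illustrative only, not Spark's actual Catalyst classes; the node names `Scan`, `Filter`, and `Aggregate` are assumptions for the sake of the example): a node forwards its child's metadata output only if it also forwards the child's regular output.

```python
class Scan:
    """Leaf node: produces both regular and metadata output."""
    def __init__(self, output, metadata_output):
        self._output, self._metadata = output, metadata_output
    def output(self): return self._output
    def metadata_output(self): return self._metadata

class Filter:
    """Forwards its child's output, so it may forward metadata output too."""
    def __init__(self, child): self.child = child
    def output(self): return self.child.output()
    def metadata_output(self): return self.child.metadata_output()

class Aggregate:
    """Produces new output, so it should not forward metadata output."""
    def __init__(self, child, grouping): self.child, self.grouping = child, grouping
    def output(self): return list(self.grouping)
    def metadata_output(self): return []  # not propagated

scan = Scan(["id", "name"], ["_file"])
assert Filter(scan).metadata_output() == ["_file"]   # output propagated, so metadata is too
assert Aggregate(scan, ["id"]).metadata_output() == []  # output replaced, metadata dropped
```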
[jira] [Created] (SPARK-34901) Executor log pagination UI blocked by privacy blockers
Karen Feng created SPARK-34901:
--

Summary: Executor log pagination UI blocked by privacy blockers
Key: SPARK-34901
URL: https://issues.apache.org/jira/browse/SPARK-34901
Project: Spark
Issue Type: Improvement
Components: Web UI
Affects Versions: 3.0.0
Reporter: Karen Feng

Executor log pagination is broken when using privacy blockers, such as uBlock Origin. For example, [EasyList|https://easylist.to/] has an [EasyPrivacy|https://easylist.to/easylist/easyprivacy.txt] list that blocks {{/log-view.}} Unfortunately, we use {{/static/log-view.js}} to load logs:
https://github.com/apache/spark/blob/c91a75620465c5606b837a595d6cc55c7219026f/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L240
[jira] [Updated] (SPARK-34555) Resolve metadata output from DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-34555:
---
Summary: Resolve metadata output from DataFrame  (was: Resolve metadata output)

> Resolve metadata output from DataFrame
> --
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark DataFrame in Scala
> with {{df.col("metadataColName")}}. This is because metadata output is only
> used during {{resolveChildren}} (used during SQL resolution); it should
> likely also be used by {{resolve}} (used during Scala resolution).
[jira] [Commented] (SPARK-34555) Resolve metadata output
[ https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291906#comment-17291906 ]

Karen Feng commented on SPARK-34555:

I'm working on this.

> Resolve metadata output
> ---
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark DataFrame in Scala
> with {{df.col("metadataColName")}}. This is because metadata output is only
> used during {{resolveChildren}} (used during SQL resolution); it should
> likely also be used by {{resolve}} (used during Scala resolution).
[jira] [Updated] (SPARK-34555) Resolve metadata output
[ https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-34555:
---
Description:
Today, we can't resolve a metadata column from a Spark DataFrame in Scala with {{df.col("metadataColName")}}. This is because metadata output is only used during {{resolveChildren}} (used during SQL resolution); it should likely also be used by {{resolve}} (used during Scala resolution).

was:
Today, we can't resolve a metadata column from a Spark SQL dataframe with `df.col("metadataColName")`. This is because metadata output is only used during `resolveChildren`; it should likely also be used by `resolve`.

> Resolve metadata output
> ---
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark DataFrame in Scala
> with {{df.col("metadataColName")}}. This is because metadata output is only
> used during {{resolveChildren}} (used during SQL resolution); it should
> likely also be used by {{resolve}} (used during Scala resolution).
[jira] [Created] (SPARK-34555) Resolve metadata output
Karen Feng created SPARK-34555:
--

Summary: Resolve metadata output
Key: SPARK-34555
URL: https://issues.apache.org/jira/browse/SPARK-34555
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng

Today, we can't resolve a metadata column from a Spark SQL DataFrame with {{df.col("metadataColName")}}. This is because metadata output is only used during {{resolveChildren}}; it should likely also be used by {{resolve}}.
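The asymmetry described in this issue can be sketched as follows (a hypothetical model, not Spark's actual implementation): the SQL-side path, modeled here as `resolve_children`, consults metadata attributes, while the Scala-side path, modeled as `resolve`, does not, so `df.col("metadataColName")` fails even though the same name resolves in SQL.

```python
def resolve_children(name, output, metadata_output):
    # SQL-side resolution: falls back to metadata attributes.
    if name in output or name in metadata_output:
        return name
    return None

def resolve(name, output, metadata_output):
    # Scala-side resolution today: ignores metadata attributes entirely.
    if name in output:
        return name
    return None

out, meta = ["id", "name"], ["_metadata"]
assert resolve_children("_metadata", out, meta) == "_metadata"  # SQL path works
assert resolve("_metadata", out, meta) is None                  # Scala path misses it
```

The proposed fix is simply to give `resolve` the same metadata fallback that `resolve_children` already has.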
[jira] [Created] (SPARK-34547) Resolve using child metadata attributes as fallback
Karen Feng created SPARK-34547:
--

Summary: Resolve using child metadata attributes as fallback
Key: SPARK-34547
URL: https://issues.apache.org/jira/browse/SPARK-34547
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng

Today, child expressions may be resolved based on "real" or metadata output attributes. If a "real" attribute and a metadata attribute have the same name, we should prefer the real one during resolution. This resolution fails today, although the user may not be aware of the metadata.
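A minimal sketch of the preference rule proposed above (illustrative only; attributes are modeled as `(name, kind)` tuples, which is an assumption of this example, not Spark's representation): a name that matches a real output attribute resolves to it, and metadata attributes are consulted only as a fallback rather than making the lookup ambiguous.

```python
def resolve_with_fallback(name, output, metadata_output):
    # Prefer "real" output attributes over metadata attributes of the same name.
    real = [a for a in output if a[0] == name]
    if real:
        return real[0]
    meta = [a for a in metadata_output if a[0] == name]
    return meta[0] if meta else None

# Same name on both sides: the real attribute wins instead of failing.
assert resolve_with_fallback("id", [("id", "real")], [("id", "metadata")]) == ("id", "real")
# Metadata-only names still resolve, as a fallback.
assert resolve_with_fallback("_file", [("id", "real")], [("_file", "metadata")]) == ("_file", "metadata")
```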
[jira] [Commented] (SPARK-34547) Resolve using child metadata attributes as fallback
[ https://issues.apache.org/jira/browse/SPARK-34547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291303#comment-17291303 ]

Karen Feng commented on SPARK-34547:

I'm working on this.

> Resolve using child metadata attributes as fallback
> ---
>
> Key: SPARK-34547
> URL: https://issues.apache.org/jira/browse/SPARK-34547
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Major
>
> Today, child expressions may be resolved based on "real" or metadata output
> attributes. If a "real" attribute and a metadata attribute have the same
> name, we should prefer the real one during resolution. This resolution fails
> today, although the user may not be aware of the metadata.
[jira] [Commented] (SPARK-34527) De-duplicated common columns cannot be resolved from USING/NATURAL JOIN
[ https://issues.apache.org/jira/browse/SPARK-34527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290211#comment-17290211 ]

Karen Feng commented on SPARK-34527:

I've implemented a fix for this, will push a PR.

> De-duplicated common columns cannot be resolved from USING/NATURAL JOIN
> ---
>
> Key: SPARK-34527
> URL: https://issues.apache.org/jira/browse/SPARK-34527
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Karen Feng
> Priority: Minor
>
> USING/NATURAL JOINs today have unexpectedly asymmetric behavior when
> resolving the duplicated common columns. For example, the left key columns
> can be resolved from a USING INNER JOIN, but the right key columns cannot.
> This is due to the Analyzer's
> [rewrite|https://github.com/apache/spark/blob/999d3b89b6df14a5ccb94ffc2ffadb82964e9f7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3397]
> of NATURAL/USING JOINs, which uses Project to remove the duplicated common
> columns.
[jira] [Created] (SPARK-34527) De-duplicated common columns cannot be resolved from USING/NATURAL JOIN
Karen Feng created SPARK-34527:
--

Summary: De-duplicated common columns cannot be resolved from USING/NATURAL JOIN
Key: SPARK-34527
URL: https://issues.apache.org/jira/browse/SPARK-34527
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Karen Feng

USING/NATURAL JOINs today have unexpectedly asymmetric behavior when resolving the duplicated common columns. For example, the left key columns can be resolved from a USING INNER JOIN, but the right key columns cannot. This is due to the Analyzer's [rewrite|https://github.com/apache/spark/blob/999d3b89b6df14a5ccb94ffc2ffadb82964e9f7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3397] of NATURAL/USING JOINs, which uses Project to remove the duplicated common columns.
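The rewrite described above can be modeled roughly as follows (a simplified sketch under the stated assumption that the Project keeps each common column only once, taken from the left side; the function and qualifier names `l`/`r` are illustrative, not Spark code): after the rewrite, left-qualified key references still resolve, but right-qualified ones do not, because the right side's copy was projected away.

```python
def resolvable_qualified_columns(left, right, using):
    # Model of the post-rewrite Project: all left columns survive with their
    # qualifier, but the right side's copy of each USING column is dropped.
    kept_left = {f"l.{c}" for c in left}
    kept_right = {f"r.{c}" for c in right if c not in using}
    return kept_left | kept_right

cols = resolvable_qualified_columns(["key", "a"], ["key", "b"], {"key"})
assert "l.key" in cols      # left key column resolves
assert "r.key" not in cols  # right key column cannot be resolved: the asymmetry
assert "r.b" in cols        # non-key right columns are unaffected
```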
[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288584#comment-17288584 ]

Karen Feng commented on SPARK-33600:

I'm working on this.

> Group exception messages in execution/datasources/v2
>
> Key: SPARK-33600
> URL: https://issues.apache.org/jira/browse/SPARK-33600
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Allison Wang
> Priority: Major
>
> '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2'
> || Filename || Count ||
> | AlterTableExec.scala | 1 |
> | CreateNamespaceExec.scala | 1 |
> | CreateTableExec.scala | 1 |
> | DataSourceRDD.scala | 2 |
> | DataSourceV2Strategy.scala | 9 |
> | DropNamespaceExec.scala | 2 |
> | DropTableExec.scala | 1 |
> | EmptyPartitionReader.scala | 1 |
> | FileDataSourceV2.scala | 1 |
> | FilePartitionReader.scala | 2 |
> | FilePartitionReaderFactory.scala | 1 |
> | ReplaceTableExec.scala | 3 |
> | TableCapabilityCheck.scala | 2 |
> | V1FallbackWriters.scala | 1 |
> | V2SessionCatalog.scala | 14 |
> | WriteToDataSourceV2Exec.scala | 10 |
>
> '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc'
> || Filename || Count ||
> | JDBCTableCatalog.scala | 3 |
>
> '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2'
> || Filename || Count ||
> | DataSourceV2Implicits.scala | 3 |
[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209221#comment-17209221 ]

Karen Feng commented on SPARK-33043:

When can we anticipate a 3.0.2 release with the fix?

> RowMatrix is incompatible with spark.driver.maxResultSize=0
> ---
>
> Key: SPARK-33043
> URL: https://issues.apache.org/jira/browse/SPARK-33043
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 3.0.0, 3.0.1
> Reporter: Karen Feng
> Assignee: Sean R. Owen
> Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
> RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement breaks:
>
> {code:java}
> require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
>   s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
>   s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
> {code}
>
> [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795].
>
> This check should likely only happen if maxDriverResultSizeInBytes > 0.
[jira] [Created] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
Karen Feng created SPARK-33043:
--

Summary: RowMatrix is incompatible with spark.driver.maxResultSize=0
Key: SPARK-33043
URL: https://issues.apache.org/jira/browse/SPARK-33043
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 3.0.1, 3.0.0
Reporter: Karen Feng

RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement breaks:

{code:java}
require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
  s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
  s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
{code}

[https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795].

This check should likely only happen if maxDriverResultSizeInBytes > 0.
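The guard suggested above can be sketched like this (a sketch only; the names mirror the Scala snippet but this is not the actual Spark code): `spark.driver.maxResultSize=0` means "unlimited", so the size check should only be enforced when a positive limit is configured.

```python
def check_result_size(aggregated_bytes, max_driver_result_bytes):
    # Only enforce the limit when it is positive; 0 (or negative) means unlimited.
    if max_driver_result_bytes > 0 and aggregated_bytes >= max_driver_result_bytes:
        raise ValueError(
            f"Cannot aggregate object of size {aggregated_bytes} Bytes, "
            f"as it's bigger than maxResultSize ({max_driver_result_bytes} Bytes)")

check_result_size(10_000, 0)     # unlimited: no error, unlike the current require()
check_result_size(10, 1024)      # under a positive limit: no error
try:
    check_result_size(2048, 1024)
    raise AssertionError("expected rejection")
except ValueError:
    pass                         # over a positive limit: rejected as before
```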
[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter
[ https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-32793:
---
Description:
# Add RAISEERROR() (or RAISE_ERROR()) to the API
# Add Scala/Python/R version of API for ASSERT_TRUE()
# Add an extra parameter to ASSERT_TRUE() as (cond, message), for which the `message` parameter is only lazily evaluated when the condition is not true
# Change the implementation of ASSERT_TRUE() to be rewritten during optimization to IF() instead.

was:
# Add RAISEERROR() (or RAISE_ERROR()) to the API
# Add Scala/Python/R version of API for ASSERT_TRUE()
# Add an extra parameter to ASSERT_TRUE() as (cond, message), for which the `message` parameter is only lazily evaluated when the condition is not true
# Change the implementation of ASSERT_TRUE() to be rewritten during optimization to IF() instead.

> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> # Add RAISEERROR() (or RAISE_ERROR()) to the API
> # Add Scala/Python/R version of API for ASSERT_TRUE()
> # Add an extra parameter to ASSERT_TRUE() as (cond, message), for which the `message` parameter is only lazily evaluated when the condition is not true
> # Change the implementation of ASSERT_TRUE() to be rewritten during optimization to IF() instead.
[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter
[ https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-32793:
---
Description:
# Add RAISEERROR() (or RAISE_ERROR()) to the API
# Add Scala/Python/R version of API for ASSERT_TRUE()
# Add an extra parameter to ASSERT_TRUE() as (cond, message), for which the `message` parameter is only lazily evaluated when the condition is not true
# Change the implementation of ASSERT_TRUE() to be rewritten during optimization to IF() instead.

was:
# assert_true is only available as a Spark SQL expression, and should be exposed as a function in the Scala and Python APIs for easier programmatic access.
# The error message thrown when the assertion fails is often not very useful for the user. Add a parameter so that users can pass a custom error message.

> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> # Add RAISEERROR() (or RAISE_ERROR()) to the API
> # Add Scala/Python/R version of API for ASSERT_TRUE()
> # Add an extra parameter to ASSERT_TRUE() as (cond, message), for which the `message` parameter is only lazily evaluated when the condition is not true
> # Change the implementation of ASSERT_TRUE() to be rewritten during optimization to IF() instead.
[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter
[ https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-32793:
---
Fix Version/s: (was: 3.1.0)

> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Karen Feng
> Priority: Minor
>
> # assert_true is only available as a Spark SQL expression, and should be
> exposed as a function in the Scala and Python APIs for easier programmatic
> access.
> # The error message thrown when the assertion fails is often not very useful
> for the user. Add a parameter so that users can pass a custom error message.
[jira] [Created] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter
Karen Feng created SPARK-32793:
--

Summary: Expose assert_true in Python/Scala APIs and add error message parameter
Key: SPARK-32793
URL: https://issues.apache.org/jira/browse/SPARK-32793
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng
Fix For: 3.1.0

# assert_true is only available as a Spark SQL expression, and should be exposed as a function in the Scala and Python APIs for easier programmatic access.
# The error message thrown when the assertion fails is often not very useful for the user. Add a parameter so that users can pass a custom error message.
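The lazy-message behavior proposed for ASSERT_TRUE(cond, message) can be sketched as follows (an illustrative model, not the PySpark API): the message is passed as a thunk that is only invoked when the condition is false, so an expensive message costs nothing on the success path.

```python
def assert_true(cond, message=lambda: "assertion failed"):
    # `message` is a zero-argument callable, evaluated lazily on failure only.
    if not cond:
        raise AssertionError(message())

calls = []
def expensive_message():
    calls.append(1)                 # track how many times the message is built
    return "value out of range"

assert_true(True, expensive_message)    # condition holds: message never evaluated
assert calls == []
try:
    assert_true(False, expensive_message)
except AssertionError as e:
    assert str(e) == "value out of range"
assert len(calls) == 1                  # evaluated exactly once, on failure
```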