[jira] [Created] (SPARK-34923) Metadata output should not always be propagated

2021-03-31 Thread Karen Feng (Jira)
Karen Feng created SPARK-34923:
--

 Summary: Metadata output should not always be propagated
 Key: SPARK-34923
 URL: https://issues.apache.org/jira/browse/SPARK-34923
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Karen Feng


Today, the vast majority of expressions uncritically propagate metadata output 
from their children. As a general rule, it seems reasonable that only 
expressions that propagate their children's output should also propagate their 
children's metadata output.
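The proposed rule can be sketched as a toy Python model (these are not Spark's Catalyst classes; node shapes and attribute names here are illustrative): a plan node inherits a child's metadata output only when it also passes through that child's regular output.

```python
# Toy model of the proposed rule: a node propagates a child's metadata
# output only if it propagates that child's regular output too.
def metadata_output(node):
    inherited = []
    for child in node["children"]:
        # e.g. a Filter passes its child's output through; an Aggregate does not
        if set(child["output"]) <= set(node["output"]):
            inherited += metadata_output(child)
    return node["metadata"] + inherited

scan = {"output": ["a", "b"], "metadata": ["_file"], "children": []}
filt = {"output": ["a", "b"], "metadata": [], "children": [scan]}
agg  = {"output": ["sum_a"], "metadata": [], "children": [scan]}

metadata_output(filt)  # ['_file']: the filter keeps the scan's output
metadata_output(agg)   # []: the aggregate replaces it, so _file is dropped
```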



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34901) Executor log pagination UI blocked by privacy blockers

2021-03-29 Thread Karen Feng (Jira)
Karen Feng created SPARK-34901:
--

 Summary: Executor log pagination UI blocked by privacy blockers
 Key: SPARK-34901
 URL: https://issues.apache.org/jira/browse/SPARK-34901
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.0
Reporter: Karen Feng


Executor log pagination is broken when using privacy blockers, such as uBlock 
Origin. For example, [EasyList|https://easylist.to/] has an 
[EasyPrivacy|https://easylist.to/easylist/easyprivacy.txt] list that blocks 
{{/log-view.}}. Unfortunately, we use {{/static/log-view.js}} to load logs: 
https://github.com/apache/spark/blob/c91a75620465c5606b837a595d6cc55c7219026f/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L240
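For illustration, the failure mode reduces to a substring match (the real EasyPrivacy rule syntax is richer than this, and the renamed asset below is hypothetical):

```python
# Simplified model of the filter: EasyPrivacy contains an entry matching
# "/log-view.", which Spark UI's script path happens to contain.
blocked_pattern = "/log-view."          # simplified form of the list entry
spark_script = "/static/log-view.js"

blocked_pattern in spark_script         # True: request blocked, pagination breaks
# A renamed asset (hypothetical name) would no longer match the rule:
blocked_pattern in "/static/executor-log-pager.js"  # False
```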






[jira] [Updated] (SPARK-34555) Resolve metadata output from DataFrame

2021-02-26 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-34555:
---
Summary: Resolve metadata output from DataFrame  (was: Resolve metadata 
output)

> Resolve metadata output from DataFrame
> --
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark dataframe from Scala 
> with {{df.col("metadataColName")}}. This is because metadata output is only 
> used during {{resolveChildren}} (used during SQL resolution); it should 
> likely also be used by {{resolve}} (used during Scala resolution).
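A minimal sketch of the two code paths (a toy model, not Spark's actual resolver signatures):

```python
# Toy resolver model: the SQL path consults metadata output, but the
# DataFrame path does not -- hence df.col("metadataColName") fails today.
plan = {"output": ["id", "value"], "metadata_output": ["_metadata"]}

def resolve_children(name):   # models the SQL resolution path
    return name in plan["output"] + plan["metadata_output"]

def resolve(name):            # models the Scala/DataFrame resolution path
    return name in plan["output"]

resolve_children("_metadata")  # True: SQL can see the metadata column
resolve("_metadata")           # False: df.col(...) cannot, which is the bug
```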






[jira] [Commented] (SPARK-34555) Resolve metadata output

2021-02-26 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291906#comment-17291906
 ] 

Karen Feng commented on SPARK-34555:


I'm working on this.

> Resolve metadata output
> ---
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark dataframe from Scala 
> with {{df.col("metadataColName")}}. This is because metadata output is only 
> used during {{resolveChildren}} (used during SQL resolution); it should 
> likely also be used by {{resolve}} (used during Scala resolution).






[jira] [Updated] (SPARK-34555) Resolve metadata output

2021-02-26 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-34555:
---
Description: Today, we can't resolve a metadata column from a Spark 
dataframe from Scala with {{df.col("metadataColName")}}. This is because 
metadata output is only used during {{resolveChildren}} (used during SQL 
resolution); it should likely also be used by {{resolve}} (used during Scala 
resolution).  (was: Today, we can't resolve a metadata column from a Spark SQL 
dataframe with `df.col("metadataColName")`. This is because metadata output is 
only used during `resolveChildren`; it should likely also be used by `resolve`.)

> Resolve metadata output
> ---
>
> Key: SPARK-34555
> URL: https://issues.apache.org/jira/browse/SPARK-34555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> Today, we can't resolve a metadata column from a Spark dataframe from Scala 
> with {{df.col("metadataColName")}}. This is because metadata output is only 
> used during {{resolveChildren}} (used during SQL resolution); it should 
> likely also be used by {{resolve}} (used during Scala resolution).






[jira] [Created] (SPARK-34555) Resolve metadata output

2021-02-26 Thread Karen Feng (Jira)
Karen Feng created SPARK-34555:
--

 Summary: Resolve metadata output
 Key: SPARK-34555
 URL: https://issues.apache.org/jira/browse/SPARK-34555
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng


Today, we can't resolve a metadata column from a Spark SQL dataframe with 
`df.col("metadataColName")`. This is because metadata output is only used 
during `resolveChildren`; it should likely also be used by `resolve`.






[jira] [Created] (SPARK-34547) Resolve using child metadata attributes as fallback

2021-02-25 Thread Karen Feng (Jira)
Karen Feng created SPARK-34547:
--

 Summary: Resolve using child metadata attributes as fallback
 Key: SPARK-34547
 URL: https://issues.apache.org/jira/browse/SPARK-34547
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng


Today, child expressions may be resolved based on "real" or metadata output 
attributes. If a "real" attribute and a metadata attribute have the same name, 
we should prefer the real one during resolution. Today, this resolution fails, 
even though the user may not be aware that the metadata attribute exists.
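A sketch of the proposed precedence (illustrative only, not Spark's Attribute API): on a name clash, resolve to the "real" attribute, and use metadata attributes only as a fallback rather than failing.

```python
# On a name clash, the real attribute wins; metadata is only a fallback.
def resolve(name, attrs):
    matches = [a for a in attrs if a["name"] == name]
    real = [a for a in matches if a["kind"] == "real"]
    if real:
        return real[0]                      # real attribute wins the clash
    return matches[0] if matches else None  # metadata as fallback

attrs = [
    {"name": "index", "kind": "real"},
    {"name": "index", "kind": "metadata"},  # same name as a real column
    {"name": "_file", "kind": "metadata"},
]
resolve("index", attrs)["kind"]  # 'real': no resolution failure
resolve("_file", attrs)["kind"]  # 'metadata': fallback still works
```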






[jira] [Commented] (SPARK-34547) Resolve using child metadata attributes as fallback

2021-02-25 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291303#comment-17291303
 ] 

Karen Feng commented on SPARK-34547:


I'm working on this.

> Resolve using child metadata attributes as fallback
> ---
>
> Key: SPARK-34547
> URL: https://issues.apache.org/jira/browse/SPARK-34547
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Major
>
> Today, child expressions may be resolved based on "real" or metadata output 
> attributes. If a "real" attribute and a metadata attribute have the same 
> name, we should prefer the real one during resolution. Today, this resolution 
> fails, even though the user may not be aware that the metadata attribute exists.






[jira] [Commented] (SPARK-34527) De-duplicated common columns cannot be resolved from USING/NATURAL JOIN

2021-02-24 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290211#comment-17290211
 ] 

Karen Feng commented on SPARK-34527:


I've implemented a fix for this, will push a PR.

> De-duplicated common columns cannot be resolved from USING/NATURAL JOIN
> ---
>
> Key: SPARK-34527
> URL: https://issues.apache.org/jira/browse/SPARK-34527
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Karen Feng
>Priority: Minor
>
> USING/NATURAL JOINS today have unexpectedly asymmetric behavior when 
> resolving the duplicated common columns. For example, the left key columns 
> can be resolved from a USING INNER JOIN, but the right key columns cannot. 
> This is due to the Analyzer's 
> [rewrite|https://github.com/apache/spark/blob/999d3b89b6df14a5ccb94ffc2ffadb82964e9f7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3397]
>  of NATURAL/USING JOINs, which uses Project to remove the duplicated common 
> columns.






[jira] [Created] (SPARK-34527) De-duplicated common columns cannot be resolved from USING/NATURAL JOIN

2021-02-24 Thread Karen Feng (Jira)
Karen Feng created SPARK-34527:
--

 Summary: De-duplicated common columns cannot be resolved from 
USING/NATURAL JOIN
 Key: SPARK-34527
 URL: https://issues.apache.org/jira/browse/SPARK-34527
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Karen Feng


USING/NATURAL JOINS today have unexpectedly asymmetric behavior when resolving 
the duplicated common columns. For example, the left key columns can be 
resolved from a USING INNER JOIN, but the right key columns cannot. This is due 
to the Analyzer's 
[rewrite|https://github.com/apache/spark/blob/999d3b89b6df14a5ccb94ffc2ffadb82964e9f7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3397]
 of NATURAL/USING JOINs, which uses Project to remove the duplicated common 
columns.
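The asymmetry can be mimicked with a rough model of the rewrite (column names here are hypothetical; the real rewrite builds a Project with coalesced join keys):

```python
# Rough model: after `t1 JOIN t2 USING (key)`, the Project above the join
# keeps the left side's key as the coalesced column and projects away the
# right side's copy entirely.
join_output = ["t1.key", "t1.a", "t2.key", "t2.b"]
# For an inner USING join, the rewrite keeps the left key as the join key:
projected_output = ["t1.key", "t1.a", "t2.b"]

def resolvable(name):
    return name in projected_output

resolvable("t1.key")  # True  -- the left key column still resolves
resolvable("t2.key")  # False -- the right key column does not: the asymmetry
```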






[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2

2021-02-22 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288584#comment-17288584
 ] 

Karen Feng commented on SPARK-33600:


I'm working on this.

> Group exception messages in execution/datasources/v2
> 
>
> Key: SPARK-33600
> URL: https://issues.apache.org/jira/browse/SPARK-33600
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Priority: Major
>
> '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2'
> || Filename ||   Count ||
> | AlterTableExec.scala |   1 |
> | CreateNamespaceExec.scala|   1 |
> | CreateTableExec.scala|   1 |
> | DataSourceRDD.scala  |   2 |
> | DataSourceV2Strategy.scala   |   9 |
> | DropNamespaceExec.scala  |   2 |
> | DropTableExec.scala  |   1 |
> | EmptyPartitionReader.scala   |   1 |
> | FileDataSourceV2.scala   |   1 |
> | FilePartitionReader.scala|   2 |
> | FilePartitionReaderFactory.scala |   1 |
> | ReplaceTableExec.scala   |   3 |
> | TableCapabilityCheck.scala   |   2 |
> | V1FallbackWriters.scala  |   1 |
> | V2SessionCatalog.scala   |  14 |
> | WriteToDataSourceV2Exec.scala|  10 |
> '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc'
> || Filename   ||   Count ||
> | JDBCTableCatalog.scala |   3 |
> '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2'
> || Filename||   Count ||
> | DataSourceV2Implicits.scala |   3 |






[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0

2020-10-06 Thread Karen Feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209221#comment-17209221
 ] 

Karen Feng commented on SPARK-33043:


When can we anticipate a 3.0.2 release with the fix?

> RowMatrix is incompatible with spark.driver.maxResultSize=0
> ---
>
> Key: SPARK-33043
> URL: https://issues.apache.org/jira/browse/SPARK-33043
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Karen Feng
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>
> RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement 
> breaks:
>  
> {code:java}
> require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
>   s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
>   s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
> {code}
>  
> [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795]
>  
> This check should likely only happen if maxDriverResultSizeInBytes > 0.






[jira] [Created] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0

2020-09-30 Thread Karen Feng (Jira)
Karen Feng created SPARK-33043:
--

 Summary: RowMatrix is incompatible with 
spark.driver.maxResultSize=0
 Key: SPARK-33043
 URL: https://issues.apache.org/jira/browse/SPARK-33043
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 3.0.1, 3.0.0
Reporter: Karen Feng


RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement 
breaks:

 
{code:java}
require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
  s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
  s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
{code}
 

[https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795]

 

This check should likely only happen if maxDriverResultSizeInBytes > 0.
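A sketch of the suggested fix (the real check lives in Scala in RowMatrix; this is an assumed shape, not the actual patch): treat a max result size of 0 as "unlimited" and enforce the check only for positive limits.

```python
# Only enforce the size check when a positive limit is configured;
# maxResultSize == 0 means unlimited, so no check should fire.
def check_aggregate_size(aggregated_bytes, max_result_bytes):
    if max_result_bytes > 0 and aggregated_bytes >= max_result_bytes:
        raise ValueError(
            f"Cannot aggregate object of size {aggregated_bytes} Bytes, "
            f"as it's bigger than maxResultSize ({max_result_bytes} Bytes)")

check_aggregate_size(1_000_000, 0)  # 0 means unlimited: no error raised
check_aggregate_size(10, 1024)      # under a positive limit: no error
```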






[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter

2020-09-04 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-32793:
---
Description: 
# Add RAISEERROR() (or RAISE_ERROR()) to the API
 # Add Scala/Python/R version of API for ASSERT_TRUE()
 # Add an extra parameter to ASSERT_TRUE() as (cond, message), where the 
`message` parameter is evaluated lazily, only when the condition is not true
 # Change the implementation of ASSERT_TRUE() to be rewritten to IF() during 
optimization.

  was:
# Add RAISEERROR() (or RAISE_ERROR()) to the API
 # Add Scala/Python/R version of API for ASSERT_TRUE()
 # Add an extra parameter to ASSERT_TRUE() as (cond, message), and for which the 
`message` parameter is only lazily evaluated when the condition is not true
 # Change the implementation of ASSERT_TRUE() to be rewritten during optimization 
to IF() instead.


> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> # Add RAISEERROR() (or RAISE_ERROR()) to the API
>  # Add Scala/Python/R version of API for ASSERT_TRUE()
> # Add an extra parameter to ASSERT_TRUE() as (cond, message), where the 
> `message` parameter is evaluated lazily, only when the condition is not 
> true
>  # Change the implementation of ASSERT_TRUE() to be rewritten during 
> optimization to IF() instead.






[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter

2020-09-04 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-32793:
---
Description: 
# Add RAISEERROR() (or RAISE_ERROR()) to the API
 # Add Scala/Python/R version of API for ASSERT_TRUE()
 # Add an extra parameter to ASSERT_TRUE() as (cond, message), and for which the 
`message` parameter is only lazily evaluated when the condition is not true
 # Change the implementation of ASSERT_TRUE() to be rewritten during optimization 
to IF() instead.

  was:
# assert_true is only available as a Spark SQL expression, and should be 
exposed as a  function in the Scala and Python APIs for easier programmatic 
access.
 # The error message thrown when the assertion fails is often not very useful 
for the user. Add a parameter so that users can pass a custom error message.


> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> # Add RAISEERROR() (or RAISE_ERROR()) to the API
>  # Add Scala/Python/R version of API for ASSERT_TRUE()
>  # Add an extra parameter to ASSERT_TRUE() as (cond, message), and for which 
> the `message` parameter is only lazily evaluated when the condition is not true
>  # Change the implementation of ASSERT_TRUE() to be rewritten during 
> optimization to IF() instead.






[jira] [Updated] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter

2020-09-03 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-32793:
---
Fix Version/s: (was: 3.1.0)

> Expose assert_true in Python/Scala APIs and add error message parameter
> ---
>
> Key: SPARK-32793
> URL: https://issues.apache.org/jira/browse/SPARK-32793
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Priority: Minor
>
> # assert_true is only available as a Spark SQL expression, and should be 
> exposed as a function in the Scala and Python APIs for easier programmatic 
> access.
>  # The error message thrown when the assertion fails is often not very useful 
> for the user. Add a parameter so that users can pass a custom error message.






[jira] [Created] (SPARK-32793) Expose assert_true in Python/Scala APIs and add error message parameter

2020-09-03 Thread Karen Feng (Jira)
Karen Feng created SPARK-32793:
--

 Summary: Expose assert_true in Python/Scala APIs and add error 
message parameter
 Key: SPARK-32793
 URL: https://issues.apache.org/jira/browse/SPARK-32793
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: Karen Feng
 Fix For: 3.1.0


# assert_true is only available as a Spark SQL expression, and should be 
exposed as a function in the Scala and Python APIs for easier programmatic 
access.
 # The error message thrown when the assertion fails is often not very useful 
for the user. Add a parameter so that users can pass a custom error message.
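A hedged sketch of the requested API shape (names and signature are illustrative, not the final Spark API): the custom error message is evaluated lazily, only when the assertion actually fails.

```python
# Illustrative assert_true with a lazily evaluated error message: the
# message callable runs only when the condition is false.
def assert_true(cond, message=lambda: "assertion failed"):
    if not cond:
        raise RuntimeError(message())  # message built only on failure

calls = []
def expensive_message():
    calls.append(1)                    # track whether the message was built
    return "expected a positive row count"

assert_true(True, expensive_message)   # passes: message never evaluated
len(calls)  # 0
```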





