[jira] [Assigned] (SPARK-39217) Makes DPP support the pruning side has Union

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39217:


Assignee: Apache Spark

> Makes DPP support the pruning side has Union
> 
>
> Key: SPARK-39217
> URL: https://issues.apache.org/jira/browse/SPARK-39217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support the following case:
> {noformat}
> SELECT f.store_id,
>        f.date_id,
>        s.state_province
> FROM (SELECT 4 AS store_id,
>              date_id,
>              product_id
>       FROM fact_sk
>       WHERE date_id >= 1300
>       UNION ALL
>       SELECT store_id,
>              date_id,
>              product_id
>       FROM fact_stats
>       WHERE date_id <= 1000) f
> JOIN dim_store s
> ON f.store_id = s.store_id
> WHERE s.country IN ('US', 'NL')
> {noformat}






[jira] [Commented] (SPARK-39217) Makes DPP support the pruning side has Union

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538579#comment-17538579
 ] 

Apache Spark commented on SPARK-39217:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/36588

> Makes DPP support the pruning side has Union
> 
>
> Key: SPARK-39217
> URL: https://issues.apache.org/jira/browse/SPARK-39217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Support the following case:
> {noformat}
> SELECT f.store_id,
>        f.date_id,
>        s.state_province
> FROM (SELECT 4 AS store_id,
>              date_id,
>              product_id
>       FROM fact_sk
>       WHERE date_id >= 1300
>       UNION ALL
>       SELECT store_id,
>              date_id,
>              product_id
>       FROM fact_stats
>       WHERE date_id <= 1000) f
> JOIN dim_store s
> ON f.store_id = s.store_id
> WHERE s.country IN ('US', 'NL')
> {noformat}






[jira] [Assigned] (SPARK-39217) Makes DPP support the pruning side has Union

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39217:


Assignee: (was: Apache Spark)

> Makes DPP support the pruning side has Union
> 
>
> Key: SPARK-39217
> URL: https://issues.apache.org/jira/browse/SPARK-39217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Support the following case:
> {noformat}
> SELECT f.store_id,
>        f.date_id,
>        s.state_province
> FROM (SELECT 4 AS store_id,
>              date_id,
>              product_id
>       FROM fact_sk
>       WHERE date_id >= 1300
>       UNION ALL
>       SELECT store_id,
>              date_id,
>              product_id
>       FROM fact_stats
>       WHERE date_id <= 1000) f
> JOIN dim_store s
> ON f.store_id = s.store_id
> WHERE s.country IN ('US', 'NL')
> {noformat}






[jira] [Resolved] (SPARK-39214) Improve errors related to CAST

2022-05-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39214.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36553
[https://github.com/apache/spark/pull/36553]

> Improve errors related to CAST
> --
>
> Key: SPARK-39214
> URL: https://issues.apache.org/jira/browse/SPARK-39214
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> 1. Rename the error classes INVALID_SYNTAX_FOR_CAST and CAST_CAUSES_OVERFLOW 
> to make them more precise and clear.
> 2. Improve error messages of the error classes (use quotes for SQL config and 
> function names).






[jira] [Created] (SPARK-39217) Makes DPP support the pruning side has Union

2022-05-17 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-39217:
---

 Summary: Makes DPP support the pruning side has Union
 Key: SPARK-39217
 URL: https://issues.apache.org/jira/browse/SPARK-39217
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang


Support the following case:

{noformat}
SELECT f.store_id,
       f.date_id,
       s.state_province
FROM (SELECT 4 AS store_id,
             date_id,
             product_id
      FROM fact_sk
      WHERE date_id >= 1300
      UNION ALL
      SELECT store_id,
             date_id,
             product_id
      FROM fact_stats
      WHERE date_id <= 1000) f
JOIN dim_store s
ON f.store_id = s.store_id
WHERE s.country IN ('US', 'NL')
{noformat}
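
For reference, a minimal sketch (not from the ticket) of how one could check whether the pruning filter reaches both branches of the Union, assuming fact_sk and fact_stats are partitioned by store_id; once this is supported, EXPLAIN should show a dynamicpruning subquery on both fact-table scans:
{code:scala}
// Enable DPP explicitly (it is on by default) and inspect the plan.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
spark.sql("""
  EXPLAIN FORMATTED
  SELECT f.store_id, f.date_id, s.state_province
  FROM (SELECT 4 AS store_id, date_id, product_id
        FROM fact_sk WHERE date_id >= 1300
        UNION ALL
        SELECT store_id, date_id, product_id
        FROM fact_stats WHERE date_id <= 1000) f
  JOIN dim_store s ON f.store_id = s.store_id
  WHERE s.country IN ('US', 'NL')
""").show(truncate = false)
{code}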







[jira] (SPARK-38615) Provide error context for runtime ANSI failures

2022-05-17 Thread Gengliang Wang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-38615 ]


Gengliang Wang deleted comment on SPARK-38615:


was (Author: gengliang.wang):
[~maxgekk] I am targeting this one in 3.3 as well. Since it is an error message 
improvement, let's try to finish as much as we can in 3.3.

What do you think?

> Provide error context for runtime ANSI failures
> ---
>
> Key: SPARK-38615
> URL: https://issues.apache.org/jira/browse/SPARK-38615
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Currently, there is not enough error context for runtime ANSI failures.
> In the following example, the error message only says that there is a
> "divide by zero" error, without pointing out where in the SQL statement it occurred.
> {code:java}
> > SELECT
>   ss1.ca_county,
>   ss1.d_year,
>   ws2.web_sales / ws1.web_sales web_q1_q2_increase,
>   ss2.store_sales / ss1.store_sales store_q1_q2_increase,
>   ws3.web_sales / ws2.web_sales web_q2_q3_increase,
>   ss3.store_sales / ss2.store_sales store_q2_q3_increase
> FROM
>   ss ss1, ss ss2, ss ss3, ws ws1, ws ws2, ws ws3
> WHERE
>   ss1.d_qoy = 1
> AND ss1.d_year = 2000
> AND ss1.ca_county = ss2.ca_county
> AND ss2.d_qoy = 2
> AND ss2.d_year = 2000
> AND ss2.ca_county = ss3.ca_county
> AND ss3.d_qoy = 3
> AND ss3.d_year = 2000
> AND ss1.ca_county = ws1.ca_county
> AND ws1.d_qoy = 1
> AND ws1.d_year = 2000
> AND ws1.ca_county = ws2.ca_county
> AND ws2.d_qoy = 2
> AND ws2.d_year = 2000
> AND ws1.ca_county = ws3.ca_county
> AND ws3.d_qoy = 3
> AND ws3.d_year = 2000
> AND CASE WHEN ws1.web_sales > 0
> THEN ws2.web_sales / ws1.web_sales
> ELSE NULL END
> > CASE WHEN ss1.store_sales > 0
> THEN ss2.store_sales / ss1.store_sales
>   ELSE NULL END
> AND CASE WHEN ws2.web_sales > 0
> THEN ws3.web_sales / ws2.web_sales
> ELSE NULL END
> > CASE WHEN ss2.store_sales > 0
> THEN ss3.store_sales / ss2.store_sales
>   ELSE NULL END
> ORDER BY ss1.ca_county
>  {code}
> {code:java}
> org.apache.spark.SparkArithmeticException: divide by zero at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:140)
>  at 
> org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:437)
>  at 
> org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:425)
>  at 
> org.apache.spark.sql.catalyst.expressions.Divide.eval(arithmetic.scala:534)
> {code}
>  
> I suggest that we provide details in the error message, including:
>  * the problematic expression from the original SQL query, e.g. 
> "ss3.store_sales / ss2.store_sales store_q2_q3_increase"
>  * the line number and starting char position of the problematic expression, 
> in case of queries like "select a + b from t1 union select a + b from t2"
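
For context, a minimal sketch of the current behavior this ticket wants to improve (assuming ANSI mode is enabled; the large query above fails the same way, with no pointer back into the SQL text):
{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT 1/0").collect()
// org.apache.spark.SparkArithmeticException: divide by zero
// ... with no hint of which expression in the statement caused it.
{code}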






[jira] [Resolved] (SPARK-39193) Improve the performance of inferring Timestamp type in JSON/CSV data source

2022-05-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-39193.

Fix Version/s: 3.3.1
   Resolution: Fixed

Issue resolved by pull request 36562
[https://github.com/apache/spark/pull/36562]

> Improve the performance of inferring Timestamp type in JSON/CSV data source
> ---
>
> Key: SPARK-39193
> URL: https://issues.apache.org/jira/browse/SPARK-39193
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.1
>
>
> When reading JSON/CSV files with timestamp type inference enabled
> (`.option("inferTimestamp", true)`), the Timestamp conversion throws and
> catches exceptions. Since we put descriptive error messages in the exception,
> creating the exceptions is actually not cheap: it consumes more than
> 90% of the type inference time.
> We can use parsing methods that return optional results instead.
> Before the change, it takes 166 seconds to infer a 624MB JSON file with
> timestamp inference enabled.
> After the change, it takes only 16 seconds.
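
A hedged sketch of the idea using plain java.time APIs (not the actual Spark patch, whose parsers differ): reporting failure through a ParsePosition avoids allocating an exception for every non-timestamp field.
{code:scala}
import java.text.ParsePosition
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

val fmt = DateTimeFormatter.ISO_LOCAL_DATE_TIME

// Exception-driven probe: constructing the exception dominates the cost
// when most sampled fields are not timestamps.
def looksLikeTimestampSlow(s: String): Boolean =
  Try(LocalDateTime.parse(s, fmt)).isSuccess

// Optional-result probe: parseUnresolved signals failure via the position,
// allocating no exception.
def looksLikeTimestampFast(s: String): Boolean = {
  val pos = new ParsePosition(0)
  fmt.parseUnresolved(s, pos) != null && pos.getErrorIndex < 0
}
{code}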






[jira] [Updated] (SPARK-39216) Issue with correlated subquery and Union

2022-05-17 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-39216:
-
Description: 
 

SPARK-37915 added CollapseProject to the CombineUnions rule, but it shouldn't 
collapse projects that contain correlated subqueries, since they haven't been 
de-correlated yet (by PullupCorrelatedPredicates).

Here is a simple example to reproduce this issue:
{code:java}
SELECT (SELECT IF(x, 1, 0)) AS a
FROM (SELECT true) t(x)
UNION 
SELECT 1 AS a {code}
Exception:
{code:java}
java.lang.IllegalStateException: Couldn't find x#4 in [] {code}
 

  was:
 

SPARK-37915 added CollapseProject to the CombineUnions rule, but it shouldn't 
collapse projects that contain correlated subqueries, since they haven't been 
de-correlated yet (by PullupCorrelatedPredicates).

Here is a simple example to reproduce this issue:
{code:java}
SELECT (SELECT IF(x, 1, 0)) AS a
FROM (SELECT true) t(x)
UNION 
SELECT 1 AS a {code}
Exception:

 

 
{code:java}
java.lang.IllegalStateException: Couldn't find x#4 in [] {code}
 


> Issue with correlated subquery and Union
> 
>
> Key: SPARK-39216
> URL: https://issues.apache.org/jira/browse/SPARK-39216
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
>  
> SPARK-37915 added CollapseProject to the CombineUnions rule, but it shouldn't
> collapse projects that contain correlated subqueries, since they haven't been
> de-correlated yet (by PullupCorrelatedPredicates).
> Here is a simple example to reproduce this issue:
> {code:java}
> SELECT (SELECT IF(x, 1, 0)) AS a
> FROM (SELECT true) t(x)
> UNION 
> SELECT 1 AS a {code}
> Exception:
> {code:java}
> java.lang.IllegalStateException: Couldn't find x#4 in [] {code}
>  






[jira] [Created] (SPARK-39216) Issue with correlated subquery and Union

2022-05-17 Thread Allison Wang (Jira)
Allison Wang created SPARK-39216:


 Summary: Issue with correlated subquery and Union
 Key: SPARK-39216
 URL: https://issues.apache.org/jira/browse/SPARK-39216
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Allison Wang


 

SPARK-37915 added CollapseProject to the CombineUnions rule, but it shouldn't 
collapse projects that contain correlated subqueries, since they haven't been 
de-correlated yet (by PullupCorrelatedPredicates).

Here is a simple example to reproduce this issue:
{code:java}
SELECT (SELECT IF(x, 1, 0)) AS a
FROM (SELECT true) t(x)
UNION 
SELECT 1 AS a {code}
Exception:
{code:java}
java.lang.IllegalStateException: Couldn't find x#4 in [] {code}
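
The same reproduction, runnable from a spark-shell:
{code:scala}
spark.sql("""
  SELECT (SELECT IF(x, 1, 0)) AS a
  FROM (SELECT true) t(x)
  UNION
  SELECT 1 AS a
""").show()
// java.lang.IllegalStateException: Couldn't find x#4 in []
{code}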
 






[jira] [Assigned] (SPARK-39215) Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39215:


Assignee: Apache Spark

> Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred
> -
>
> Key: SPARK-39215
> URL: https://issues.apache.org/jira/browse/SPARK-39215
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Here 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/utils.py#L296-L302
> It unnecessarily accesses the JVM too often. We can just have a single method
> to avoid that.






[jira] [Assigned] (SPARK-39215) Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39215:


Assignee: (was: Apache Spark)

> Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred
> -
>
> Key: SPARK-39215
> URL: https://issues.apache.org/jira/browse/SPARK-39215
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Here 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/utils.py#L296-L302
> It unnecessarily accesses the JVM too often. We can just have a single method
> to avoid that.






[jira] [Commented] (SPARK-39215) Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538529#comment-17538529
 ] 

Apache Spark commented on SPARK-39215:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/36587

> Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred
> -
>
> Key: SPARK-39215
> URL: https://issues.apache.org/jira/browse/SPARK-39215
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Here 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/utils.py#L296-L302
> It unnecessarily accesses the JVM too often. We can just have a single method
> to avoid that.






[jira] [Created] (SPARK-39215) Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-39215:


 Summary: Reduce Py4J calls in 
pyspark.sql.utils.is_timestamp_ntz_preferred
 Key: SPARK-39215
 URL: https://issues.apache.org/jira/browse/SPARK-39215
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Here 
https://github.com/apache/spark/blob/master/python/pyspark/sql/utils.py#L296-L302

It unnecessarily accesses the JVM too often. We can just have a single method to 
avoid that.
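
A hedged sketch of the direction (the helper name is hypothetical; the config key spark.sql.timestampType is real): fold the several Py4J lookups into a single JVM-side call.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical JVM-side helper: Python would make one Py4J round trip here
// instead of several (active-session lookup, conf handle, per-key reads).
def isTimestampNTZPreferred: Boolean =
  SparkSession.getActiveSession
    .exists(_.conf.get("spark.sql.timestampType") == "TIMESTAMP_NTZ")
{code}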






[jira] [Resolved] (SPARK-39054) GroupByTest failed due to axis Length mismatch

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39054.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36581
[https://github.com/apache/spark/pull/36581]

> GroupByTest failed due to axis Length mismatch
> --
>
> Key: SPARK-39054
> URL: https://issues.apache.org/jira/browse/SPARK-39054
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> An error occurred while calling o27083.getResult.
> : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93)
>   at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 808.0 (TID 650) (localhost executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, 
> in main
> process()
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, 
> in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 343, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 84, in dump_stream
> for batch in iterator:
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 336, in init_stream_yield_batches
> for series in iterator:
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, 
> in mapper
> return f(keys, vals)
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, 
> in <lambda>
> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, 
> in wrapped
> result = f(pd.concat(value_series, axis=1))
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in 
> wrapper
> return f(*args, **kwargs)
>   File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in 
> rename_output
> pdf.columns = return_schema.names
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 5588, in __setattr__
> return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in 
> pandas._libs.properties.AxisProperty.__set__
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 769, in _set_axis
> self._mgr.set_axis(axis, labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", 
> line 214, in set_axis
> self._validate_set_axis(axis, new_labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 
> 69, in _validate_set_axis
> raise ValueError(
> ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 
> elements {code}
>  
> GroupByTest.test_apply_with_new_dataframe_without_shortcut






[jira] [Assigned] (SPARK-39054) GroupByTest failed due to axis Length mismatch

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39054:


Assignee: Apache Spark

> GroupByTest failed due to axis Length mismatch
> --
>
> Key: SPARK-39054
> URL: https://issues.apache.org/jira/browse/SPARK-39054
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> An error occurred while calling o27083.getResult.
> : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93)
>   at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 808.0 (TID 650) (localhost executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, 
> in main
> process()
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, 
> in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 343, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 84, in dump_stream
> for batch in iterator:
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 336, in init_stream_yield_batches
> for series in iterator:
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, 
> in mapper
> return f(keys, vals)
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, 
> in <lambda>
> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, 
> in wrapped
> result = f(pd.concat(value_series, axis=1))
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in 
> wrapper
> return f(*args, **kwargs)
>   File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in 
> rename_output
> pdf.columns = return_schema.names
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 5588, in __setattr__
> return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in 
> pandas._libs.properties.AxisProperty.__set__
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 769, in _set_axis
> self._mgr.set_axis(axis, labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", 
> line 214, in set_axis
> self._validate_set_axis(axis, new_labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 
> 69, in _validate_set_axis
> raise ValueError(
> ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 
> elements {code}
>  
> GroupByTest.test_apply_with_new_dataframe_without_shortcut






[jira] [Assigned] (SPARK-39192) make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39192:


Assignee: zhengruifeng

> make pandas-on-spark's kurt consistent with pandas
> --
>
> Key: SPARK-39192
> URL: https://issues.apache.org/jira/browse/SPARK-39192
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>







[jira] [Resolved] (SPARK-39192) make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39192.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36560
[https://github.com/apache/spark/pull/36560]

> make pandas-on-spark's kurt consistent with pandas
> --
>
> Key: SPARK-39192
> URL: https://issues.apache.org/jira/browse/SPARK-39192
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-39143) Support CSV file scans with DEFAULT values

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39143.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36501
[https://github.com/apache/spark/pull/36501]

> Support CSV file scans with DEFAULT values
> --
>
> Key: SPARK-39143
> URL: https://issues.apache.org/jira/browse/SPARK-39143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-39143) Support CSV file scans with DEFAULT values

2022-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39143:


Assignee: Daniel

> Support CSV file scans with DEFAULT values
> --
>
> Key: SPARK-39143
> URL: https://issues.apache.org/jira/browse/SPARK-39143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>







[jira] [Resolved] (SPARK-39104) Null Pointer Exception on unpersist call

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39104.
--
Fix Version/s: 3.3.1
   3.2.2
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 36496
[https://github.com/apache/spark/pull/36496]

> Null Pointer Exception on unpersist call
> ---
>
> Key: SPARK-39104
> URL: https://issues.apache.org/jira/browse/SPARK-39104
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Denis
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.3.1, 3.2.2, 3.4.0
>
>
> DataFrame.unpersist call fails with an NPE
>  
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedRDDLoaded(InMemoryRelation.scala:247)
>     at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedColumnBuffersLoaded(InMemoryRelation.scala:241)
>     at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8(CacheManager.scala:189)
>     at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8$adapted(CacheManager.scala:176)
>     at 
> scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>     at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
>     at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
>     at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
>     at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
>     at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
>     at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
>     at 
> org.apache.spark.sql.execution.CacheManager.recacheByCondition(CacheManager.scala:219)
>     at 
> org.apache.spark.sql.execution.CacheManager.uncacheQuery(CacheManager.scala:176)
>     at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3220)
>     at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3231){code}
> Looks like synchronization is required for 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder#isCachedColumnBuffersLoaded
>  
> {code:java}
> def isCachedColumnBuffersLoaded: Boolean = {
>   _cachedColumnBuffers != null && isCachedRDDLoaded
> }
> def isCachedRDDLoaded: Boolean = {
> _cachedColumnBuffersAreLoaded || {
>   val bmMaster = SparkEnv.get.blockManager.master
>   val rddLoaded = _cachedColumnBuffers.partitions.forall { partition =>
> bmMaster.getBlockStatus(RDDBlockId(_cachedColumnBuffers.id, 
> partition.index), false)
>   .exists { case(_, blockStatus) => blockStatus.isCached }
>   }
>   if (rddLoaded) {
> _cachedColumnBuffersAreLoaded = rddLoaded
>   }
>   rddLoaded
>   }
> } {code}
> isCachedRDDLoaded relies on the _cachedColumnBuffers != null check, but the 
> field can be changed concurrently from another thread.
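
One possible shape of a fix, as a hedged sketch (not necessarily the merged patch): read the field once into a local so the null check and the later dereference cannot observe different values.
{code:scala}
// Sketch only; isCachedRDDLoaded would take the captured RDD as a parameter
// (hypothetical signature) instead of re-reading _cachedColumnBuffers.
def isCachedColumnBuffersLoaded: Boolean = {
  val buffers = _cachedColumnBuffers  // single read of the mutable field
  buffers != null && isCachedRDDLoaded(buffers)
}
{code}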






[jira] [Assigned] (SPARK-39104) Null Pointer Exception on unpersist call

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-39104:


Assignee: Cheng Pan

> Null Pointer Exception on unpersist call
> ---
>
> Key: SPARK-39104
> URL: https://issues.apache.org/jira/browse/SPARK-39104
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Denis
>Assignee: Cheng Pan
>Priority: Major
>
> DataFrame.unpersist call fails with an NPE
>  
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedRDDLoaded(InMemoryRelation.scala:247)
>     at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedColumnBuffersLoaded(InMemoryRelation.scala:241)
>     at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8(CacheManager.scala:189)
>     at 
> org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8$adapted(CacheManager.scala:176)
>     at 
> scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>     at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
>     at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
>     at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
>     at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
>     at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
>     at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
>     at 
> org.apache.spark.sql.execution.CacheManager.recacheByCondition(CacheManager.scala:219)
>     at 
> org.apache.spark.sql.execution.CacheManager.uncacheQuery(CacheManager.scala:176)
>     at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3220)
>     at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3231){code}
> Looks like synchronization is required for 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder#isCachedColumnBuffersLoaded
>  
> {code:java}
> def isCachedColumnBuffersLoaded: Boolean = {
>   _cachedColumnBuffers != null && isCachedRDDLoaded
> }
> def isCachedRDDLoaded: Boolean = {
> _cachedColumnBuffersAreLoaded || {
>   val bmMaster = SparkEnv.get.blockManager.master
>   val rddLoaded = _cachedColumnBuffers.partitions.forall { partition =>
> bmMaster.getBlockStatus(RDDBlockId(_cachedColumnBuffers.id, 
> partition.index), false)
>   .exists { case(_, blockStatus) => blockStatus.isCached }
>   }
>   if (rddLoaded) {
> _cachedColumnBuffersAreLoaded = rddLoaded
>   }
>   rddLoaded
>   }
> } {code}
> isCachedRDDLoaded relies on the _cachedColumnBuffers != null check, but the 
> field can be changed concurrently from another thread.






[jira] [Commented] (SPARK-39213) Create ANY_VALUE aggregate function

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538447#comment-17538447
 ] 

Apache Spark commented on SPARK-39213:
--

User 'vli-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36584

> Create ANY_VALUE aggregate function
> ---
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. It
> would consume input values and quickly return an arbitrary element.






[jira] [Assigned] (SPARK-39213) Create ANY_VALUE aggregate function

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39213:


Assignee: (was: Apache Spark)

> Create ANY_VALUE aggregate function
> ---
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. It
> would consume input values and quickly return an arbitrary element.






[jira] [Assigned] (SPARK-39213) Create ANY_VALUE aggregate function

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39213:


Assignee: Apache Spark

> Create ANY_VALUE aggregate function
> ---
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Assignee: Apache Spark
>Priority: Major
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. It
> would consume input values and quickly return an arbitrary element.






[jira] [Commented] (SPARK-39213) Create ANY_VALUE aggregate function

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538445#comment-17538445
 ] 

Apache Spark commented on SPARK-39213:
--

User 'vli-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36584

> Create ANY_VALUE aggregate function
> ---
>
> Key: SPARK-39213
> URL: https://issues.apache.org/jira/browse/SPARK-39213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> This is a feature request to add an {{ANY_VALUE}} aggregate function. It
> would consume input values and quickly return an arbitrary element.






[jira] [Assigned] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39212:


Assignee: Max Gekk  (was: Apache Spark)

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> All SQL configs/DS option values should be printed in SQL style in error 
> messages, and wrapped in double quotes. For example, the value true of the 
> config spark.sql.ansi.enabled should be highlighted as "true" to make it more 
> visible in error messages.






[jira] [Commented] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538379#comment-17538379
 ] 

Apache Spark commented on SPARK-39212:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36579

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> All SQL configs/DS option values should be printed in SQL style in error 
> messages, and wrapped in double quotes. For example, the value true of the 
> config spark.sql.ansi.enabled should be highlighted as "true" to make it more 
> visible in error messages.






[jira] [Assigned] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39212:


Assignee: Apache Spark  (was: Max Gekk)

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> All SQL configs/DS option values should be printed in SQL style in error 
> messages, and wrapped in double quotes. For example, the value true of the 
> config spark.sql.ansi.enabled should be highlighted as "true" to make it more 
> visible in error messages.






[jira] [Assigned] (SPARK-39214) Improve errors related to CAST

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39214:


Assignee: Apache Spark  (was: Max Gekk)

> Improve errors related to CAST
> --
>
> Key: SPARK-39214
> URL: https://issues.apache.org/jira/browse/SPARK-39214
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> 1. Rename the error classes INVALID_SYNTAX_FOR_CAST and CAST_CAUSES_OVERFLOW 
> to make them more precise and clear.
> 2. Improve error messages of the error classes (use quotes for SQL config and 
> function names).






[jira] [Commented] (SPARK-39214) Improve errors related to CAST

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538378#comment-17538378
 ] 

Apache Spark commented on SPARK-39214:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36553

> Improve errors related to CAST
> --
>
> Key: SPARK-39214
> URL: https://issues.apache.org/jira/browse/SPARK-39214
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> 1. Rename the error classes INVALID_SYNTAX_FOR_CAST and CAST_CAUSES_OVERFLOW 
> to make them more precise and clear.
> 2. Improve error messages of the error classes (use quotes for SQL config and 
> function names).






[jira] [Assigned] (SPARK-39214) Improve errors related to CAST

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39214:


Assignee: Max Gekk  (was: Apache Spark)

> Improve errors related to CAST
> --
>
> Key: SPARK-39214
> URL: https://issues.apache.org/jira/browse/SPARK-39214
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> 1. Rename the error classes INVALID_SYNTAX_FOR_CAST and CAST_CAUSES_OVERFLOW 
> to make them more precise and clear.
> 2. Improve error messages of the error classes (use quotes for SQL config and 
> function names).






[jira] [Created] (SPARK-39214) Improve errors related to CAST

2022-05-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-39214:


 Summary: Improve errors related to CAST
 Key: SPARK-39214
 URL: https://issues.apache.org/jira/browse/SPARK-39214
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


1. Rename the error classes INVALID_SYNTAX_FOR_CAST and CAST_CAUSES_OVERFLOW to 
make them more precise and clear.
2. Improve error messages of the error classes (use quotes for SQL config and 
function names).
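
For context, a hedged example of a query that hits the overflow error class (whatever its post-rename name becomes):
{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(2147483648L AS INT)").collect()
// raises the cast-overflow error; per item 2, the message should quote the
// config ("spark.sql.ansi.enabled") and suggested functions such as try_cast.
{code}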






[jira] [Created] (SPARK-39213) Create ANY_VALUE aggregate function

2022-05-17 Thread Vitalii Li (Jira)
Vitalii Li created SPARK-39213:
--

 Summary: Create ANY_VALUE aggregate function
 Key: SPARK-39213
 URL: https://issues.apache.org/jira/browse/SPARK-39213
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Vitalii Li


This is a feature request to add an {{ANY_VALUE}} aggregate function. It 
would consume input values and quickly return an arbitrary element.
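
A hedged sketch of the intended usage, modeled on ANY_VALUE in other SQL engines (table and columns are hypothetical):
{code:scala}
// ANY_VALUE returns one arbitrary, non-deterministic element per group,
// which is cheaper than MIN/MAX when any representative value will do.
spark.sql("""
  SELECT dept,
         ANY_VALUE(name) AS someone_in_dept,
         SUM(salary)     AS total_salary
  FROM employees
  GROUP BY dept
""").show()
{code}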






[jira] [Updated] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-39212:
-
Description: All SQL configs/DS option values should be printed in SQL 
style in error messages, and wrapped in double quotes. For example, the value 
true of the config spark.sql.ansi.enabled should be highlighted as "true" to 
make it more visible in error messages.  (was: All SQL configs should be 
printed in SQL style in error messages, and wrapped in double quotes. For 
example, the config spark.sql.ansi.enabled should be highlighted as 
"spark.sql.ansi.enabled" to make it more visible in error messages.)

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> All SQL configs/DS option values should be printed in SQL style in error 
> messages, and wrapped in double quotes. For example, the value true of the 
> config spark.sql.ansi.enabled should be highlighted as "true" to make it more 
> visible in error messages.






[jira] [Updated] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-39212:
-
Fix Version/s: (was: 3.3.0)
   (was: 3.4.0)

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> All SQL configs should be printed in SQL style in error messages, and wrapped
> in double quotes. For example, the config spark.sql.ansi.enabled should be 
> highlighted as "spark.sql.ansi.enabled" to make it more visible in error 
> messages.






[jira] [Updated] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-39212:
-
Affects Version/s: (was: 3.3.0)

> Use double quotes for values of SQL configs/DS options in error messages
> 
>
> Key: SPARK-39212
> URL: https://issues.apache.org/jira/browse/SPARK-39212
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All SQL configs should be printed in SQL style in error messages, and wrapped
> in double quotes. For example, the config spark.sql.ansi.enabled should be 
> highlighted as "spark.sql.ansi.enabled" to make it more visible in error 
> messages.






[jira] [Created] (SPARK-39212) Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-39212:


 Summary: Use double quotes for values of SQL configs/DS options in 
error messages
 Key: SPARK-39212
 URL: https://issues.apache.org/jira/browse/SPARK-39212
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.3.0, 3.4.0


All SQL configs should be printed in SQL style in error messages, and wrapped 
in double quotes. For example, the config spark.sql.ansi.enabled should be 
highlighted as "spark.sql.ansi.enabled" to make it more visible in error 
messages.
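
A minimal illustration of the convention (the message text is hypothetical):
{code:scala}
val key = "spark.sql.ansi.enabled"
// The config name rendered in double quotes inside the error message:
val msg = s"""To bypass this error, set "$key" to false."""
{code}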






[jira] [Assigned] (SPARK-39211) Support JSON file scans with default values

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39211:


Assignee: Apache Spark

> Support JSON file scans with default values
> ---
>
> Key: SPARK-39211
> URL: https://issues.apache.org/jira/browse/SPARK-39211
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-39211) Support JSON file scans with default values

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39211:


Assignee: (was: Apache Spark)

> Support JSON file scans with default values
> ---
>
> Key: SPARK-39211
> URL: https://issues.apache.org/jira/browse/SPARK-39211
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Commented] (SPARK-39211) Support JSON file scans with default values

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538338#comment-17538338
 ] 

Apache Spark commented on SPARK-39211:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/36583

> Support JSON file scans with default values
> ---
>
> Key: SPARK-39211
> URL: https://issues.apache.org/jira/browse/SPARK-39211
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Created] (SPARK-39211) Support JSON file scans with default values

2022-05-17 Thread Daniel (Jira)
Daniel created SPARK-39211:
--

 Summary: Support JSON file scans with default values
 Key: SPARK-39211
 URL: https://issues.apache.org/jira/browse/SPARK-39211
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Daniel









[jira] [Commented] (SPARK-39210) Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538310#comment-17538310
 ] 

Apache Spark commented on SPARK-39210:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36582

> Provide query context of Decimal overflow in AVG when WSCG is off
> -
>
> Key: SPARK-39210
> URL: https://issues.apache.org/jira/browse/SPARK-39210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-39210) Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39210:


Assignee: Gengliang Wang  (was: Apache Spark)

> Provide query context of Decimal overflow in AVG when WSCG is off
> -
>
> Key: SPARK-39210
> URL: https://issues.apache.org/jira/browse/SPARK-39210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39210) Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39210:


Assignee: Apache Spark  (was: Gengliang Wang)

> Provide query context of Decimal overflow in AVG when WSCG is off
> -
>
> Key: SPARK-39210
> URL: https://issues.apache.org/jira/browse/SPARK-39210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39054) GroupByTest failed due to axis Length mismatch

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538309#comment-17538309
 ] 

Apache Spark commented on SPARK-39054:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36581

> GroupByTest failed due to axis Length mismatch
> --
>
> Key: SPARK-39054
> URL: https://issues.apache.org/jira/browse/SPARK-39054
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> {code:java}
> An error occurred while calling o27083.getResult.
> : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93)
>   at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 808.0 (TID 650) (localhost executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, 
> in main
> process()
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, 
> in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 343, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 84, in dump_stream
> for batch in iterator:
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 336, in init_stream_yield_batches
> for series in iterator:
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, 
> in mapper
> return f(keys, vals)
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, 
> in <lambda>
> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, 
> in wrapped
> result = f(pd.concat(value_series, axis=1))
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in 
> wrapper
> return f(*args, **kwargs)
>   File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in 
> rename_output
> pdf.columns = return_schema.names
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 5588, in __setattr__
> return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in 
> pandas._libs.properties.AxisProperty.__set__
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 769, in _set_axis
> self._mgr.set_axis(axis, labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", 
> line 214, in set_axis
> self._validate_set_axis(axis, new_labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 
> 69, in _validate_set_axis
> raise ValueError(
> ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 
> elements {code}
>  
> GroupByTest.test_apply_with_new_dataframe_without_shortcut
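The pandas step that actually fails is the column rename at the bottom of the
trace. Here is a minimal standalone sketch of the same failure (plain pandas;
the DataFrame is illustrative, not the test's real data):

{code:python}
import pandas as pd

# The applied function returned 3 columns, but the declared return schema
# carries only 2 names -- assigning them raises the ValueError in the trace.
pdf = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
try:
    pdf.columns = ["x", "y"]   # mirrors `pdf.columns = return_schema.names`
except ValueError as e:
    print(e)  # Length mismatch: Expected axis has 3 elements, new values have 2 elements
{code}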



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39054) GroupByTest failed due to axis Length mismatch

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39054:


Assignee: (was: Apache Spark)

> GroupByTest failed due to axis Length mismatch
> --
>
> Key: SPARK-39054
> URL: https://issues.apache.org/jira/browse/SPARK-39054
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> {code:java}
> An error occurred while calling o27083.getResult.
> : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93)
>   at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 808.0 (TID 650) (localhost executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, 
> in main
> process()
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, 
> in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 343, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 84, in dump_stream
> for batch in iterator:
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 336, in init_stream_yield_batches
> for series in iterator:
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, 
> in mapper
> return f(keys, vals)
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, 
> in <lambda>
> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, 
> in wrapped
> result = f(pd.concat(value_series, axis=1))
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in 
> wrapper
> return f(*args, **kwargs)
>   File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in 
> rename_output
> pdf.columns = return_schema.names
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 5588, in __setattr__
> return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in 
> pandas._libs.properties.AxisProperty.__set__
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 769, in _set_axis
> self._mgr.set_axis(axis, labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", 
> line 214, in set_axis
> self._validate_set_axis(axis, new_labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 
> 69, in _validate_set_axis
> raise ValueError(
> ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 
> elements {code}
>  
> GroupByTest.test_apply_with_new_dataframe_without_shortcut



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39054) GroupByTest failed due to axis Length mismatch

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39054:


Assignee: Apache Spark

> GroupByTest failed due to axis Length mismatch
> --
>
> Key: SPARK-39054
> URL: https://issues.apache.org/jira/browse/SPARK-39054
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> An error occurred while calling o27083.getResult.
> : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:97)
>   at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:93)
>   at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 808.0 (TID 650) (localhost executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 686, 
> in main
> process()
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 678, 
> in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 343, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 84, in dump_stream
> for batch in iterator:
>   File 
> "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 336, in init_stream_yield_batches
> for series in iterator:
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 487, 
> in mapper
> return f(keys, vals)
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 207, 
> in <lambda>
> return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 185, 
> in wrapped
> result = f(pd.concat(value_series, axis=1))
>   File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in 
> wrapper
> return f(*args, **kwargs)
>   File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1620, in 
> rename_output
> pdf.columns = return_schema.names
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 5588, in __setattr__
> return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in 
> pandas._libs.properties.AxisProperty.__set__
>   File "/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py", line 
> 769, in _set_axis
> self._mgr.set_axis(axis, labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/managers.py", 
> line 214, in set_axis
> self._validate_set_axis(axis, new_labels)
>   File 
> "/usr/local/lib/python3.9/dist-packages/pandas/core/internals/base.py", line 
> 69, in _validate_set_axis
> raise ValueError(
> ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 
> elements {code}
>  
> GroupByTest.test_apply_with_new_dataframe_without_shortcut



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39210) Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-39210:
--

 Summary: Provide query context of Decimal overflow in AVG when 
WSCG is off
 Key: SPARK-39210
 URL: https://issues.apache.org/jira/browse/SPARK-39210
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.1
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-39208.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36577
[https://github.com/apache/spark/pull/36577]

> Fix query context bugs in decimal overflow under codegen mode
> -
>
> Key: SPARK-39208
> URL: https://issues.apache.org/jira/browse/SPARK-39208
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39209) Error occurs when casting a big enough long to timestamp in ANSI mode

2022-05-17 Thread chong (Jira)
chong created SPARK-39209:
-

 Summary: Error occurs when casting a big enough long to timestamp in 
ANSI mode 
 Key: SPARK-39209
 URL: https://issues.apache.org/jira/browse/SPARK-39209
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
 Environment: Spark 3.3.0
Reporter: chong


 
An error occurs when casting a big enough long to a timestamp in ANSI mode; 
according to the code in Cast.scala, the result should be the max timestamp:

 
{code:java}
private[this] def longToTimestamp(t: Long): Long = SECONDS.toMicros(t)

// the logic of SECONDS.toMicros is (m = 1000000, over = Long.MAX_VALUE / 1000000L):
static long x(long d, long m, long over) {
    if (d > over) return Long.MAX_VALUE;
    if (d < -over) return Long.MIN_VALUE;
    return d * m;
}
{code}
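For reference, a minimal Python sketch of the clamping semantics above (a
direct port of the TimeUnit logic quoted here; Python ints stand in for Java
longs):

{code:python}
LONG_MAX = 2**63 - 1
LONG_MIN = -2**63

def seconds_to_micros(d):
    m = 1_000_000                 # microseconds per second
    over = LONG_MAX // m
    if d > over:                  # would overflow: clamp instead of throwing
        return LONG_MAX
    if d < -over:
        return LONG_MIN
    return d * m

t = LONG_MAX // 100 + 1
assert seconds_to_micros(t) == LONG_MAX  # clamped, i.e. the max timestamp
{code}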
 
 

Reproduce steps:
{code:java}
$SPARK_HOME/bin/spark-shell

import spark.implicits._
val df = Seq((Long.MaxValue / 100) + 1).toDF("a")
df.selectExpr("cast(a as timestamp)").collect()

// the result is right:
// Array[org.apache.spark.sql.Row] = Array([294247-01-10 12:00:54.775807])

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val schema = StructType(Array(StructField("a", LongType)))
val data = Seq(Row((Long.MaxValue / 100) + 1))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.selectExpr("cast(a as timestamp)").collect()

// error occurs:
java.lang.RuntimeException: Error while decoding: java.lang.ArithmeticException: long overflow
createexternalrow(staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, input[0, timestamp, true], true, false, true), StructField(a,TimestampType,true))
  at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1157)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:172)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3864)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3119)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3855)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3853)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:3119)
  ... 55 elided
Caused by: java.lang.ArithmeticException: long overflow
  at java.lang.Math.multiplyExact(Math.java:892)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.millisToMicros(DateTimeUtils.scala:240)
  at org.apache.spark.sql.catalyst.util.RebaseDateTime$.rebaseGregorianToJulianMicros(RebaseDateTime.scala:370)
  at org.apache.spark.sql.catalyst.util.RebaseDateTime$.rebaseGregorianToJulianMicros(RebaseDateTime.scala:390)
  at org.apache.spark.sql.catalyst.util.RebaseDateTime$.rebaseGregorianToJulianMicros(RebaseDateTime.scala:411)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils$.toJavaTimestamp(DateTimeUtils.scala:162)
  at org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaTimestamp(DateTimeUtils.scala)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:181)
  ... 73 more
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39102:
-
Issue Type: Improvement  (was: Bug)

> Replace the usage of guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> Hi,
> There are several classes where Spark uses guava's Files.createTempDir(), 
> which has known vulnerabilities. It would be better to move to 
> java.nio.file.Files.createTempDirectory() in those classes:
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-39102:


Assignee: Yang Jie

> Replace the usage of guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Assignee: Yang Jie
>Priority: Minor
>
> Hi,
> There are several classes where Spark uses guava's Files.createTempDir(), 
> which has known vulnerabilities. It would be better to move to 
> java.nio.file.Files.createTempDirectory() in those classes:
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39102.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36529
[https://github.com/apache/spark/pull/36529]

> Replace the usage of guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> Hi,
> There are several classes where Spark uses guava's Files.createTempDir(), 
> which has known vulnerabilities. It would be better to move to 
> java.nio.file.Files.createTempDirectory() in those classes:
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39196) Replace getOrElse(null) with orNull

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-39196:


Assignee: qian

> Replace getOrElse(null) with orNull
> ---
>
> Key: SPARK-39196
> URL: https://issues.apache.org/jira/browse/SPARK-39196
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.3.0
>Reporter: qian
>Assignee: qian
>Priority: Major
>
> Code Simplification. Replace _getOrElse(null)_ with _orNull_



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39196) Replace getOrElse(null) with orNull

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-39196.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36567
[https://github.com/apache/spark/pull/36567]

> Replace getOrElse(null) with orNull
> ---
>
> Key: SPARK-39196
> URL: https://issues.apache.org/jira/browse/SPARK-39196
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.3.0
>Reporter: qian
>Assignee: qian
>Priority: Major
> Fix For: 3.4.0
>
>
> Code Simplification. Replace _getOrElse(null)_ with _orNull_



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39196) Replace getOrElse(null) with orNull

2022-05-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-39196:
-
Priority: Trivial  (was: Major)

> Replace getOrElse(null) with orNull
> ---
>
> Key: SPARK-39196
> URL: https://issues.apache.org/jira/browse/SPARK-39196
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.3.0
>Reporter: qian
>Assignee: qian
>Priority: Trivial
> Fix For: 3.4.0
>
>
> Code Simplification. Replace _getOrElse(null)_ with _orNull_



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538196#comment-17538196
 ] 

Apache Spark commented on SPARK-39167:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36580

> Throw an exception w/ an error class for multiple rows from a subquery used 
> as an expression
> 
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Users can trigger an illegal state exception by the SQL statement:
> {code:sql}
> > select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a 
> subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   Union
>   :- *(1) Project [1 AS a#240]
>   :  +- *(1) Scan OneRowRelation[]
>   +- *(2) Project [2 AS a#241]
>  +- *(2) Scan OneRowRelation[]
>+- == Initial Plan ==
>   Union
>   :- Project [1 AS a#240]
>   :  +- Scan OneRowRelation[]
>   +- Project [2 AS a#241]
>  +- Scan OneRowRelation[]
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> but such exceptions are not supposed to be visible to users. We need to 
> introduce an error class (or re-use an existing one) and replace the 
> IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538193#comment-17538193
 ] 

Apache Spark commented on SPARK-39167:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36580

> Throw an exception w/ an error class for multiple rows from a subquery used 
> as an expression
> 
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Users can trigger an illegal state exception by the SQL statement:
> {code:sql}
> > select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a 
> subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   Union
>   :- *(1) Project [1 AS a#240]
>   :  +- *(1) Scan OneRowRelation[]
>   +- *(2) Project [2 AS a#241]
>  +- *(2) Scan OneRowRelation[]
>+- == Initial Plan ==
>   Union
>   :- Project [1 AS a#240]
>   :  +- Scan OneRowRelation[]
>   +- Project [2 AS a#241]
>  +- Scan OneRowRelation[]
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> but such exceptions are not supposed to be visible to users. We need to 
> introduce an error class (or re-use an existing one) and replace the 
> IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39167:


Assignee: Apache Spark

> Throw an exception w/ an error class for multiple rows from a subquery used 
> as an expression
> 
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Users can trigger an illegal state exception by the SQL statement:
> {code:sql}
> > select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a 
> subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   Union
>   :- *(1) Project [1 AS a#240]
>   :  +- *(1) Scan OneRowRelation[]
>   +- *(2) Project [2 AS a#241]
>  +- *(2) Scan OneRowRelation[]
>+- == Initial Plan ==
>   Union
>   :- Project [1 AS a#240]
>   :  +- Scan OneRowRelation[]
>   +- Project [2 AS a#241]
>  +- Scan OneRowRelation[]
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> but such exceptions are not supposed to be visible to users. We need to 
> introduce an error class (or re-use an existing one) and replace the 
> IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39167:


Assignee: (was: Apache Spark)

> Throw an exception w/ an error class for multiple rows from a subquery used 
> as an expression
> 
>
> Key: SPARK-39167
> URL: https://issues.apache.org/jira/browse/SPARK-39167
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Users can trigger an illegal state exception by the SQL statement:
> {code:sql}
> > select (select a from (select 1 as a union all select 2 as a) t) as b
> {code}
> {code:java}
> Caused by: java.lang.IllegalStateException: more than one row returned by a 
> subquery used as an expression:
> Subquery subquery#242, [id=#100]
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   Union
>   :- *(1) Project [1 AS a#240]
>   :  +- *(1) Scan OneRowRelation[]
>   +- *(2) Project [2 AS a#241]
>  +- *(2) Scan OneRowRelation[]
>+- == Initial Plan ==
>   Union
>   :- Project [1 AS a#240]
>   :  +- Scan OneRowRelation[]
>   +- Project [2 AS a#241]
>  +- Scan OneRowRelation[]
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
> {code}
> but such exceptions are not supposed to be visible to users. We need to 
> introduce an error class (or re-use an existing one) and replace the 
> IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538096#comment-17538096
 ] 

Apache Spark commented on SPARK-39207:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/36578

> Record SQL text when executing with SparkSession.sql()
> --
>
> Key: SPARK-39207
> URL: https://issues.apache.org/jira/browse/SPARK-39207
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39207:


Assignee: Apache Spark

> Record SQL text when executing with SparkSession.sql()
> --
>
> Key: SPARK-39207
> URL: https://issues.apache.org/jira/browse/SPARK-39207
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39207:


Assignee: (was: Apache Spark)

> Record SQL text when executing with SparkSession.sql()
> --
>
> Key: SPARK-39207
> URL: https://issues.apache.org/jira/browse/SPARK-39207
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538094#comment-17538094
 ] 

Apache Spark commented on SPARK-39207:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/36578

> Record SQL text when executing with SparkSession.sql()
> --
>
> Key: SPARK-39207
> URL: https://issues.apache.org/jira/browse/SPARK-39207
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538089#comment-17538089
 ] 

Apache Spark commented on SPARK-39208:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36577

> Fix query context bugs in decimal overflow under codegen mode
> -
>
> Key: SPARK-39208
> URL: https://issues.apache.org/jira/browse/SPARK-39208
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39208:


Assignee: Gengliang Wang  (was: Apache Spark)

> Fix query context bugs in decimal overflow under codegen mode
> -
>
> Key: SPARK-39208
> URL: https://issues.apache.org/jira/browse/SPARK-39208
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538088#comment-17538088
 ] 

Apache Spark commented on SPARK-39208:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36577

> Fix query context bugs in decimal overflow under codegen mode
> -
>
> Key: SPARK-39208
> URL: https://issues.apache.org/jira/browse/SPARK-39208
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39208:


Assignee: Apache Spark  (was: Gengliang Wang)

> Fix query context bugs in decimal overflow under codegen mode
> -
>
> Key: SPARK-39208
> URL: https://issues.apache.org/jira/browse/SPARK-39208
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32268) Bloom Filter Join

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538087#comment-17538087
 ] 

Apache Spark commented on SPARK-32268:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/36576

> Bloom Filter Join
> -
>
> Key: SPARK-32268
> URL: https://issues.apache.org/jira/browse/SPARK-32268
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yingyi Bu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: q16-bloom-filter.jpg, q16-default.jpg
>
>
> We can improve the performance of some joins by pre-filtering one side of a 
> join with a Bloom filter and an IN predicate generated from the values on the 
> other side of the join.
>  For 
> example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql].
>  [Before this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg].
>  [After this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg].
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
>  Our setup for running the TPC-DS benchmark was as follows: TPC-DS 5T scale 
> and partitioned Parquet tables.
>  
> |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|
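To make the pre-filtering idea concrete, here is a plain-Python toy sketch of
the mechanism (not Spark's implementation; the Bloom filter, key values, and
table shapes are all illustrative):

{code:python}
from hashlib import blake2b

class Bloom:
    """Toy Bloom filter: k hash probes over a fixed-size bit array."""
    def __init__(self, bits=1 << 16, k=3):
        self.bits, self.k, self.arr = bits, k, bytearray(bits // 8)
    def _probes(self, key):
        for i in range(self.k):
            h = int.from_bytes(blake2b(f"{i}:{key}".encode(), digest_size=8).digest(), "big")
            yield h % self.bits
    def add(self, key):
        for h in self._probes(key):
            self.arr[h // 8] |= 1 << (h % 8)
    def might_contain(self, key):
        return all(self.arr[h // 8] & (1 << (h % 8)) for h in self._probes(key))

# Build the filter from the small (dimension) side's join keys...
bf = Bloom()
for key in [1, 5, 9]:
    bf.add(key)

# ...then cheaply pre-filter the large (fact) side before the real join runs.
fact = [(k, f"row{k}") for k in range(1_000)]
prefiltered = [row for row in fact if bf.might_contain(row[0])]
{code}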



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32268) Bloom Filter Join

2022-05-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538086#comment-17538086
 ] 

Apache Spark commented on SPARK-32268:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/36576

> Bloom Filter Join
> -
>
> Key: SPARK-32268
> URL: https://issues.apache.org/jira/browse/SPARK-32268
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yingyi Bu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: q16-bloom-filter.jpg, q16-default.jpg
>
>
> We can improve the performance of some joins by pre-filtering one side of a 
> join with a Bloom filter and an IN predicate generated from the values on the 
> other side of the join.
>  For 
> example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql].
>  [Before this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg].
>  [After this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg].
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
>  Our setup for running the TPC-DS benchmark was as follows: TPC-DS 5T scale 
> and partitioned Parquet tables.
>  
> |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39208) Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-39208:
--

 Summary: Fix query context bugs in decimal overflow under codegen 
mode
 Key: SPARK-39208
 URL: https://issues.apache.org/jira/browse/SPARK-39208
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.1
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38255) Enable a callable in pyspark.pandas.DataFrame.loc

2022-05-17 Thread chandan singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538067#comment-17538067
 ] 

chandan singh commented on SPARK-38255:
---

Hi,

The following is the example from the pandas docs:
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html]

Callable that returns a boolean Series:
{code:python}
>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8
{code}

Below is a toy code example:

{code:python}
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [4, 5, 6, 6, 8]})

def even_index(frame):
    # Boolean mask: True for the rows whose index label is even
    return [i % 2 == 0 for i in frame.index.values]

df.loc[lambda x: even_index(x)]
{code}
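For comparison, a hypothetical sketch of what the requested behavior could
look like in pandas API on Spark once callables are accepted by .loc (this is
the feature being asked for, not current behavior):

{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
                    index=["cobra", "viper", "sidewinder"])

# Hypothetical once SPARK-38255 lands: a callable computing a boolean mask.
psdf.loc[lambda df: df["shield"] == 8]
{code}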
 

 

> Enable a callable in pyspark.pandas.DataFrame.loc
> -
>
> Key: SPARK-38255
> URL: https://issues.apache.org/jira/browse/SPARK-38255
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Kyle Gilde
>Priority: Minor
>
> Hi,
> I was hoping that you would enable a callable to be used in the 
> pyspark.pandas.DataFrame.loc method.
> I use a lambda function in loc all the time in my pandas code, and I was 
> hoping to be able to use most of my pandas code with your new pandas API.
>  
> Thank you!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-39207:
---

 Summary: Record SQL text when executing with SparkSession.sql()
 Key: SPARK-39207
 URL: https://issues.apache.org/jira/browse/SPARK-39207
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Linhong Liu






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37197) Behaviour inconsistency between pandas and pandas API on Spark

2022-05-17 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-37197:

Component/s: Pandas API on Spark

> Behaviour inconsistency between pandas and pandas API on Spark
> --
>
> Key: SPARK-37197
> URL: https://issues.apache.org/jira/browse/SPARK-37197
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> This JIRA includes tickets on inconsistent behaviors between pandas and 
> pandas API on Spark.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38819) Run Pandas on Spark with Pandas 1.4.x

2022-05-17 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-38819:

Component/s: Pandas API on Spark

> Run Pandas on Spark with Pandas 1.4.x
> -
>
> Key: SPARK-38819
> URL: https://issues.apache.org/jira/browse/SPARK-38819
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> This is an umbrella to track issues that arise when pandas is upgraded to 
> 1.4.x.
>  
> With fast-fail disabled in the tests, 19 tests failed:
> [https://github.com/Yikun/spark/pull/88/checks?check_run_id=5873627048]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39199) Implement pandas API missing parameters

2022-05-17 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-39199:

Component/s: Pandas API on Spark

> Implement pandas API missing parameters
> ---
>
> Key: SPARK-39199
> URL: https://issues.apache.org/jira/browse/SPARK-39199
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.3.0, 3.4.0, 3.3.1
>Reporter: Xinrong Meng
>Priority: Major
>
> pandas API on Spark aims to achieve full pandas API coverage. Currently, most 
> pandas functions are supported in pandas API on Spark, but some of their 
> parameters are missing. Commonly missing parameters include:
> - NA handling: `skipna`, `dropna`
> - data-type filtering: `numeric_only`, `bool_only`
> - result-length filtering: `keep`
> - result reindexing: `ignore_index`
> These parameters support common use cases and should be prioritized (see the 
> sketch below).
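As a point of reference, the corresponding pandas call sites look like this
(plain pandas, shown only to illustrate the parameters listed above; the
DataFrame is illustrative):

{code:python}
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [True, False, True]})

df.sum(skipna=True)                # NA handling
df.mean(numeric_only=True)         # data-type filtering
df.nlargest(1, "a", keep="first")  # result-length filtering
df.sample(2, ignore_index=True)    # result reindexing
{code}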



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org