[jira] [Created] (SPARK-45108) Improve the probable-shuffle check in InjectRuntimeFilter

2023-09-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-45108:
--

 Summary: Improve the probable-shuffle check in InjectRuntimeFilter
 Key: SPARK-45108
 URL: https://issues.apache.org/jira/browse/SPARK-45108
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: jiaan.geng


InjectRuntimeFilter needs to check whether the plan probably contains a shuffle. However, the 
current code may call isProbablyShuffleJoin twice when the right side of the 
Join node is used as the application side.






[jira] [Created] (SPARK-44967) Unit should be considered before Boolean for TreeNodeTag

2023-08-25 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44967:
--

 Summary: Unit should be considered before Boolean for TreeNodeTag
 Key: SPARK-44967
 URL: https://issues.apache.org/jira/browse/SPARK-44967
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Currently, there are a lot of TreeNodeTag[Boolean] tags defined.
In fact, we don't need a Boolean value boxed into the TreeNodeTag; we just want 
its presence to act as a flag, so TreeNodeTag[Unit] would be enough.
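
A minimal sketch of the difference (the tag names below are made up for illustration; setTagValue/getTagValue are the existing TreeNode accessors):

{code:scala}
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

// Today: a Boolean tag whose value is only ever `true` in practice.
val boolTag = TreeNodeTag[Boolean]("myBooleanFlag")
// Proposed style: a Unit tag, where mere presence is the flag.
val unitTag = TreeNodeTag[Unit]("myUnitFlag")

// With a plan node `plan`:
//   plan.setTagValue(boolTag, true); plan.getTagValue(boolTag).contains(true)
//   plan.setTagValue(unitTag, ());   plan.getTagValue(unitTag).isDefined
{code}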






[jira] [Commented] (SPARK-44781) Runtime filter should support reusing an exchange if it can reduce the data size of the application side

2023-08-11 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753458#comment-17753458
 ] 

jiaan.geng commented on SPARK-44781:


I'm working on it.

> Runtime filter should support reusing an exchange if it can reduce the data 
> size of the application side
> 
>
> Key: SPARK-44781
> URL: https://issues.apache.org/jira/browse/SPARK-44781
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the Spark runtime filter only supports creating the filter from a subquery over a single table.
> In fact, we can reuse an existing exchange, even if it is a shuffle exchange.
> If the shuffle exchange comes from a join that has selective predicates on one 
> side, the results of that join can be used to prune the data of the 
> application side.






[jira] [Created] (SPARK-44781) Runtime filter should support reusing an exchange if it can reduce the data size of the application side

2023-08-11 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44781:
--

 Summary: Runtime filter should support reusing an exchange if it can 
reduce the data size of the application side
 Key: SPARK-44781
 URL: https://issues.apache.org/jira/browse/SPARK-44781
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: jiaan.geng


Currently, the Spark runtime filter only supports creating the filter from a subquery over a single table.
In fact, we can reuse an existing exchange, even if it is a shuffle exchange.
If the shuffle exchange comes from a join that has selective predicates on one 
side, the results of that join can be used to prune the data of the 
application side.
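
A hypothetical query shape for this idea (table and column names are invented; whether the inner join actually plans a shuffle exchange depends on input sizes and broadcast thresholds):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("rf-reuse-sketch").getOrCreate()
spark.range(0, 1000000).selectExpr("id AS key", "id % 100 AS payload").createOrReplaceTempView("fact")
spark.range(0, 10000).selectExpr("id", "id AS key").createOrReplaceTempView("dim1")
spark.range(0, 10000).selectExpr("id", "CASE WHEN id < 10 THEN 'x' ELSE 'y' END AS flag").createOrReplaceTempView("dim2")

// The dim1/dim2 join is selective (flag = 'x'); the proposal is to reuse its
// exchange to build a runtime (bloom) filter on fact.key, instead of requiring
// the filter's creation side to be a subquery over a single table.
spark.sql("""
  SELECT f.*
  FROM fact f
  JOIN (SELECT d1.key
        FROM dim1 d1 JOIN dim2 d2 ON d1.id = d2.id
        WHERE d2.flag = 'x') s
    ON f.key = s.key
""").explain()
{code}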






[jira] [Created] (SPARK-44649) Runtime Filter supports passing equivalent creation side expressions

2023-08-02 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44649:
--

 Summary: Runtime Filter supports passing equivalent creation side 
expressions
 Key: SPARK-44649
 URL: https://issues.apache.org/jira/browse/SPARK-44649
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: jiaan.geng



In the TPC-DS-style query below, cs_item_sk is equated with both i_item_sk and cr_item_sk through the join conditions, giving equivalent creation-side expressions that the runtime filter could presumably be passed through.

{code:java}
SELECT
  d_year,
  i_brand_id,
  i_class_id,
  i_category_id,
  i_manufact_id,
  cs_quantity - COALESCE(cr_return_quantity, 0) AS sales_cnt,
  cs_ext_sales_price - COALESCE(cr_return_amount, 0.0) AS sales_amt
FROM catalog_sales
  JOIN item ON i_item_sk = cs_item_sk
  JOIN date_dim ON d_date_sk = cs_sold_date_sk
  LEFT JOIN catalog_returns ON (cs_order_number = cr_order_number
AND cs_item_sk = cr_item_sk)
WHERE i_category = 'Books'
{code}







[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework

2023-07-30 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38852:
---
Description: 
Currently, Spark supports pushing down Filters and Aggregates to the data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filters and aggregates are supported, which makes it impossible to apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most scenarios
# Aggregate push down does not support data sources with multiple partitions
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top-N push down is not supported
# Aggregate push down does not support group-by expressions
# Aggregate push down does not support aggregation without aggregate functions
# Offset push down is not supported
# Paging push down is not supported
# UDF/UDAF push down is not supported

  was:
Currently, Spark supports push down Filters and Aggregates to data source.
However, the Data Source V2 operator pushdown framework has the following 
shortcomings:

# Only simple filter and aggregate are supported, which makes it impossible to 
apply in most scenarios
# The incompatibility of SQL syntax makes it impossible to apply in most 
scenarios
# Aggregate push down does not support multiple partitions of data sources
# Spark's additional aggregate will cause some overhead
# Limit push down is not supported
# Top n push down is not supported
# Aggregate push down does not support group by expressions
# Aggregate push down does not support not use aggregate functions
# Offset push down is not supported
# Paging push down is not supported


> Better Data Source V2 operator pushdown framework
> -
>
> Key: SPARK-38852
> URL: https://issues.apache.org/jira/browse/SPARK-38852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark supports pushing down Filters and Aggregates to the data source.
> However, the Data Source V2 operator pushdown framework has the following 
> shortcomings:
> # Only simple filters and aggregates are supported, which makes it impossible to apply in most scenarios
> # The incompatibility of SQL syntax makes it impossible to apply in most scenarios
> # Aggregate push down does not support data sources with multiple partitions
> # Spark's additional aggregate will cause some overhead
> # Limit push down is not supported
> # Top-N push down is not supported
> # Aggregate push down does not support group-by expressions
> # Aggregate push down does not support aggregation without aggregate functions
> # Offset push down is not supported
> # Paging push down is not supported
> # UDF/UDAF push down is not supported






[jira] [Updated] (SPARK-44571) Eliminate the Join by combine multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44571:
---
Summary: Eliminate the Join by combine multiple Aggregates  (was: Eliminate 
the Join by Combine multiple Aggregates)

> Eliminate the Join by combine multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated the TPC-DS test case q28.
> The query contains multiple scalar subqueries with aggregation, connected by 
> inner joins.
> If we can merge the filters and aggregates, we can scan the data source only 
> once and eliminate the joins, thereby avoiding shuffles. Obviously, this change 
> will improve performance.






[jira] [Commented] (SPARK-44571) Eliminate the Join by Combine multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748084#comment-17748084
 ] 

jiaan.geng commented on SPARK-44571:


I'm working on it.

> Eliminate the Join by Combine multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated the TPC-DS test case q28.
> The query contains multiple scalar subqueries with aggregation, connected by 
> inner joins.
> If we can merge the filters and aggregates, we can scan the data source only 
> once and eliminate the joins, thereby avoiding shuffles. Obviously, this change 
> will improve performance.






[jira] [Created] (SPARK-44571) Eliminate the Join by Combine multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44571:
--

 Summary: Eliminate the Join by Combine multiple Aggregates
 Key: SPARK-44571
 URL: https://issues.apache.org/jira/browse/SPARK-44571
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Recently, I investigated the TPC-DS test case q28.

The query contains multiple scalar subqueries with aggregation, connected by 
inner joins.
If we can merge the filters and aggregates, we can scan the data source only 
once and eliminate the joins, thereby avoiding shuffles. Obviously, this change 
will improve performance.
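
A heavily simplified stand-in for the q28 pattern (not the real TPC-DS text), showing one possible merged form using the FILTER clause:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("q28-sketch").getOrCreate()
spark.range(0, 1000).selectExpr("id % 20 AS ss_quantity", "id AS ss_list_price")
  .createOrReplaceTempView("store_sales")

// q28-like shape: independent scalar aggregates over the same table, joined together.
spark.sql("""
  SELECT * FROM
    (SELECT AVG(ss_list_price) AS avg1 FROM store_sales WHERE ss_quantity BETWEEN 0 AND 5)  b1,
    (SELECT AVG(ss_list_price) AS avg2 FROM store_sales WHERE ss_quantity BETWEEN 6 AND 10) b2
""").show()

// If the filters and aggregates are merged, store_sales is scanned once and the join disappears.
spark.sql("""
  SELECT AVG(ss_list_price) FILTER (WHERE ss_quantity BETWEEN 0 AND 5)  AS avg1,
         AVG(ss_list_price) FILTER (WHERE ss_quantity BETWEEN 6 AND 10) AS avg2
  FROM store_sales
""").show()
{code}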






[jira] [Resolved] (SPARK-44371) Define the computing logic through PartitionEvaluator API and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec

2023-07-24 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng resolved SPARK-44371.

Resolution: Won't Fix

> Define the computing logic through PartitionEvaluator API and use it in 
> CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec
> -
>
> Key: SPARK-44371
> URL: https://issues.apache.org/jira/browse/SPARK-44371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] (SPARK-44371) Define the computing logic through PartitionEvaluator API and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec

2023-07-24 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-44371 ]


jiaan.geng deleted comment on SPARK-44371:


was (Author: beliefer):
I'm working on.

> Define the computing logic through PartitionEvaluator API and use it in 
> CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec
> -
>
> Key: SPARK-44371
> URL: https://issues.apache.org/jira/browse/SPARK-44371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-44371) Define the computing logic through PartitionEvaluator API and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec

2023-07-24 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746248#comment-17746248
 ] 

jiaan.geng commented on SPARK-44371:


[~cloud_fan] and I discussed this offline; we don't need to make this change.

> Define the computing logic through PartitionEvaluator API and use it in 
> CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec
> -
>
> Key: SPARK-44371
> URL: https://issues.apache.org/jira/browse/SPARK-44371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44519) SparkConnectServerUtils generated incorrect parameters for jars

2023-07-23 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44519:
--

 Summary: SparkConnectServerUtils generated incorrect parameters 
for jars
 Key: SPARK-44519
 URL: https://issues.apache.org/jira/browse/SPARK-44519
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


SparkConnectServerUtils generates multiple --jars parameters. This causes a bug 
where the required classes cannot be found.
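
A hypothetical sketch of the problem shape (the jar names are invented; the assumption, based on the description above, is that spark-submit expects a single comma-separated --jars value, so repeating the flag drops some jars and their classes are not found):

{code:scala}
val jars = Seq("connect-client.jar", "catalyst-tests.jar")

// Problematic shape: the flag is emitted once per jar.
val repeated = jars.flatMap(j => Seq("--jars", j))
// => Seq("--jars", "connect-client.jar", "--jars", "catalyst-tests.jar")

// Expected shape: one flag carrying a comma-separated list.
val single = Seq("--jars", jars.mkString(","))
// => Seq("--jars", "connect-client.jar,catalyst-tests.jar")
{code}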






[jira] [Commented] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec,EvalPythonExec,AttachDistributedSequenceExec

2023-07-11 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1774#comment-1774
 ] 

jiaan.geng commented on SPARK-44362:


Thank you.

> Use  PartitionEvaluator API in 
> AggregateInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
> -
>
> Key: SPARK-44362
> URL: https://issues.apache.org/jira/browse/SPARK-44362
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use  PartitionEvaluator API in
> AggregateInPandasExec
> EvalPythonExec
> AttachDistributedSequenceExec






[jira] [Updated] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-11 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44341:
---
Summary: Define the computing logic through PartitionEvaluator API and use 
it in WindowExec and WindowInPandasExec  (was: Define the computing logic 
through PartitionEvaluator API and use it in WindowExec)

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec and WindowInPandasExec
> -
>
> Key: SPARK-44341
> URL: https://issues.apache.org/jira/browse/SPARK-44341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec






[jira] [Commented] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec

2023-07-11 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741921#comment-17741921
 ] 

jiaan.geng commented on SPARK-44362:


[~vinodkc] Because WindowInPandasExec is related to WindowExec, could I finish 
them together?

> Use  PartitionEvaluator API in AggregateInPandasExec, 
> WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
> -
>
> Key: SPARK-44362
> URL: https://issues.apache.org/jira/browse/SPARK-44362
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use  PartitionEvaluator API in
> AggregateInPandasExec
> WindowInPandasExec
> EvalPythonExec
> AttachDistributedSequenceExec






[jira] [Commented] (SPARK-44371) Define the computing logic through PartitionEvaluator API and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec

2023-07-11 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741882#comment-17741882
 ] 

jiaan.geng commented on SPARK-44371:


I'm working on it.

> Define the computing logic through PartitionEvaluator API and use it in 
> CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec
> -
>
> Key: SPARK-44371
> URL: https://issues.apache.org/jira/browse/SPARK-44371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44371) Define the computing logic through PartitionEvaluator API and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and GlobalLimitExec

2023-07-11 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44371:
--

 Summary: Define the computing logic through PartitionEvaluator API 
and use it in CollectLimitExec, CollectTailExec, LocalLimitExec and 
GlobalLimitExec
 Key: SPARK-44371
 URL: https://issues.apache.org/jira/browse/SPARK-44371
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44342:
---
Description: 
The SQLContext is an old API for Spark SQL.
But 

> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But 






[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44342:
---
Description: 
The SQLContext is an old API for Spark SQL.
But GenTPCDSData still uses it directly.

  was:
The SQLContext is an old API for Spark SQL.
But 


> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But GenTPCDSData still uses it directly.
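
A minimal sketch of what the replacement looks like (not the actual GenTPCDSData change):

{code:scala}
import org.apache.spark.sql.SparkSession

// SparkSession is the modern entry point and subsumes the legacy SQLContext.
val spark = SparkSession.builder().master("local[1]").appName("GenTPCDSData-sketch").getOrCreate()

// Before (legacy): sqlContext.sql("...") / sqlContext.createDataFrame(...)
// After:
spark.sql("SELECT 1 AS ok").show()
// If some API still insists on a SQLContext, it remains reachable:
val legacySqlContext = spark.sqlContext
{code}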






[jira] [Created] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44342:
--

 Summary: Replace SQLContext with SparkSession for GenTPCDSData
 Key: SPARK-44342
 URL: https://issues.apache.org/jira/browse/SPARK-44342
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Commented] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec

2023-07-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741251#comment-17741251
 ] 

jiaan.geng commented on SPARK-44341:


I'm working on it.

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec
> --
>
> Key: SPARK-44341
> URL: https://issues.apache.org/jira/browse/SPARK-44341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec






[jira] [Created] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec

2023-07-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44341:
--

 Summary: Define the computing logic through PartitionEvaluator API 
and use it in WindowExec
 Key: SPARK-44341
 URL: https://issues.apache.org/jira/browse/SPARK-44341
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Define the computing logic through PartitionEvaluator API and use it in 
WindowExec
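
For context, a minimal sketch of the PartitionEvaluator pattern with a toy per-partition computation (not the actual WindowExec logic; assumes the PartitionEvaluator/PartitionEvaluatorFactory traits and RDD.mapPartitionsWithEvaluator available in recent Spark versions):

{code:scala}
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// The factory is created on the driver, serialized to executors, and each task
// calls createEvaluator() to obtain the object holding the per-partition logic.
class TakeNEvaluatorFactory(n: Int) extends PartitionEvaluatorFactory[Long, Long] {
  override def createEvaluator(): PartitionEvaluator[Long, Long] =
    new PartitionEvaluator[Long, Long] {
      override def eval(partitionIndex: Int, inputs: Iterator[Long]*): Iterator[Long] =
        inputs.head.take(n)   // the "computing logic" lives here rather than in doExecute()
    }
}

// Usage sketch (assumes an active SparkContext `sc`):
// sc.range(0, 100, 1, 4).mapPartitionsWithEvaluator(new TakeNEvaluatorFactory(3)).collect()
{code}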






[jira] [Commented] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741242#comment-17741242
 ] 

jiaan.geng commented on SPARK-44340:


I'm working on it.

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec
> 
>
> Key: SPARK-44340
> URL: https://issues.apache.org/jira/browse/SPARK-44340
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Updated] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44340:
---
Description: Define the computing logic through PartitionEvaluator API and 
use it in WindowGroupLimitExec

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec
> 
>
> Key: SPARK-44340
> URL: https://issues.apache.org/jira/browse/SPARK-44340
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec






[jira] [Created] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44340:
--

 Summary: Define the computing logic through PartitionEvaluator API 
and use it in WindowGroupLimitExec
 Key: SPARK-44340
 URL: https://issues.apache.org/jira/browse/SPARK-44340
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Updated] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]

2023-07-06 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44328:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]  
(was: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329])

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> --
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329]

2023-07-06 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44328:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2325-2329]
 Key: SPARK-44328
 URL: https://issues.apache.org/jira/browse/SPARK-44328
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Created] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]

2023-07-04 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44303:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2320-2324]
 Key: SPARK-44303
 URL: https://issues.apache.org/jira/browse/SPARK-44303
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Created] (SPARK-44292) Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

2023-07-03 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44292:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2315-2319]
 Key: SPARK-44292
 URL: https://issues.apache.org/jira/browse/SPARK-44292
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Created] (SPARK-44269) Assign names to the error class _LEGACY_ERROR_TEMP_[2310-2314]

2023-07-01 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44269:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2310-2314]
 Key: SPARK-44269
 URL: https://issues.apache.org/jira/browse/SPARK-44269
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Created] (SPARK-44244) Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]

2023-06-29 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44244:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2305-2309]
 Key: SPARK-44244
 URL: https://issues.apache.org/jira/browse/SPARK-44244
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Created] (SPARK-44169) Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]

2023-06-24 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44169:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2300-2304]
 Key: SPARK-44169
 URL: https://issues.apache.org/jira/browse/SPARK-44169
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Resolved] (SPARK-42740) Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-06-21 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng resolved SPARK-42740.

Resolution: Resolved

> Fix the bug that pushdown offset or paging is invalid for some built-in 
> dialect 
> 
>
> Key: SPARK-42740
> URL: https://issues.apache.org/jira/browse/SPARK-42740
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the pushed-down offset uses syntax like OFFSET n, but some built-in 
> dialects don't support that syntax. So when Spark pushes the offset down to 
> these databases, they throw errors.






[jira] [Commented] (SPARK-42740) Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-06-21 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735688#comment-17735688
 ] 

jiaan.geng commented on SPARK-42740:


Resolved by https://github.com/apache/spark/pull/40359

> Fix the bug that pushdown offset or paging is invalid for some built-in 
> dialect 
> 
>
> Key: SPARK-42740
> URL: https://issues.apache.org/jira/browse/SPARK-42740
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the pushed-down offset uses syntax like OFFSET n, but some built-in 
> dialects don't support that syntax. So when Spark pushes the offset down to 
> these databases, they throw errors.






[jira] [Created] (SPARK-44131) Add call_function and deprecate call_udf for Scala API

2023-06-20 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44131:
--

 Summary: Add call_function and deprecate call_udf for Scala API
 Key: SPARK-44131
 URL: https://issues.apache.org/jira/browse/SPARK-44131
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


The Scala SQL API has a method call_udf that is used to call user-defined 
functions.
In fact, call_udf can also call the built-in functions.
This behavior is confusing for users.
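
A small sketch of the confusion; call_udf already exists, while call_function is the name proposed by this ticket (treated here as an assumption):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{call_udf, col}

val spark = SparkSession.builder().master("local[1]").appName("call_udf-demo").getOrCreate()
val df = spark.range(1).selectExpr("'MiXeD' AS name")

// Despite its name, call_udf happily invokes the built-in `lower` function:
df.select(call_udf("lower", col("name"))).show()

// Proposed clearer spelling, same behavior for built-ins and UDFs:
// df.select(call_function("lower", col("name"))).show()
{code}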






[jira] (SPARK-43929) Add date time functions to Scala and Python - part 1

2023-06-20 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-43929 ]


jiaan.geng deleted comment on SPARK-43929:


was (Author: beliefer):
I will take over this one.

> Add date time functions to Scala and Python - part 1
> 
>
> Key: SPARK-43929
> URL: https://issues.apache.org/jira/browse/SPARK-43929
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> Add following functions:
> * date_diff
> * date_from_unix_date
> * date_part
> * dateadd
> * datepart
> * day
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Commented] (SPARK-44073) Add date time functions to Scala and Python - part 2

2023-06-16 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733476#comment-17733476
 ] 

jiaan.geng commented on SPARK-44073:


I will fix this one.

> Add date time functions to Scala and Python - part 2
> 
>
> Key: SPARK-44073
> URL: https://issues.apache.org/jira/browse/SPARK-44073
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * weekday
> * convert_timezone
> * extract
> * now
> * timestamp_micros
> * timestamp_millis
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Commented] (SPARK-43929) Add date time functions to Scala and Python - part 1

2023-06-16 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733474#comment-17733474
 ] 

jiaan.geng commented on SPARK-43929:


I will take over this one.

> Add date time functions to Scala and Python - part 1
> 
>
> Key: SPARK-43929
> URL: https://issues.apache.org/jira/browse/SPARK-43929
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * date_diff
> * date_from_unix_date
> * date_part
> * dateadd
> * datepart
> * day
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Commented] (SPARK-43928) Add bit operations to Scala and Python

2023-06-13 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732391#comment-17732391
 ] 

jiaan.geng commented on SPARK-43928:


OK. Let me do it.

> Add bit operations to Scala and Python
> --
>
> Key: SPARK-43928
> URL: https://issues.apache.org/jira/browse/SPARK-43928
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * bit_and
> * bit_count
> * bit_get
> * bit_or
> * bit_xor
> * getbit
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Comment Edited] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731988#comment-17731988
 ] 

jiaan.geng edited comment on SPARK-44018 at 6/13/23 8:38 AM:
-

[~dongjoon] Yes. I have created a PR for this: 
https://github.com/apache/spark/pull/41543


was (Author: beliefer):
[~dongjoon]Yes. I have created PR for this.

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are missing 
> hashCode().






[jira] [Commented] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-13 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731988#comment-17731988
 ] 

jiaan.geng commented on SPARK-44018:


[~dongjoon] Yes. I have created a PR for this.

> Improve the hashCode for Some DS V2 Expression
> --
>
> Key: SPARK-44018
> URL: https://issues.apache.org/jira/browse/SPARK-44018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not 
> good enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are missing 
> hashCode().






[jira] [Updated] (SPARK-43915) Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]

2023-06-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43915:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]  
(was: Assign a name to the error class _LEGACY_ERROR_TEMP_2428)

> Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
> --
>
> Key: SPARK-43915
> URL: https://issues.apache.org/jira/browse/SPARK-43915
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44018) Improve the hashCode for Some DS V2 Expression

2023-06-10 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44018:
--

 Summary: Improve the hashCode for Some DS V2 Expression
 Key: SPARK-44018
 URL: https://issues.apache.org/jira/browse/SPARK-44018
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


The hashCode() of UserDefinedScalarFunc and GeneralScalarExpression is not good 
enough, and UserDefinedAggregateFunc and GeneralAggregateFunc are missing hashCode().






[jira] [Commented] (SPARK-43941) Add any_value, approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala and Python

2023-06-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730551#comment-17730551
 ] 

jiaan.geng commented on SPARK-43941:


Because I was involved in the development of some of the functions shown above, 
I want to do this job.

> Add any_value, 
> approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala 
> and Python
> --
>
> Key: SPARK-43941
> URL: https://issues.apache.org/jira/browse/SPARK-43941
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * any_value
> * approx_percentile
> * count_if
> * first_value
> * histogram_numeric
> * last_value
> * reduce
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Commented] (SPARK-43925) Add any, some, bool_or,bool_and,every to Scala and Python

2023-06-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730550#comment-17730550
 ] 

jiaan.geng commented on SPARK-43925:


Let me do it.

> Add any, some, bool_or,bool_and,every to Scala and Python
> -
>
> Key: SPARK-43925
> URL: https://issues.apache.org/jira/browse/SPARK-43925
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * any
> * some
> * bool_or
> * bool_and
> * every
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Updated] (SPARK-43992) Add optional pattern for Catalog.listFunctions

2023-06-07 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43992:
---
Description: 
Currently, the syntax

{code:java}
SHOW FUNCTIONS LIKE pattern
{code}

supports an optional pattern.
But Catalog.listFunctions is missing this capability.
In fact, the optional pattern is very useful.

  was:
Currently, the syntax
SHOW FUNCTIONS LIKE pattern
supports a optional pattern.
But the Catalog.listFunctions missing the function.
In fact, the optional pattern is very useful.


> Add optional pattern for Catalog.listFunctions
> --
>
> Key: SPARK-43992
> URL: https://issues.apache.org/jira/browse/SPARK-43992
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, python, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the syntax
> {code:java}
> SHOW FUNCTIONS LIKE pattern
> {code}
> supports an optional pattern.
> But Catalog.listFunctions is missing this capability.
> In fact, the optional pattern is very useful.






[jira] [Updated] (SPARK-41289) Feature parity: Catalog API

2023-06-07 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-41289:
---
Component/s: PySpark
 python
 SQL

> Feature parity: Catalog API
> ---
>
> Key: SPARK-41289
> URL: https://issues.apache.org/jira/browse/SPARK-41289
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark, python, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
>







[jira] [Created] (SPARK-43992) Add optional pattern for Catalog.listFunctions

2023-06-07 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43992:
--

 Summary: Add optional pattern for Catalog.listFunctions
 Key: SPARK-43992
 URL: https://issues.apache.org/jira/browse/SPARK-43992
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, python, SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Currently, the syntax
SHOW FUNCTIONS LIKE pattern
supports an optional pattern.
But Catalog.listFunctions is missing this capability.
In fact, the optional pattern is very useful.
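
A sketch of the gap (the pattern-taking listFunctions overload is the proposal, shown here as an assumption rather than an existing API):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("listFunctions-sketch").getOrCreate()

// SQL already accepts a pattern:
spark.sql("SHOW FUNCTIONS LIKE 'to_*'").show()

// The Catalog API currently returns everything, forcing client-side filtering:
spark.catalog.listFunctions().filter(_.name.startsWith("to_")).show()

// Proposed: let the catalog evaluate the pattern itself.
// spark.catalog.listFunctions("to_*").show()
{code}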






[jira] [Updated] (SPARK-43914) Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]

2023-06-06 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43914:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]  
(was: Assign a name to the error class _LEGACY_ERROR_TEMP_2427)

> Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
> --
>
> Key: SPARK-43914
> URL: https://issues.apache.org/jira/browse/SPARK-43914
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Updated] (SPARK-43933) Add linear regression aggregate functions to Scala and Python

2023-06-05 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43933:
---
Summary: Add linear regression aggregate functions to Scala and Python  
(was: Add regression aggregate functions to Scala and Python)

> Add linear regression aggregate functions to Scala and Python
> -
>
> Key: SPARK-43933
> URL: https://issues.apache.org/jira/browse/SPARK-43933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * regr_avgx
> * regr_avgy
> * regr_count
> * regr_intercept
> * regr_r2
> * regr_slope
> * regr_sxx
> * regr_sxy
> * regr_syy
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Updated] (SPARK-43933) Add regression aggregate functions to Scala and Python

2023-06-05 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43933:
---
Summary: Add regression aggregate functions to Scala and Python  (was: Add 
regr_* functions to Scala and Python)

> Add regression aggregate functions to Scala and Python
> --
>
> Key: SPARK-43933
> URL: https://issues.apache.org/jira/browse/SPARK-43933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * regr_avgx
> * regr_avgy
> * regr_count
> * regr_intercept
> * regr_r2
> * regr_slope
> * regr_sxx
> * regr_sxy
> * regr_syy
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Created] (SPARK-43961) Add optional pattern for Catalog.listTables

2023-06-04 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43961:
--

 Summary: Add optional pattern for Catalog.listTables
 Key: SPARK-43961
 URL: https://issues.apache.org/jira/browse/SPARK-43961
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Commented] (SPARK-43934) Add regexp_* functions to Scala and Python

2023-06-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728955#comment-17728955
 ] 

jiaan.geng commented on SPARK-43934:


Because I was involved in the development of the regexp_* functions, I want to 
do this job.

> Add regexp_* functions to Scala and Python
> --
>
> Key: SPARK-43934
> URL: https://issues.apache.org/jira/browse/SPARK-43934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * regexp
> * regexp_count
> * regexp_extract_all
> * regexp_instr
> * regexp_like
> * regexp_substr
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Updated] (SPARK-43916) Add percentile like functions to Scala and Python API

2023-06-03 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43916:
---
Description: 
Add following functions:
* percentile
* percentile_cont
* percentile_disc
* median
to:
* Scala API
* Python API
* Spark Connect Scala Client
* Spark Connect Python Client

> Add percentile like functions to Scala and Python API
> -
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Add following functions:
> * percentile
> * percentile_cont
> * percentile_disc
> * median
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
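
For reference, the SQL forms of the functions listed above (the ticket adds the corresponding Scala/Python wrappers); this assumes a Spark version that already ships percentile_cont, percentile_disc, and median in SQL:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("percentile-sketch").getOrCreate()
spark.range(1, 11).createOrReplaceTempView("t")

spark.sql("""
  SELECT percentile(id, 0.5)                             AS p,
         percentile_cont(0.5) WITHIN GROUP (ORDER BY id) AS pc,
         percentile_disc(0.5) WITHIN GROUP (ORDER BY id) AS pd,
         median(id)                                      AS m
  FROM t
""").show()
{code}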






[jira] [Commented] (SPARK-43933) Add regr_* functions to Scala and Python

2023-06-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728954#comment-17728954
 ] 

jiaan.geng commented on SPARK-43933:


Because I was involved in the development of the regr_* functions, I want to do this job.

> Add regr_* functions to Scala and Python
> 
>
> Key: SPARK-43933
> URL: https://issues.apache.org/jira/browse/SPARK-43933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * regr_avgx
> * regr_avgy
> * regr_count
> * regr_intercept
> * regr_r2
> * regr_slope
> * regr_sxx
> * regr_sxy
> * regr_syy
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Commented] (SPARK-43956) Fix the bug that Percentile[Cont|Disc] doesn't display the column's sql

2023-06-02 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728926#comment-17728926
 ] 

jiaan.geng commented on SPARK-43956:


Resolved by https://github.com/apache/spark/pull/41436

> Fix the bug that Percentile[Cont|Disc] doesn't display the column's sql
> --
>
> Key: SPARK-43956
> URL: https://issues.apache.org/jira/browse/SPARK-43956
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> Last year, I committed the Percentile[Cont|Disc] functions for Spark SQL.
> Recently, I found that the sql method of Percentile[Cont|Disc] doesn't display 
> the column's sql properly.






[jira] [Created] (SPARK-43956) Fix the bug that Percentile[Cont|Disc] doesn't display the column's sql

2023-06-02 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43956:
--

 Summary: Fix the bug that Percentile[Cont|Disc] doesn't display the 
column's sql
 Key: SPARK-43956
 URL: https://issues.apache.org/jira/browse/SPARK-43956
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng
 Fix For: 3.5.0


Last year, I committed the Percentile[Cont|Disc] functions for Spark SQL.
Recently, I found that the sql method of Percentile[Cont|Disc] doesn't display 
the column's sql properly.






[jira] [Updated] (SPARK-43916) Add percentile like functions to Scala and Python API

2023-06-02 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43916:
---
Summary: Add percentile like functions to Scala and Python API  (was: Add 
percentile* to Scala and Python API)

> Add percentile like functions to Scala and Python API
> -
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Updated] (SPARK-43916) Add percentile* to Scala and Python API

2023-06-02 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43916:
---
Summary: Add percentile* to Scala and Python API  (was: Add percentile to 
Scala and Python API)

> Add percentile* to Scala and Python API
> ---
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43916) Add percentile to Scala and Python API

2023-06-02 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43916:
---
Summary: Add percentile to Scala and Python API  (was: Add percentile to 
Scala, Python and R API)

> Add percentile to Scala and Python API
> --
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API

2023-06-02 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728598#comment-17728598
 ] 

jiaan.geng commented on SPARK-43907:


[~gurwls223] Thank you for your feedback.

> Add SQL functions into Scala, Python and R API
> --
>
> Key: SPARK-43907
> URL: https://issues.apache.org/jira/browse/SPARK-43907
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See the discussion in dev mailing list 
> (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg).
> This is an umbrella JIRA to implement all SQL functions in Scala, Python and R



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43916) Add percentile to Scala, Python and R API

2023-06-01 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728345#comment-17728345
 ] 

jiaan.geng commented on SPARK-43916:


Because I was involved in the development of percentile, I would like to take on this job.

> Add percentile to Scala, Python and R API
> -
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API

2023-06-01 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728344#comment-17728344
 ] 

jiaan.geng commented on SPARK-43907:


[~gurwls223] Are we really starting this job?

> Add SQL functions into Scala, Python and R API
> --
>
> Key: SPARK-43907
> URL: https://issues.apache.org/jira/browse/SPARK-43907
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See the discussion in dev mailing list 
> (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg).
> This is an umbrella JIRA to implement all SQL functions in Scala, Python and R



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43916) Add percentile to Scala, Python and R API

2023-06-01 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43916:
---
Summary: Add percentile to Scala, Python and R API  (was: Add percentile to 
Scala, Python, R API)

> Add percentile to Scala, Python and R API
> -
>
> Key: SPARK-43916
> URL: https://issues.apache.org/jira/browse/SPARK-43916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, R, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43916) Add percentile to Scala, Python, R API

2023-06-01 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43916:
--

 Summary: Add percentile to Scala, Python, R API
 Key: SPARK-43916
 URL: https://issues.apache.org/jira/browse/SPARK-43916
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, R, SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43913) Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]

2023-06-01 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43913:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]  
(was: Assign a name to the error class _LEGACY_ERROR_TEMP_[2426-2432])

> Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
> --
>
> Key: SPARK-43913
> URL: https://issues.apache.org/jira/browse/SPARK-43913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43913) Assign a name to the error class _LEGACY_ERROR_TEMP_[2426-2432]

2023-06-01 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43913:
---
Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_[2426-2432]  
(was: Assign a name to the error class _LEGACY_ERROR_TEMP_2426)

> Assign a name to the error class _LEGACY_ERROR_TEMP_[2426-2432]
> ---
>
> Key: SPARK-43913
> URL: https://issues.apache.org/jira/browse/SPARK-43913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37935) Migrate onto error classes

2023-06-01 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728312#comment-17728312
 ] 

jiaan.geng commented on SPARK-37935:


[~maxgekk] I agree with grouping 3+ error classes per PR. I will create a single issue for 
these error classes.

> Migrate onto error classes
> --
>
> Key: SPARK-37935
> URL: https://issues.apache.org/jira/browse/SPARK-37935
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.5.0
>
>
> The PR https://github.com/apache/spark/pull/32850 introduced error classes as 
> a part of the error messages framework 
> (https://issues.apache.org/jira/browse/SPARK-33539). We need to migrate all 
> exceptions from QueryExecutionErrors, QueryCompilationErrors and 
> QueryParsingErrors onto the error classes using instances of SparkThrowable, 
> and carefully test every error class by writing tests in dedicated test 
> suites:
> * QueryExecutionErrorsSuite for the errors that occur during query execution
> * QueryCompilationErrorsSuite ... query compilation or eagerly executing commands
> * QueryParsingErrorsSuite ... parsing errors
> Here is an example https://github.com/apache/spark/pull/35157 of how an 
> existing Java exception can be replaced and how the related error classes are 
> tested. At the end, we should migrate all exceptions from the files 
> Query.*Errors.scala and cover all error classes from the error-classes.json 
> file with tests.
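As a rough illustration of what testing a migrated error looks like from the user-facing side (a hedged sketch, not the exact helpers used in the dedicated suites; it assumes a ScalaTest context for intercept, a SparkSession named spark, and Spark 3.4+ error class names):
{code:scala}
import org.apache.spark.sql.AnalysisException

// A migrated error should expose a named error class instead of only a raw message.
val e = intercept[AnalysisException] {
  spark.sql("SELECT * FROM nonexistent_table")
}
assert(e.getErrorClass == "TABLE_OR_VIEW_NOT_FOUND")
{code}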



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43914) Assign a name to the error class _LEGACY_ERROR_TEMP_2427

2023-06-01 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43914:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2427
 Key: SPARK-43914
 URL: https://issues.apache.org/jira/browse/SPARK-43914
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43915) Assign a name to the error class _LEGACY_ERROR_TEMP_2428

2023-06-01 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43915:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2428
 Key: SPARK-43915
 URL: https://issues.apache.org/jira/browse/SPARK-43915
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43913) Assign a name to the error class _LEGACY_ERROR_TEMP_2426

2023-06-01 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43913:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2426
 Key: SPARK-43913
 URL: https://issues.apache.org/jira/browse/SPARK-43913
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43879) Decouple handle command and send response on server side

2023-05-29 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43879:
--

 Summary: Decouple handle command and send response on server side
 Key: SPARK-43879
 URL: https://issues.apache.org/jira/browse/SPARK-43879
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.5.0
Reporter: jiaan.geng


SparkConnectStreamHandler handles the request from the Connect client and sends the 
response back to the client. SparkConnectStreamHandler holds a StreamObserver 
component that is used to send the response.
So I think the StreamObserver should only be accessible from within 
SparkConnectStreamHandler.
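A rough sketch of the encapsulation being proposed (class and method names here are illustrative, not the actual Connect server code; it only assumes grpc-stub on the classpath):
{code:scala}
import io.grpc.stub.StreamObserver

// The handler is the only component that touches the observer; command handlers
// produce results and never emit responses themselves.
class StreamHandlerSketch[T](observer: StreamObserver[T]) {
  private def send(response: T): Unit = observer.onNext(response)

  def handle(responses: Iterator[T]): Unit = {
    responses.foreach(send)   // handling yields responses ...
    observer.onCompleted()    // ... and only this class decides when to send and finish
  }
}
{code}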



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43856) Assign a name to the error class _LEGACY_ERROR_TEMP_2425

2023-05-28 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43856:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2425
 Key: SPARK-43856
 URL: https://issues.apache.org/jira/browse/SPARK-43856
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43855) Assign a name to the error class _LEGACY_ERROR_TEMP_2423

2023-05-28 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43855:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2423
 Key: SPARK-43855
 URL: https://issues.apache.org/jira/browse/SPARK-43855
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43854) Assign a name to the error class _LEGACY_ERROR_TEMP_2421

2023-05-28 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43854:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2421
 Key: SPARK-43854
 URL: https://issues.apache.org/jira/browse/SPARK-43854
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43852) Assign a name to the error class _LEGACY_ERROR_TEMP_2418

2023-05-28 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43852:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2418
 Key: SPARK-43852
 URL: https://issues.apache.org/jira/browse/SPARK-43852
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43853) Assign a name to the error class _LEGACY_ERROR_TEMP_2419

2023-05-28 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43853:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2419
 Key: SPARK-43853
 URL: https://issues.apache.org/jira/browse/SPARK-43853
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43829) Improve SparkConnectPlanner by reuse Dataset and avoid construct new Dataset

2023-05-27 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43829:
--

 Summary: Improve SparkConnectPlanner by reuse Dataset and avoid 
construct new Dataset
 Key: SPARK-43829
 URL: https://issues.apache.org/jira/browse/SPARK-43829
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: jiaan.geng


Currently, SparkConnectPlanner.transformRelation always returns the LogicalPlan 
of a Dataset.
SparkConnectStreamHandler.handlePlan then constructs a new Dataset around it.
In some cases, SparkConnectStreamHandler.handlePlan could reuse the Dataset created 
by SparkConnectPlanner.transformRelation instead.
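A toy illustration of the redundancy only (not the actual Connect code path), assuming a spark-shell session: if a Dataset has already been built, wrapping it into a second Dataset repeats work that the existing Dataset already carries.
{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}

def alreadyBuilt(spark: SparkSession): DataFrame = spark.range(10).toDF("id")

// Avoidable extra wrapping: a second Dataset is constructed around existing data.
def rebuild(spark: SparkSession, df: DataFrame): DataFrame =
  spark.createDataFrame(df.rdd, df.schema)

// The proposal in spirit: hand the same Dataset on.
def reuse(df: DataFrame): DataFrame = df
{code}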



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43795) Remove parameters not used for SparkConnectPlanner

2023-05-27 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43795:
---
Issue Type: Improvement  (was: Documentation)

> Remove parameters not used for SparkConnectPlanner
> --
>
> Key: SPARK-43795
> URL: https://issues.apache.org/jira/browse/SPARK-43795
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently, some methods in SparkConnectPlanner have parameters that are not used 
> at all.
> For example, catalog.getCurrentCatalog does not carry any useful parameters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43827) Assign a name to the error class _LEGACY_ERROR_TEMP_2417

2023-05-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43827:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2417
 Key: SPARK-43827
 URL: https://issues.apache.org/jira/browse/SPARK-43827
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43826) Assign a name to the error class _LEGACY_ERROR_TEMP_2416

2023-05-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43826:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2416
 Key: SPARK-43826
 URL: https://issues.apache.org/jira/browse/SPARK-43826
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43823) Assign a name to the error class _LEGACY_ERROR_TEMP_2414

2023-05-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43823:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2414
 Key: SPARK-43823
 URL: https://issues.apache.org/jira/browse/SPARK-43823
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43822) Assign a name to the error class _LEGACY_ERROR_TEMP_2413

2023-05-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43822:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2413
 Key: SPARK-43822
 URL: https://issues.apache.org/jira/browse/SPARK-43822
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43820) Assign a name to the error class _LEGACY_ERROR_TEMP_2411

2023-05-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43820:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2411
 Key: SPARK-43820
 URL: https://issues.apache.org/jira/browse/SPARK-43820
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43807) Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY

2023-05-25 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43807:
--

 Summary: Migrate _LEGACY_ERROR_TEMP_1269 to 
PARTITION_SCHEMA_IS_EMPTY
 Key: SPARK-43807
 URL: https://issues.apache.org/jira/browse/SPARK-43807
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43795) Remove parameters not used for SparkConnectPlanner

2023-05-25 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43795:
--

 Summary: Remove parameters not used for SparkConnectPlanner
 Key: SPARK-43795
 URL: https://issues.apache.org/jira/browse/SPARK-43795
 Project: Spark
  Issue Type: Documentation
  Components: Connect
Affects Versions: 3.5.0
Reporter: jiaan.geng


Currently, some methods in SparkConnectPlanner have parameters that are not used 
at all.
For example, catalog.getCurrentCatalog does not carry any useful parameters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43792) Add optional pattern for Catalog.listCatalogs

2023-05-25 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43792:
---
Parent: SPARK-41289
Issue Type: Sub-task  (was: Documentation)

> Add optional pattern for Catalog.listCatalogs
> -
>
> Key: SPARK-43792
> URL: https://issues.apache.org/jira/browse/SPARK-43792
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, python, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the syntax 
> {code:java}
> SHOW CATALOGS LIKE pattern
> {code}
>  supports an optional pattern.
> But Catalog.listCatalogs is missing this capability.
> In fact, the optional pattern is very useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43792) Add optional pattern for Catalog.listCatalogs

2023-05-25 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43792:
---
Affects Version/s: 3.5.0
   (was: 3.4.1)

> Add optional pattern for Catalog.listCatalogs
> -
>
> Key: SPARK-43792
> URL: https://issues.apache.org/jira/browse/SPARK-43792
> Project: Spark
>  Issue Type: Documentation
>  Components: Connect, PySpark, python, SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the syntax 
> {code:java}
> SHOW CATALOGS LIKE pattern
> {code}
>  supports an optional pattern.
> But Catalog.listCatalogs is missing this capability.
> In fact, the optional pattern is very useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43792) Add optional pattern for Catalog.listCatalogs

2023-05-25 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43792:
--

 Summary: Add optional pattern for Catalog.listCatalogs
 Key: SPARK-43792
 URL: https://issues.apache.org/jira/browse/SPARK-43792
 Project: Spark
  Issue Type: Documentation
  Components: Connect, PySpark, python, SQL
Affects Versions: 3.4.1
Reporter: jiaan.geng


Currently, the syntax 
{code:java}
SHOW CATALOGS LIKE pattern
{code}
 supports an optional pattern.
But Catalog.listCatalogs is missing this capability.
In fact, the optional pattern is very useful.
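A small sketch of the gap, assuming a spark-shell on Spark 3.4+ (the pattern overload is what this ticket proposes, so it only exists once the change lands):
{code:scala}
// SQL already supports filtering catalogs by a pattern.
spark.sql("SHOW CATALOGS LIKE 'spark*'").show()

// The Catalog API can only list everything today ...
spark.catalog.listCatalogs().show()

// ... while the proposed overload would accept the same kind of pattern (illustrative only).
spark.catalog.listCatalogs("spark*").show()
{code}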



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43787) Improve the method signature by given default None for Option

2023-05-24 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng resolved SPARK-43787.

Resolution: Won't Fix

> Improve the method signature by given default None for Option 
> --
>
> Key: SPARK-43787
> URL: https://issues.apache.org/jira/browse/SPARK-43787
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> I found that some methods take Option[some type] parameters and are called by 
> passing None explicitly.
> We can improve them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43787) Improve the method signature by given default None for Option

2023-05-24 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43787:
--

 Summary: Improve the method signature by given default None for 
Option 
 Key: SPARK-43787
 URL: https://issues.apache.org/jira/browse/SPARK-43787
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


I found that some methods take Option[some type] parameters and are called by 
passing None explicitly.
We can improve them.
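A small illustration of the idea (a purely hypothetical signature, not an actual Spark method): with a default of None, most call sites can drop the explicit None argument.
{code:scala}
// Hypothetical helper, for illustration only.
def describeTable(name: String, comment: Option[String] = None): String =
  s"$name${comment.map(c => s" ($c)").getOrElse("")}"

describeTable("t1")                 // instead of describeTable("t1", None)
describeTable("t2", Some("demo"))   // the Option is still available when needed
{code}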



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-40586) Decouple plan transformation and validation on server side

2023-05-24 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-40586 ]


jiaan.geng deleted comment on SPARK-40586:


was (Author: beliefer):
I will take a look!

> Decouple plan transformation and validation on server side 
> ---
>
> Key: SPARK-40586
> URL: https://issues.apache.org/jira/browse/SPARK-40586
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> Project Connect, from some perspectives, can be thought of as replacing the SQL 
> parser to generate a parsed (though unresolved) plan, which is then passed to 
> the analyzer. This means that Connect should also validate the proto, as there 
> are many invalid parser cases that the analyzer does not expect to see, which 
> could potentially cause problems if Connect only passed the proto through 
> (translated, of course) to the analyzer.
> Meanwhile, I think it is a good idea to decouple the validation and 
> transformation so that we have two stages:
> stage 1: proto validation. For example, validate whether necessary fields are 
> populated or not.
> stage 2: transformation, which converts the proto to a plan under the 
> assumption that the proto is a valid parsed version of the plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43785) Improve the document of GenTPCDSData, so that developers could easy to generate TPCDS table data.

2023-05-24 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-43785:
---
Description: 

{code:java}
build/sbt "sql/Test/runMain  --dsdgenDir  --location  
--scaleFactor 1"
{code}

The command shown above can easily cause an OOM.

Please refer to https://issues.apache.org/jira/browse/SPARK-43573


> Improve the document of GenTPCDSData, so that developers could easy to 
> generate TPCDS table data.
> -
>
> Key: SPARK-43785
> URL: https://issues.apache.org/jira/browse/SPARK-43785
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> build/sbt "sql/Test/runMain  --dsdgenDir  --location  
> --scaleFactor 1"
> {code}
> The command shown above can easily cause an OOM.
> Please refer to https://issues.apache.org/jira/browse/SPARK-43573



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43785) Improve the document of GenTPCDSData, so that developers could easy to generate TPCDS table data.

2023-05-24 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43785:
--

 Summary: Improve the document of GenTPCDSData, so that developers 
could easy to generate TPCDS table data.
 Key: SPARK-43785
 URL: https://issues.apache.org/jira/browse/SPARK-43785
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-41875) Throw proper errors in Dataset.to()

2023-05-24 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41875 ]


jiaan.geng deleted comment on SPARK-41875:


was (Author: beliefer):
I will take a look!

> Throw proper errors in Dataset.to()
> ---
>
> Key: SPARK-41875
> URL: https://issues.apache.org/jira/browse/SPARK-41875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> schema = StructType(
> [StructField("i", StringType(), True), StructField("j", IntegerType(), 
> True)]
> )
> df = self.spark.createDataFrame([("a", 1)], schema)
> schema1 = StructType([StructField("j", StringType()), StructField("i", 
> StringType())])
> df1 = df.to(schema1)
> self.assertEqual(schema1, df1.schema)
> self.assertEqual(df.count(), df1.count())
> schema2 = StructType([StructField("j", LongType())])
> df2 = df.to(schema2)
> self.assertEqual(schema2, df2.schema)
> self.assertEqual(df.count(), df2.count())
> schema3 = StructType([StructField("struct", schema1, False)])
> df3 = df.select(struct("i", "j").alias("struct")).to(schema3)
> self.assertEqual(schema3, df3.schema)
> self.assertEqual(df.count(), df3.count())
> # incompatible field nullability
> schema4 = StructType([StructField("j", LongType(), False)])
> self.assertRaisesRegex(
> AnalysisException, "NULLABLE_COLUMN_OR_FIELD", lambda: df.to(schema4)
> ){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1486, in test_to
>     self.assertRaisesRegex(
> AssertionError: AnalysisException not raised by  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40586) Decouple plan transformation and validation on server side

2023-05-24 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725815#comment-17725815
 ] 

jiaan.geng commented on SPARK-40586:


I will take a look!

> Decouple plan transformation and validation on server side 
> ---
>
> Key: SPARK-40586
> URL: https://issues.apache.org/jira/browse/SPARK-40586
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> Project Connect, from some perspectives, can be thought of as replacing the SQL 
> parser to generate a parsed (though unresolved) plan, which is then passed to 
> the analyzer. This means that Connect should also validate the proto, as there 
> are many invalid parser cases that the analyzer does not expect to see, which 
> could potentially cause problems if Connect only passed the proto through 
> (translated, of course) to the analyzer.
> Meanwhile, I think it is a good idea to decouple the validation and 
> transformation so that we have two stages:
> stage 1: proto validation. For example, validate whether necessary fields are 
> populated or not.
> stage 2: transformation, which converts the proto to a plan under the 
> assumption that the proto is a valid parsed version of the plan.
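A toy sketch of the proposed two-stage shape (the types here are invented for illustration and are not Connect's actual proto or planner classes):
{code:scala}
// Stand-in for an incoming proto relation.
final case class Relation(kind: String, fields: Map[String, String])

// Stage 1: structural validation only; reject protos the analyzer should never see.
def validate(r: Relation): Either[String, Relation] =
  if (r.kind.isEmpty) Left("relation kind must be set")
  else if (r.kind == "read" && !r.fields.contains("table")) Left("read relation needs a table")
  else Right(r)

// Stage 2: transformation, assuming the relation is already known to be valid.
def transform(r: Relation): String = s"LogicalPlan(${r.kind}, ${r.fields})"

validate(Relation("read", Map("table" -> "t1"))).map(transform)
{code}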



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42669) Short circuit local relation rpcs

2023-05-24 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42669 ]


jiaan.geng deleted comment on SPARK-42669:


was (Author: beliefer):
I will take a look!

> Short circuit local relation rpcs
> -
>
> Key: SPARK-42669
> URL: https://issues.apache.org/jira/browse/SPARK-42669
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Operations on LocalRelation can mostly be done locally (without sending 
> rpcs). We should leverage this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43766) Assign a name to the error class _LEGACY_ERROR_TEMP_2410

2023-05-23 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43766:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2410
 Key: SPARK-43766
 URL: https://issues.apache.org/jira/browse/SPARK-43766
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43765) Assign a name to the error class _LEGACY_ERROR_TEMP_2409

2023-05-23 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-43765:
--

 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2409
 Key: SPARK-43765
 URL: https://issues.apache.org/jira/browse/SPARK-43765
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


