[jira] [Created] (SPARK-49810) Extract the preparation of df.sort to parent class

2024-09-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49810:
-

 Summary: Extract the preparation of df.sort to parent class 
 Key: SPARK-49810
 URL: https://issues.apache.org/jira/browse/SPARK-49810
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-49808) Fix a deadlock in subquery execution due to lazy vals

2024-09-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49808:
-

 Summary: Fix a deadlock in subquery execution due to lazy vals
 Key: SPARK-49808
 URL: https://issues.apache.org/jira/browse/SPARK-49808
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49805) Remove internal functions from `function.scala`

2024-09-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49805:
-

 Summary: Remove internal functions from `function.scala`
 Key: SPARK-49805
 URL: https://issues.apache.org/jira/browse/SPARK-49805
 Project: Spark
  Issue Type: Improvement
  Components: ML, SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49784) Add more test for spark.sql

2024-09-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49784.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48246
[https://github.com/apache/spark/pull/48246]

> Add more test for spark.sql
> ---
>
> Key: SPARK-49784
> URL: https://issues.apache.org/jira/browse/SPARK-49784
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49784) Add more test for spark.sql

2024-09-25 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49784:
-

 Summary: Add more test for spark.sql
 Key: SPARK-49784
 URL: https://issues.apache.org/jira/browse/SPARK-49784
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-49552) Add DataFrame APIs for new SQL functions to generate random strings or numbers within ranges

2024-09-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49552:
-

Assignee: Daniel

> Add DataFrame APIs for new SQL functions to generate random strings or 
> numbers within ranges
> 
>
> Key: SPARK-49552
> URL: https://issues.apache.org/jira/browse/SPARK-49552
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49552) Add DataFrame APIs for new SQL functions to generate random strings or numbers within ranges

2024-09-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49552.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48143
[https://github.com/apache/spark/pull/48143]

> Add DataFrame APIs for new SQL functions to generate random strings or 
> numbers within ranges
> 
>
> Key: SPARK-49552
> URL: https://issues.apache.org/jira/browse/SPARK-49552
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49767) Refactor the internal function invocation

2024-09-24 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49767:
-

 Summary: Refactor the internal function invocation
 Key: SPARK-49767
 URL: https://issues.apache.org/jira/browse/SPARK-49767
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-49734) Make function `shuffle` support `seed` argument

2024-09-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49734:
-

Assignee: Ruifeng Zheng

> Make function `shuffle` support `seed` argument
> ---
>
> Key: SPARK-49734
> URL: https://issues.apache.org/jira/browse/SPARK-49734
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49734) Make function `shuffle` support `seed` argument

2024-09-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49734.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48184
[https://github.com/apache/spark/pull/48184]

> Make function `shuffle` support `seed` argument
> ---
>
> Key: SPARK-49734
> URL: https://issues.apache.org/jira/browse/SPARK-49734
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49734) Make function `shuffle` support `seed` argument

2024-09-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49734:
-

 Summary: Make function `shuffle` support `seed` argument
 Key: SPARK-49734
 URL: https://issues.apache.org/jira/browse/SPARK-49734
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-49713) Make function `count_min_sketch` accept number arguments

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49713:
-

Assignee: Ruifeng Zheng

> Make function `count_min_sketch` accept number arguments
> 
>
> Key: SPARK-49713
> URL: https://issues.apache.org/jira/browse/SPARK-49713
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49713) Make function `count_min_sketch` accept number arguments

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49713.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48157
[https://github.com/apache/spark/pull/48157]

> Make function `count_min_sketch` accept number arguments
> 
>
> Key: SPARK-49713
> URL: https://issues.apache.org/jira/browse/SPARK-49713
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49720) Add a script to clean up PySpark temp files

2024-09-19 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49720:
-

 Summary: Add a script to clean up PySpark temp files
 Key: SPARK-49720
 URL: https://issues.apache.org/jira/browse/SPARK-49720
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49719) UUID and SHUFFLE should accept integer seed

2024-09-19 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49719:
-

 Summary: UUID and SHUFFLE should accept integer seed
 Key: SPARK-49719
 URL: https://issues.apache.org/jira/browse/SPARK-49719
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49717) Function parity test ignore private functions

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49717.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48163
[https://github.com/apache/spark/pull/48163]

> Function parity test ignore private functions
> -
>
> Key: SPARK-49717
> URL: https://issues.apache.org/jira/browse/SPARK-49717
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49693) Refine the string representation of timedelta

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49693.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48159
[https://github.com/apache/spark/pull/48159]

> Refine the string representation of timedelta
> -
>
> Key: SPARK-49693
> URL: https://issues.apache.org/jira/browse/SPARK-49693
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-49717) Function parity test ignore private functions

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49717:
-

Assignee: Ruifeng Zheng

> Function parity test ignore private functions
> -
>
> Key: SPARK-49717
> URL: https://issues.apache.org/jira/browse/SPARK-49717
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49717) Function parity test ignore private functions

2024-09-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49717:
--
Issue Type: Test  (was: Improvement)

> Function parity test ignore private functions
> -
>
> Key: SPARK-49717
> URL: https://issues.apache.org/jira/browse/SPARK-49717
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-49717) Function parity test ignore private functions

2024-09-19 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49717:
-

 Summary: Function parity test ignore private functions
 Key: SPARK-49717
 URL: https://issues.apache.org/jira/browse/SPARK-49717
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49713) Make function `count_min_sketch` accept number arguments

2024-09-18 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49713:
-

 Summary: Make function `count_min_sketch` accept number arguments
 Key: SPARK-49713
 URL: https://issues.apache.org/jira/browse/SPARK-49713
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49692) Refine the string representation of literal date and timestamp

2024-09-18 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49692.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48137
[https://github.com/apache/spark/pull/48137]

> Refine the string representation of literal date and timestamp
> --
>
> Key: SPARK-49692
> URL: https://issues.apache.org/jira/browse/SPARK-49692
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-49692) Refine the string representation of literal date and timestamp

2024-09-18 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49692:
-

Assignee: Ruifeng Zheng

> Refine the string representation of literal date and timestamp
> --
>
> Key: SPARK-49692
> URL: https://issues.apache.org/jira/browse/SPARK-49692
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-49693) Refine the string representation of timedelta

2024-09-17 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49693:
-

 Summary: Refine the string representation of timedelta
 Key: SPARK-49693
 URL: https://issues.apache.org/jira/browse/SPARK-49693
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49692) Refine the string representation of literal date and timestamp

2024-09-17 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49692:
-

 Summary: Refine the string representation of literal date and timestamp
 Key: SPARK-49692
 URL: https://issues.apache.org/jira/browse/SPARK-49692
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49691) Function `substring` should accept column names

2024-09-17 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49691:
-

 Summary: Function `substring` should accept column names
 Key: SPARK-49691
 URL: https://issues.apache.org/jira/browse/SPARK-49691
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49640) Apply Reservoir sampling in `SampledPlotBase`

2024-09-17 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49640.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48105
[https://github.com/apache/spark/pull/48105]

> Apply Reservoir sampling in `SampledPlotBase`
> -
>
> Key: SPARK-49640
> URL: https://issues.apache.org/jira/browse/SPARK-49640
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49640) Apply Reservoir sampling in `SampledPlotBase`

2024-09-13 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49640:
-

 Summary: Apply Reservoir sampling in `SampledPlotBase`
 Key: SPARK-49640
 URL: https://issues.apache.org/jira/browse/SPARK-49640
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49531) Support line plot with plotly backend

2024-09-12 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49531.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48008
[https://github.com/apache/spark/pull/48008]

> Support line plot with plotly backend
> -
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> While Pandas on Spark supports plotting, PySpark currently lacks this 
> feature. The proposed API will enable users to generate visualizations, such 
> as line plots, by leveraging libraries like Plotly. This will provide users 
> with an intuitive, interactive way to explore and understand large datasets 
> directly from PySpark DataFrames, streamlining the data analysis workflow in 
> distributed environments.
>  
> See more at 
> [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing]
>  for PySpark Plotting API Specification.
>  






[jira] [Assigned] (SPARK-49540) Unify the usage of `distributed_sequence_id`

2024-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49540:
-

Assignee: Ruifeng Zheng

> Unify the usage of `distributed_sequence_id`
> 
>
> Key: SPARK-49540
> URL: https://issues.apache.org/jira/browse/SPARK-49540
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-49540) Unify the usage of `distributed_sequence_id`

2024-09-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49540:
-

 Summary: Unify the usage of `distributed_sequence_id`
 Key: SPARK-49540
 URL: https://issues.apache.org/jira/browse/SPARK-49540
 Project: Spark
  Issue Type: Improvement
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-49202) Apply ArrayBinarySearch for histogram

2024-09-03 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49202:
--
Summary: Apply ArrayBinarySearch for histogram  (was: Register 
`binary_search_for_buckets` in the Scala side)

> Apply ArrayBinarySearch for histogram
> -
>
> Key: SPARK-49202
> URL: https://issues.apache.org/jira/browse/SPARK-49202
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49496) Refresh testing image for pyarrow 17

2024-09-03 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49496.
---
Resolution: Duplicate

> Refresh testing image for pyarrow 17
> 
>
> Key: SPARK-49496
> URL: https://issues.apache.org/jira/browse/SPARK-49496
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49203) Add expression for `java.util.Arrays.binarySearch`

2024-09-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49203.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47741
[https://github.com/apache/spark/pull/47741]

> Add expression for `java.util.Arrays.binarySearch`
> --
>
> Key: SPARK-49203
> URL: https://issues.apache.org/jira/browse/SPARK-49203
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add a dedicated expression 
> {code:java}
> ArrayBinarySearch(array, value) {code}
> for binary search. Its behavior should be the same as 
> `java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
> plot on the client side (no longer needing to depend on mllib's Bucketizer).
>  
> This expression is for internal use and should not be exposed to end users.
> It assumes the array is already sorted.
> If the array or the value is null, it returns null.
>  
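The return convention referred to above can be illustrated with a small pure-Python sketch. This is not the Spark expression itself: `array_binary_search` is a hypothetical stand-in that follows the documented contract of `java.util.Arrays.binarySearch` (the index of the value if present, otherwise `-(insertion point) - 1`) and the null handling described in the issue. For duplicate values it returns the leftmost match, which is one of the results the Java method is allowed to return.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def array_binary_search(
    array: Optional[Sequence[float]], value: Optional[float]
) -> Optional[int]:
    """Hypothetical stand-in for the semantics of java.util.Arrays.binarySearch.

    Assumes `array` is already sorted in ascending order.
    """
    # Null in, null out, as stated in the issue description.
    if array is None or value is None:
        return None

    # Index of the first element that is >= value (the insertion point).
    insertion_point = bisect_left(array, value)

    # Found: return the (leftmost) index of the value.
    if insertion_point < len(array) and array[insertion_point] == value:
        return insertion_point

    # Not found: encode the insertion point as a negative number.
    return -(insertion_point + 1)


if __name__ == "__main__":
    boundaries = [1.0, 3.0, 5.0]
    for v in (3.0, 4.0, 0.0, 6.0):
        print(v, array_binary_search(boundaries, v))
```

A negative result `r` can be decoded back to the insertion point with `-(r + 1)`, which is what lets a caller work out where a value falls between sorted boundaries.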






[jira] [Created] (SPARK-49496) Refresh testing image for pyarrow 17

2024-09-02 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49496:
-

 Summary: Refresh testing image for pyarrow 17
 Key: SPARK-49496
 URL: https://issues.apache.org/jira/browse/SPARK-49496
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49441) StringIndexer sort arrays in executors

2024-09-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49441.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47904
[https://github.com/apache/spark/pull/47904]

> StringIndexer sort arrays in executors
> --
>
> Key: SPARK-49441
> URL: https://issues.apache.org/jira/browse/SPARK-49441
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49412) Compute all box plot metrics in single job

2024-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49412.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47897
[https://github.com/apache/spark/pull/47897]

> Compute all box plot metrics in single job
> --
>
> Key: SPARK-49412
> URL: https://issues.apache.org/jira/browse/SPARK-49412
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49441) StringIndexer sort arrays in executors

2024-08-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49441:
-

 Summary: StringIndexer sort arrays in executors
 Key: SPARK-49441
 URL: https://issues.apache.org/jira/browse/SPARK-49441
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49357) [PYTHON] SparkConnectClient._truncate is not effective on deeply nested messages

2024-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49357.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47891
[https://github.com/apache/spark/pull/47891]

> [PYTHON] SparkConnectClient._truncate is not effective on deeply nested 
> messages
> 
>
> Key: SPARK-49357
> URL: https://issues.apache.org/jira/browse/SPARK-49357
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Assignee: Changgyoo Park
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Related to SPARK-49336 (Scala).






[jira] [Assigned] (SPARK-49357) [PYTHON] SparkConnectClient._truncate is not effective on deeply nested messages

2024-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49357:
-

Assignee: Changgyoo Park

> [PYTHON] SparkConnectClient._truncate is not effective on deeply nested 
> messages
> 
>
> Key: SPARK-49357
> URL: https://issues.apache.org/jira/browse/SPARK-49357
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Assignee: Changgyoo Park
>Priority: Major
>  Labels: pull-request-available
>
> Related to SPARK-49336 (Scala).






[jira] [Created] (SPARK-49412) Compute all box plot metrics in single job

2024-08-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49412:
-

 Summary: Compute all box plot metrics in single job
 Key: SPARK-49412
 URL: https://issues.apache.org/jira/browse/SPARK-49412
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49366) Treat Union node as leaf in dataframe column resolution

2024-08-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49366.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47853
[https://github.com/apache/spark/pull/47853]

> Treat Union node as leaf in dataframe column resolution
> ---
>
> Key: SPARK-49366
> URL: https://issues.apache.org/jira/browse/SPARK-49366
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49391) Boxplot select outliers by distance from fences

2024-08-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49391.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47870
[https://github.com/apache/spark/pull/47870]

> Boxplot select outliers by distance from fences
> ---
>
> Key: SPARK-49391
> URL: https://issues.apache.org/jira/browse/SPARK-49391
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-49391) Boxplot select outliers by distance from fences

2024-08-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49391:
-

Assignee: Ruifeng Zheng

> Boxplot select outliers by distance from fences
> ---
>
> Key: SPARK-49391
> URL: https://issues.apache.org/jira/browse/SPARK-49391
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-49391) Boxplot select outliers by distance from fences

2024-08-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49391:
-

 Summary: Boxplot select outliers by distance from fences
 Key: SPARK-49391
 URL: https://issues.apache.org/jira/browse/SPARK-49391
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49382) DataFrame boxplot supports fliers

2024-08-25 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49382.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47866
[https://github.com/apache/spark/pull/47866]

> DataFrame boxplot supports fliers
> -
>
> Key: SPARK-49382
> URL: https://issues.apache.org/jira/browse/SPARK-49382
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49387) Fix type hint for `accuracy` in `percentile_approx` and `approx_percentile`

2024-08-25 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49387:
-

 Summary: Fix type hint for `accuracy` in `percentile_approx` and 
`approx_percentile`
 Key: SPARK-49387
 URL: https://issues.apache.org/jira/browse/SPARK-49387
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49382) DataFrame boxplot supports fliers

2024-08-25 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49382:
-

 Summary: DataFrame boxplot supports fliers
 Key: SPARK-49382
 URL: https://issues.apache.org/jira/browse/SPARK-49382
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49367) Parallelize the KDE computation for plotly backend

2024-08-25 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49367.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47854
[https://github.com/apache/spark/pull/47854]

> Parallelize the KDE computation for plotly backend
> --
>
> Key: SPARK-49367
> URL: https://issues.apache.org/jira/browse/SPARK-49367
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49365) Simplify the bucket aggregation in hist plot

2024-08-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49365.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47852
[https://github.com/apache/spark/pull/47852]

> Simplify the bucket aggregation in hist plot 
> -
>
> Key: SPARK-49365
> URL: https://issues.apache.org/jira/browse/SPARK-49365
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-49365) Simplify the bucket aggregation in hist plot

2024-08-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49365:
-

Assignee: Ruifeng Zheng

> Simplify the bucket aggregation in hist plot 
> -
>
> Key: SPARK-49365
> URL: https://issues.apache.org/jira/browse/SPARK-49365
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-49367) Parallelize the KDE computation for plotly backend

2024-08-22 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49367:
-

 Summary: Parallelize the KDE computation for plotly backend
 Key: SPARK-49367
 URL: https://issues.apache.org/jira/browse/SPARK-49367
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49365) Simplify the bucket aggregation in hist plot

2024-08-22 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49365:
-

 Summary: Simplify the bucket aggregation in hist plot 
 Key: SPARK-49365
 URL: https://issues.apache.org/jira/browse/SPARK-49365
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-49223) Simplify the StringIndexer.countByValue with builtin functions

2024-08-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49223:
-

Assignee: Ruifeng Zheng

> Simplify the StringIndexer.countByValue with builtin functions
> --
>
> Key: SPARK-49223
> URL: https://issues.apache.org/jira/browse/SPARK-49223
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49223) Simplify the StringIndexer.countByValue with builtin functions

2024-08-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49223.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47742
[https://github.com/apache/spark/pull/47742]

> Simplify the StringIndexer.countByValue with builtin functions
> --
>
> Key: SPARK-49223
> URL: https://issues.apache.org/jira/browse/SPARK-49223
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-49223) Simplify the StringIndexer.countByValue with builtin functions

2024-08-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49223:
--
Summary: Simplify the StringIndexer.countByValue with builtin functions  
(was: Introduce an expression for multi-column grouped count)

> Simplify the StringIndexer.countByValue with builtin functions
> --
>
> Key: SPARK-49223
> URL: https://issues.apache.org/jira/browse/SPARK-49223
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49203) Add expression for `java.util.Arrays.binarySearch`

2024-08-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49203:
-

Assignee: BingKun Pan

> Add expression for `java.util.Arrays.binarySearch`
> --
>
> Key: SPARK-49203
> URL: https://issues.apache.org/jira/browse/SPARK-49203
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: BingKun Pan
>Priority: Major
>
> Add a dedicated expression 
> {code:java}
> ArrayBinarySearch(array, value) {code}
> for binary search. Its behavior should match 
> `java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
> plot on the client side (no longer depending on MLlib's Bucketizer).
>  
> This expression is for internal use and should not be exposed to end users.
> It assumes the array is already sorted.
> If array or value is null, it returns null.
>  
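
The semantics described above can be illustrated with a minimal Python sketch. It is not Spark's implementation: the function name `array_binary_search` is hypothetical, and the return convention (the index when found, otherwise `-(insertion point) - 1`) is taken from the documented behavior of `java.util.Arrays.binarySearch`. The array is assumed to be sorted already, as the description requires.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def array_binary_search(
    array: Optional[Sequence[float]], value: Optional[float]
) -> Optional[int]:
    """Hypothetical stand-in for ArrayBinarySearch(array, value).

    Assumes `array` is sorted in ascending order (not checked here).
    """
    # Null in, null out, as stated in the issue description.
    if array is None or value is None:
        return None
    # bisect_left returns the leftmost position where `value` could be inserted.
    pos = bisect_left(array, value)
    if pos < len(array) and array[pos] == value:
        # Found: return the index (with duplicates, this picks the first one;
        # the Java method makes no guarantee about which duplicate is found).
        return pos
    # Not found: encode the insertion point as a negative number.
    return -(pos + 1)


print(array_binary_search([1.0, 3.0, 5.0], 3.0))  # 1
print(array_binary_search([1.0, 3.0, 5.0], 4.0))  # -3
```

A negative result `r` can be decoded back to the insertion point with `-(r + 1)`, which is what a client-side histogram would need in order to place a value between two bucket boundaries.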






[jira] [Updated] (SPARK-49203) Add expression for `java.util.Arrays.binarySearch`

2024-08-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49203:
--
Description: 
Add a dedicated expression 
{code:java}
ArrayBinarySearch(array, value) {code}
for binary search. Its behavior should match 
`java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
plot on the client side (no longer depending on MLlib's Bucketizer).

 

This expression is for internal use and should not be exposed to end users.

It assumes the array is already sorted.

If array or value is null, it returns null.

 

  was:
Add a dedicated expression 
{code:java}
ArrayBinarySearch(array, value) {code}
for binary search. Its behavior should match 
`java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
plot on the client side (no longer depending on MLlib's Bucketizer).

 

It assumes the array is already sorted.

If array or value is null, it returns null.

 


> Add expression for `java.util.Arrays.binarySearch`
> --
>
> Key: SPARK-49203
> URL: https://issues.apache.org/jira/browse/SPARK-49203
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add a dedicated expression 
> {code:java}
> ArrayBinarySearch(array, value) {code}
> for binary search. Its behavior should match 
> `java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
> plot on the client side (no longer depending on MLlib's Bucketizer).
>  
> This expression is for internal use and should not be exposed to end users.
> It assumes the array is already sorted.
> If array or value is null, it returns null.
>  






[jira] [Updated] (SPARK-49203) Add expression for `java.util.Arrays.binarySearch`

2024-08-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49203:
--
Description: 
Add a dedicated expression 
{code:java}
ArrayBinarySearch(array, value) {code}
for binary search. Its behavior should match 
`java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
plot on the client side (no longer depending on MLlib's Bucketizer).

 

It assumes the array is already sorted.

If array or value is null, it returns null.

 

  was:Add a dedicated expression for binary search, the behavior should be the 
same as 


> Add expression for `java.util.Arrays.binarySearch`
> --
>
> Key: SPARK-49203
> URL: https://issues.apache.org/jira/browse/SPARK-49203
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add a dedicated expression 
> {code:java}
> ArrayBinarySearch(array, value) {code}
> for binary search. Its behavior should match 
> `java.util.Arrays.binarySearch`, so that we can use it to implement the histogram 
> plot on the client side (no longer depending on MLlib's Bucketizer).
>  
> It assumes the array is already sorted.
> If array or value is null, it returns null.
>  






[jira] [Updated] (SPARK-49203) Add expression for `java.util.Arrays.binarySearch`

2024-08-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49203:
--
Description: Add a dedicated expression for binary search, the behavior 
should be the same as 

> Add expression for `java.util.Arrays.binarySearch`
> --
>
> Key: SPARK-49203
> URL: https://issues.apache.org/jira/browse/SPARK-49203
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add a dedicated expression for binary search, the behavior should be the same 
> as 






[jira] [Created] (SPARK-49215) Document the NaN handling in df.na.drop

2024-08-12 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49215:
-

 Summary: Document the NaN handling in df.na.drop
 Key: SPARK-49215
 URL: https://issues.apache.org/jira/browse/SPARK-49215
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49185) Reimplement kde plot with Spark SQL

2024-08-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49185.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47685
[https://github.com/apache/spark/pull/47685]

> Reimplement kde plot with Spark SQL
> ---
>
> Key: SPARK-49185
> URL: https://issues.apache.org/jira/browse/SPARK-49185
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49202) Register `binary_search_for_buckets` in the Scala side

2024-08-11 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49202:
-

 Summary: Register `binary_search_for_buckets` in the Scala side
 Key: SPARK-49202
 URL: https://issues.apache.org/jira/browse/SPARK-49202
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49186) Compute multiple column kde plots with single pass

2024-08-09 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49186:
-

 Summary: Compute multiple column kde plots with single pass
 Key: SPARK-49186
 URL: https://issues.apache.org/jira/browse/SPARK-49186
 Project: Spark
  Issue Type: Sub-task
  Components: PS, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng









[jira] [Assigned] (SPARK-49185) Reimplement kde plot with Spark SQL

2024-08-09 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-49185:
-

Assignee: Ruifeng Zheng

> Reimplement kde plot with Spark SQL
> ---
>
> Key: SPARK-49185
> URL: https://issues.apache.org/jira/browse/SPARK-49185
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-49185) Reimplement kde plot with Spark SQL

2024-08-09 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49185:
-

 Summary: Reimplement kde plot with Spark SQL
 Key: SPARK-49185
 URL: https://issues.apache.org/jira/browse/SPARK-49185
 Project: Spark
  Issue Type: Sub-task
  Components: PS, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49170) Upgrade snappy to 1.1.10.6

2024-08-09 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49170.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47675
[https://github.com/apache/spark/pull/47675]

> Upgrade snappy to 1.1.10.6
> --
>
> Key: SPARK-49170
> URL: https://issues.apache.org/jira/browse/SPARK-49170
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-49172) Refine the type hints in functions

2024-08-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49172.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47677
[https://github.com/apache/spark/pull/47677]

> Refine the type hints in functions
> --
>
> Key: SPARK-49172
> URL: https://issues.apache.org/jira/browse/SPARK-49172
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-49112) Make createLocalRelationProto support TimestampType

2024-08-05 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49112:
--
Summary: Make createLocalRelationProto support TimestampType  (was: Make 
createLocalRelationProto support relation with TimestampType)

> Make createLocalRelationProto support TimestampType
> ---
>
> Key: SPARK-49112
> URL: https://issues.apache.org/jira/browse/SPARK-49112
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49112) Make createLocalRelationProto support relation with TimestampType

2024-08-05 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49112:
--
Summary: Make createLocalRelationProto support relation with TimestampType  
(was: createLocalRelationProto able to create relation with TimestampType)

> Make createLocalRelationProto support relation with TimestampType
> -
>
> Key: SPARK-49112
> URL: https://issues.apache.org/jira/browse/SPARK-49112
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-49112) createLocalRelationProto able to create relation with TimestampType

2024-08-05 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49112:
-

 Summary: createLocalRelationProto able to create relation with 
TimestampType
 Key: SPARK-49112
 URL: https://issues.apache.org/jira/browse/SPARK-49112
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-49047) Truncate the message for logging

2024-08-04 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-49047.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47554
[https://github.com/apache/spark/pull/47554]

> Truncate the message for logging
> 
>
> Key: SPARK-49047
> URL: https://issues.apache.org/jira/browse/SPARK-49047
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should hide {{data}} in {{LocalRelation}}, since it can otherwise be shown 
> and exposed through the log.






[jira] [Updated] (SPARK-49047) Truncate the message for logging

2024-07-31 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-49047:
--
Summary: Truncate the message for logging  (was: Hide data in LocalRelation 
message)

> Truncate the message for logging
> 
>
> Key: SPARK-49047
> URL: https://issues.apache.org/jira/browse/SPARK-49047
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should hide {{data}} in {{LocalRelation}}, since it can otherwise be shown 
> and exposed through the log.






[jira] [Created] (SPARK-49053) Make model save/load helper functions accept spark session

2024-07-29 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49053:
-

 Summary: Make model save/load helper functions accept spark session
 Key: SPARK-49053
 URL: https://issues.apache.org/jira/browse/SPARK-49053
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-49035) Eliminate TypeVar `ColumnOrName_`

2024-07-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-49035:
-

 Summary: Eliminate TypeVar `ColumnOrName_`
 Key: SPARK-49035
 URL: https://issues.apache.org/jira/browse/SPARK-49035
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-48998) Meta algorithms save/load model with SparkSession

2024-07-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48998.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47477
[https://github.com/apache/spark/pull/47477]

> Meta algorithms save/load model with SparkSession
> -
>
> Key: SPARK-48998
> URL: https://issues.apache.org/jira/browse/SPARK-48998
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48998) Meta algorithms save/load model with SparkSession

2024-07-24 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48998:
-

 Summary: Meta algorithms save/load model with SparkSession
 Key: SPARK-48998
 URL: https://issues.apache.org/jira/browse/SPARK-48998
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48988) Make DefaultParamsReader/Writer handle metadata with spark session

2024-07-23 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48988:
-

 Summary: Make DefaultParamsReader/Writer handle metadata with spark session
 Key: SPARK-48988
 URL: https://issues.apache.org/jira/browse/SPARK-48988
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48980) Avoid per-row param read in `LSH/DCT/NGram/PolynomialExpansion`

2024-07-23 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48980:
-

 Summary: Avoid per-row param read in `LSH/DCT/NGram/PolynomialExpansion`
 Key: SPARK-48980
 URL: https://issues.apache.org/jira/browse/SPARK-48980
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48972) Unify the literal string handling

2024-07-22 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48972:
-

 Summary: Unify the literal string handling
 Key: SPARK-48972
 URL: https://issues.apache.org/jira/browse/SPARK-48972
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-48959) Make NoSuchNamespaceException extend NoSuchDatabaseException

2024-07-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-48959:
--
Summary: Make NoSuchNamespaceException extend NoSuchDatabaseException (was: Make NoSuchNamespaceException extend NoSuchNamespaceException)

> Make NoSuchNamespaceException extend NoSuchDatabaseException
> 
>
> Key: SPARK-48959
> URL: https://issues.apache.org/jira/browse/SPARK-48959
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-48959) Make NoSuchNamespaceException extend NoSuchNamespaceException

2024-07-21 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48959:
-

 Summary: Make NoSuchNamespaceException extend NoSuchNamespaceException
 Key: SPARK-48959
 URL: https://issues.apache.org/jira/browse/SPARK-48959
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-48954) Rename unreleased try_remainder() function to try_mod()

2024-07-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48954.
---
Resolution: Fixed

Issue resolved by pull request 47427
[https://github.com/apache/spark/pull/47427]

> Rename unreleased try_remainder() function to try_mod()
> ---
>
> Key: SPARK-48954
> URL: https://issues.apache.org/jira/browse/SPARK-48954
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The try_remainder() function is the try_* version of `%` and `mod`.
> Since there is no `remainder()` function, and no other product seems to have
> try_remainder(), we want to rename try_remainder() to try_mod().






[jira] [Assigned] (SPARK-48954) Rename unreleased try_remainder() function to try_mod()

2024-07-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-48954:
-

Assignee: Serge Rielau

> Rename unreleased try_remainder() function to try_mod()
> ---
>
> Key: SPARK-48954
> URL: https://issues.apache.org/jira/browse/SPARK-48954
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The try_remainder() function is the try_* version of `%` and `mod`.
> Since there is no `remainder()` function, and no other product seems to have
> try_remainder(), we want to rename try_remainder() to try_mod().






[jira] [Created] (SPARK-48955) ArrayCompact's datatype should be containsNull = false

2024-07-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48955:
-

 Summary: ArrayCompact's datatype should be containsNull = false
 Key: SPARK-48955
 URL: https://issues.apache.org/jira/browse/SPARK-48955
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-48945) Simplify regex functions with `lit`

2024-07-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-48945:
--
Summary: Simplify regex functions with `lit` (was: Simplify a group of function with `lit`)

> Simplify regex functions with `lit`
> ---
>
> Key: SPARK-48945
> URL: https://issues.apache.org/jira/browse/SPARK-48945
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48945) Simplify a group of function with `lit`

2024-07-19 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48945:
-

 Summary: Simplify a group of function with `lit`
 Key: SPARK-48945
 URL: https://issues.apache.org/jira/browse/SPARK-48945
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48944) Unify the JSON-format schema handling in Connect Server

2024-07-19 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48944:
-

 Summary: Unify the JSON-format schema handling in Connect Server
 Key: SPARK-48944
 URL: https://issues.apache.org/jira/browse/SPARK-48944
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48924) Add a pandas-like `make_interval` helper function

2024-07-17 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48924:
-

 Summary: Add a pandas-like `make_interval` helper function
 Key: SPARK-48924
 URL: https://issues.apache.org/jira/browse/SPARK-48924
 Project: Spark
  Issue Type: Improvement
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-48892) Avoid per-row param read in `Tokenizer`

2024-07-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-48892:
-

Assignee: Ruifeng Zheng

> Avoid per-row param read in `Tokenizer`
> ---
>
> Key: SPARK-48892
> URL: https://issues.apache.org/jira/browse/SPARK-48892
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48892) Avoid per-row param read in `Tokenizer`

2024-07-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48892.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47342
[https://github.com/apache/spark/pull/47342]

> Avoid per-row param read in `Tokenizer`
> ---
>
> Key: SPARK-48892
> URL: https://issues.apache.org/jira/browse/SPARK-48892
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48884) Remove unused helper function `PythonSQLUtils.makeInterval`

2024-07-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48884.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47330
[https://github.com/apache/spark/pull/47330]

> Remove unused helper function `PythonSQLUtils.makeInterval`
> ---
>
> Key: SPARK-48884
> URL: https://issues.apache.org/jira/browse/SPARK-48884
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48897) make `from_xml` support StructType schema

2024-07-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48897:
-

 Summary: make `from_xml` support StructType schema
 Key: SPARK-48897
 URL: https://issues.apache.org/jira/browse/SPARK-48897
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48892) Avoid per-row param read in `Tokenizer`

2024-07-14 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48892:
-

 Summary: Avoid per-row param read in `Tokenizer`
 Key: SPARK-48892
 URL: https://issues.apache.org/jira/browse/SPARK-48892
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48884) Remove unused helper function `PythonSQLUtils.makeInterval`

2024-07-12 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48884:
-

 Summary: Remove unused helper function `PythonSQLUtils.makeInterval`
 Key: SPARK-48884
 URL: https://issues.apache.org/jira/browse/SPARK-48884
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-48878) Add doctest for `options` in json functions

2024-07-12 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48878.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47319
[https://github.com/apache/spark/pull/47319]

> Add doctest for `options` in json functions
> ---
>
> Key: SPARK-48878
> URL: https://issues.apache.org/jira/browse/SPARK-48878
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48878) Add doctest for `options` in json functions

2024-07-12 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48878:
-

 Summary: Add doctest for `options` in json functions
 Key: SPARK-48878
 URL: https://issues.apache.org/jira/browse/SPARK-48878
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-48877) Test the default column name of array functions

2024-07-12 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48877:
-

 Summary: Test the default column name of array functions
 Key: SPARK-48877
 URL: https://issues.apache.org/jira/browse/SPARK-48877
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-48842) Document non-determinism of max_by and min_by

2024-07-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48842.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47266
[https://github.com/apache/spark/pull/47266]

> Document non-determinism of max_by and min_by
> -
>
> Key: SPARK-48842
> URL: https://issues.apache.org/jira/browse/SPARK-48842
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48842) Document non-determinism of max_by and min_by

2024-07-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-48842:
-

Assignee: Ruifeng Zheng

> Document non-determinism of max_by and min_by
> -
>
> Key: SPARK-48842
> URL: https://issues.apache.org/jira/browse/SPARK-48842
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>






