[jira] [Resolved] (SPARK-49776) Support pie plots

2024-09-26 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-49776.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48256
[https://github.com/apache/spark/pull/48256]

> Support pie plots
> -
>
> Key: SPARK-49776
> URL: https://issues.apache.org/jira/browse/SPARK-49776
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-49796) Support pie subplots with plotly backend

2024-09-26 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49796:
-
Summary: Support pie subplots with plotly backend  (was: Support pie 
subplots)

> Support pie subplots with plotly backend
> 
>
> Key: SPARK-49796
> URL: https://issues.apache.org/jira/browse/SPARK-49796
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PS, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Created] (SPARK-49796) Support pie subplots

2024-09-26 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49796:


 Summary: Support pie subplots
 Key: SPARK-49796
 URL: https://issues.apache.org/jira/browse/SPARK-49796
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PS, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-49792) Upgrade to numpy 2 for building and testing Spark branches

2024-09-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49792:
-
Summary: Upgrade to numpy 2 for building and testing Spark branches  (was: 
Upgrade NumPy to 2.1.0)

> Upgrade to numpy 2 for building and testing Spark branches
> --
>
> Key: SPARK-49792
> URL: https://issues.apache.org/jira/browse/SPARK-49792
> Project: Spark
>  Issue Type: Story
>  Components: Build, PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49793) Enable PredictBatchUDFTests.test_caching for NumPy 2

2024-09-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49793:
-
Description: 
 
{code:python}
import numpy as np
import pandas as pd
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import struct
from pyspark.sql.types import DoubleType

# Assumes an active SparkSession bound to `spark`.
data = np.arange(0, 36, dtype=np.float64).reshape(-1, 4)
pdf = pd.DataFrame(data, columns=["a", "b", "c", "d"])
df = spark.createDataFrame(pdf)

def make_predict_fn():
    # Drawn once per call to make_predict_fn, then captured by the closure.
    fake_output = np.random.random()

    def predict(inputs):
        # Return the same captured value for every row in the batch.
        return np.array([fake_output for _ in inputs])

    return predict

identity = predict_batch_udf(make_predict_fn, return_type=DoubleType(), batch_size=5)
df1 = df.withColumn("preds", identity(struct("a"))).toPandas()
df2 = df.withColumn("preds", identity(struct("a"))).toPandas()
{code}
Actual output with NumPy 2.1.0:
{code:java}
>>> df1
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.431752
1   4.0   5.0   6.0   7.0  0.912097
2   8.0   9.0  10.0  11.0  0.679628
3  12.0  13.0  14.0  15.0  0.853850
4  16.0  17.0  18.0  19.0  0.389971
5  20.0  21.0  22.0  23.0  0.654521
6  24.0  25.0  26.0  27.0  0.430569
7  28.0  29.0  30.0  31.0  0.331055
8  32.0  33.0  34.0  35.0  0.306073
>>> df2
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.679628
1   4.0   5.0   6.0   7.0  0.430569
2   8.0   9.0  10.0  11.0  0.853850
3  12.0  13.0  14.0  15.0  0.306073
4  16.0  17.0  18.0  19.0  0.654521
5  20.0  21.0  22.0  23.0  0.389971
6  24.0  25.0  26.0  27.0  0.507598
7  28.0  29.0  30.0  31.0  0.912097
8  32.0  33.0  34.0  35.0  0.431752 {code}
which should be
{code:java}
>>> df1
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.685941
1   4.0   5.0   6.0   7.0  0.685941
2   8.0   9.0  10.0  11.0  0.685941
3  12.0  13.0  14.0  15.0  0.685941
4  16.0  17.0  18.0  19.0  0.685941
5  20.0  21.0  22.0  23.0  0.685941
6  24.0  25.0  26.0  27.0  0.685941
7  28.0  29.0  30.0  31.0  0.685941
8  32.0  33.0  34.0  35.0  0.685941
>>> df2
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.685941
1   4.0   5.0   6.0   7.0  0.685941
2   8.0   9.0  10.0  11.0  0.685941
3  12.0  13.0  14.0  15.0  0.685941
4  16.0  17.0  18.0  19.0  0.685941
5  20.0  21.0  22.0  23.0  0.685941
6  24.0  25.0  26.0  27.0  0.685941
7  28.0  29.0  30.0  31.0  0.685941
8  32.0  33.0  34.0  35.0  0.685941 {code}
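
The expected output follows from the predict function being built once and then reused, so every row in both runs sees the same captured value. Below is a minimal sketch of that expectation in plain Python, without Spark. The `get_cached_predict_fn` helper and its module-level cache are hypothetical names for illustration only; this is not how `predict_batch_udf` is implemented.

```python
import random

# Hypothetical cache keyed by the factory function (illustration only).
_PREDICT_FN_CACHE = {}


def make_predict_fn():
    # Drawn once each time the factory runs, then captured by the closure.
    fake_output = random.random()

    def predict(inputs):
        return [fake_output for _ in inputs]

    return predict


def get_cached_predict_fn(factory):
    # Build the predict function on first use, then reuse the same object.
    if factory not in _PREDICT_FN_CACHE:
        _PREDICT_FN_CACHE[factory] = factory()
    return _PREDICT_FN_CACHE[factory]


# Nine rows of column "a", split into batches of at most five rows.
batches = [[0.0, 4.0, 8.0, 12.0, 16.0], [20.0, 24.0, 28.0, 32.0]]

preds_run1 = [p for b in batches for p in get_cached_predict_fn(make_predict_fn)(b)]
preds_run2 = [p for b in batches for p in get_cached_predict_fn(make_predict_fn)(b)]
```

If the factory were instead called again for each batch or each run, the predictions would differ between rows, which is the symptom shown in the NumPy 2.1.0 output.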
 

> Enable PredictBatchUDFTests.test_caching for NumPy 2
> 
>
> Key: SPARK-49793
> URL: https://issues.apache.org/jira/browse/SPARK-49793
> Project: Spark
>  Issue Type: Story
>  Components: ML, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>

[jira] [Created] (SPARK-49793) Enable PredictBatchUDFTests.test_caching for NumPy 2

2024-09-25 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49793:


 Summary: Enable PredictBatchUDFTests.test_caching for NumPy 2
 Key: SPARK-49793
 URL: https://issues.apache.org/jira/browse/SPARK-49793
 Project: Spark
  Issue Type: Story
  Components: ML, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-49776) Support pie plots

2024-09-24 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49776:


 Summary: Support pie plots
 Key: SPARK-49776
 URL: https://issues.apache.org/jira/browse/SPARK-49776
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Assigned] (SPARK-49694) Support scatter plots

2024-09-24 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-49694:


Assignee: Xinrong Meng

> Support scatter plots
> -
>
> Key: SPARK-49694
> URL: https://issues.apache.org/jira/browse/SPARK-49694
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-49694) Support scatter plots

2024-09-24 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-49694.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48219
[https://github.com/apache/spark/pull/48219]

> Support scatter plots
> -
>
> Key: SPARK-49694
> URL: https://issues.apache.org/jira/browse/SPARK-49694
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-49765) Adjust documentation of "spark.sql.pyspark.plotting.max_rows"

2024-09-24 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49765:


 Summary: Adjust documentation of 
"spark.sql.pyspark.plotting.max_rows"
 Key: SPARK-49765
 URL: https://issues.apache.org/jira/browse/SPARK-49765
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-49764) Support area plots

2024-09-24 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49764:


 Summary: Support area plots
 Key: SPARK-49764
 URL: https://issues.apache.org/jira/browse/SPARK-49764
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Assigned] (SPARK-49626) Support horizontal and vertical bar plots

2024-09-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-49626:


Assignee: Xinrong Meng

> Support horizontal and vertical bar plots
> -
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Support horizontal and vertical bar plot






[jira] [Resolved] (SPARK-49626) Support horizontal and vertical bar plots

2024-09-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-49626.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 48100
[https://github.com/apache/spark/pull/48100]

> Support horizontal and vertical bar plots
> -
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support horizontal and vertical bar plot






[jira] [Created] (SPARK-49716) Improve documentation and test of barh plot

2024-09-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49716:


 Summary: Improve documentation and test of barh plot
 Key: SPARK-49716
 URL: https://issues.apache.org/jira/browse/SPARK-49716
 Project: Spark
  Issue Type: Sub-task
  Components: PS
Affects Versions: 4.0.0
Reporter: Xinrong Meng


- Update the documentation for barh plot to clarify the difference between axis 
interpretation in Plotly and Matplotlib.
- Test multiple columns as category axis.
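
The axis difference mentioned in the first item can be sketched in plain Python. This assumes the convention that, for horizontal bars, Plotly reads `x` as the values and `y` as the categories, while Matplotlib reads `x` as the categories and `y` as the values. The `barh_roles` helper and the column names are hypothetical and exist only to make the mapping explicit.

```python
def barh_roles(backend, x, y):
    """Return (category_column, value_column) for a barh call.

    Assumed convention (not taken from this issue text):
    Plotly treats x as values and y as categories;
    Matplotlib treats x as categories and y as values.
    """
    if backend == "plotly":
        return y, x
    if backend == "matplotlib":
        return x, y
    raise ValueError(f"unsupported backend: {backend!r}")


# The same chart needs swapped arguments depending on the backend.
plotly_roles = barh_roles("plotly", x="sales", y="region")
matplotlib_roles = barh_roles("matplotlib", x="region", y="sales")
```

Both calls resolve to the same category and value columns, which is why the documentation needs to spell out which argument plays which role per backend.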






[jira] [Updated] (SPARK-49716) Fix documentation and add test of barh plot

2024-09-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49716:
-
Component/s: Documentation

> Fix documentation and add test of barh plot
> ---
>
> Key: SPARK-49716
> URL: https://issues.apache.org/jira/browse/SPARK-49716
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> - Update the documentation for barh plot to clarify the difference between 
> axis interpretation in Plotly and Matplotlib.
> - Test multiple columns as category axis.






[jira] [Updated] (SPARK-49716) Fix documentation and add test of barh plot

2024-09-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49716:
-
Summary: Fix documentation and add test of barh plot  (was: Improve 
documentation and test of barh plot)

> Fix documentation and add test of barh plot
> ---
>
> Key: SPARK-49716
> URL: https://issues.apache.org/jira/browse/SPARK-49716
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> - Update the documentation for barh plot to clarify the difference between 
> axis interpretation in Plotly and Matplotlib.
> - Test multiple columns as category axis.






[jira] [Created] (SPARK-49694) Support scatter plots

2024-09-18 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49694:


 Summary: Support scatter plots
 Key: SPARK-49694
 URL: https://issues.apache.org/jira/browse/SPARK-49694
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-49626) Support horizontal and vertical bar plots

2024-09-12 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49626:
-
Summary: Support horizontal and vertical bar plots  (was: Support 
horizontal and vertical bar plot)

> Support horizontal and vertical bar plots
> -
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Support horizontal and vertical bar plot






[jira] [Created] (SPARK-49626) Support horizontal and vertical bar plot

2024-09-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49626:


 Summary: Support horizontal and vertical bar plot
 Key: SPARK-49626
 URL: https://issues.apache.org/jira/browse/SPARK-49626
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Support horizontal and vertical bar plot






[jira] [Created] (SPARK-49607) Eliminate "spark.sql.pyspark.plotting.sample_ratio" config

2024-09-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49607:


 Summary: Eliminate "spark.sql.pyspark.plotting.sample_ratio" config
 Key: SPARK-49607
 URL: https://issues.apache.org/jira/browse/SPARK-49607
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


We may later eliminate the "spark.sql.pyspark.plotting.sample_ratio" config in 
favor of a better sampling approach.






[jira] [Created] (SPARK-49606) Improve documentation of Pandas on Spark plotting API

2024-09-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49606:


 Summary: Improve documentation of Pandas on Spark plotting API
 Key: SPARK-49606
 URL: https://issues.apache.org/jira/browse/SPARK-49606
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PS
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Improve the documentation of the Pandas on Spark plotting API, following pandas 
2.2 (stable); see https://pandas.pydata.org/docs/reference/frame.html.






[jira] [Updated] (SPARK-49531) Support line plot with plotly backend

2024-09-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49531:
-
Description: 
While Pandas on Spark supports plotting, PySpark currently lacks this feature. 
The proposed API will enable users to generate visualizations, such as line 
plots, by leveraging libraries like Plotly. This will provide users with an 
intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

 

See more at 
[https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing]
 for PySpark Plotting API Specification.

 

  was:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. 
The proposed API will enable users to generate visualizations, such as line 
plots, by leveraging libraries like Plotly. This will provide users with an 
intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

 

See more at 
[https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit]
 for PySpark Plotting API Specification.

 


> Support line plot with plotly backend
> -
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> While Pandas on Spark supports plotting, PySpark currently lacks this 
> feature. The proposed API will enable users to generate visualizations, such 
> as line plots, by leveraging libraries like Plotly. This will provide users 
> with an intuitive, interactive way to explore and understand large datasets 
> directly from PySpark DataFrames, streamlining the data analysis workflow in 
> distributed environments.
>  
> See more at 
> [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing]
>  for PySpark Plotting API Specification.
>  






[jira] [Updated] (SPARK-49531) Support line plot with plotly backend

2024-09-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49531:
-
Description: 
While Pandas on Spark supports plotting, PySpark currently lacks this feature. 
The proposed API will enable users to generate visualizations, such as line 
plots, by leveraging libraries like Plotly. This will provide users with an 
intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

 

See more at 
[https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit]
 for PySpark Plotting API Specification.

 

  was:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. 
The proposed API will enable users to generate visualizations, such as line 
plots, by leveraging libraries like Plotly. This will provide users with an 
intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

 

See more at 
[https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit.]

 


> Support line plot with plotly backend
> -
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> While Pandas on Spark supports plotting, PySpark currently lacks this 
> feature. The proposed API will enable users to generate visualizations, such 
> as line plots, by leveraging libraries like Plotly. This will provide users 
> with an intuitive, interactive way to explore and understand large datasets 
> directly from PySpark DataFrames, streamlining the data analysis workflow in 
> distributed environments.
>  
> See more at 
> [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit]
>  for PySpark Plotting API Specification.
>  






[jira] [Updated] (SPARK-49531) Support line plot with plotly backend

2024-09-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49531:
-
Description: 
While Pandas on Spark supports plotting, PySpark currently lacks this feature. 
The proposed API will enable users to generate visualizations, such as line 
plots, by leveraging libraries like Plotly. This will provide users with an 
intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

 

See more at 
[https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit.]

 

  was: Pandas on Spark DataFrame supports plotting, but PySpark DataFrame does 
not. Enabling line plots for PySpark DataFrame, initially with Plotly as the 
default backend, would improve data visualization capabilities.


> Support line plot with plotly backend
> -
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> While Pandas on Spark supports plotting, PySpark currently lacks this 
> feature. The proposed API will enable users to generate visualizations, such 
> as line plots, by leveraging libraries like Plotly. This will provide users 
> with an intuitive, interactive way to explore and understand large datasets 
> directly from PySpark DataFrames, streamlining the data analysis workflow in 
> distributed environments.
>  
> See more at 
> [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit.]
>  






[jira] [Updated] (SPARK-49531) Support line plot with plotly backend

2024-09-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49531:
-
Description: Pandas on Spark DataFrame supports plotting, but PySpark 
DataFrame does not. Enabling line plots for PySpark DataFrame, initially with 
Plotly as the default backend, would improve data visualization capabilities.

> Support line plot with plotly backend
> -
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Pandas on Spark DataFrame supports plotting, but PySpark DataFrame does not. 
> Enabling line plots for PySpark DataFrame, initially with Plotly as the 
> default backend, would improve data visualization capabilities.






[jira] [Updated] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect Scala Client

2024-09-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49595:
-
Summary: Fix DataFrame.unpivot/melt in Spark Connect Scala Client  (was: 
Fix DataFrame.unpivot/melt in Spark Connect)

> Fix DataFrame.unpivot/melt in Spark Connect Scala Client
> 
>
> Key: SPARK-49595
> URL: https://issues.apache.org/jira/browse/SPARK-49595
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect

2024-09-10 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49595:


 Summary: Fix DataFrame.unpivot/melt in Spark Connect
 Key: SPARK-49595
 URL: https://issues.apache.org/jira/browse/SPARK-49595
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-49582) Improve "dispatch_window_method" utility and docstring

2024-09-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49582:
-
Summary: Improve "dispatch_window_method" utility and docstring  (was: Fix 
"dispatch_window_method" utility and docstring)

> Improve "dispatch_window_method" utility and docstring
> --
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> - Change "dispatch_window_method" so that, instead of assuming the Window 
> class from the environment, it dynamically checks the type of the first 
> argument.
> - Improve docstrings.
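
The idea of dispatching on the type of the first argument can be sketched in plain Python. Everything named here (`ClassicWindow`, `ConnectWindow`, `order_by`, and the two implementation functions) is a hypothetical stand-in; the sketch does not reproduce the actual "dispatch_window_method" utility.

```python
class ClassicWindow:
    """Hypothetical stand-in for a classic Window object."""


class ConnectWindow:
    """Hypothetical stand-in for a Spark Connect Window object."""


def _classic_order_by(window, *cols):
    return ("classic", cols)


def _connect_order_by(window, *cols):
    return ("connect", cols)


# Map each window type to its implementation.
_ORDER_BY_IMPLS = {
    ClassicWindow: _classic_order_by,
    ConnectWindow: _connect_order_by,
}


def order_by(window, *cols):
    # Choose the implementation from the runtime type of the first argument,
    # rather than from a global check of which environment is active.
    for window_type, impl in _ORDER_BY_IMPLS.items():
        if isinstance(window, window_type):
            return impl(window, *cols)
    raise TypeError(f"unsupported window type: {type(window).__name__}")


classic_result = order_by(ClassicWindow(), "a")
connect_result = order_by(ConnectWindow(), "a", "b")
```

With this shape, passing either kind of window object works in the same process, and an unexpected type fails with a clear error instead of being routed to the wrong implementation.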






[jira] [Updated] (SPARK-49582) Improve "dispatch_window_method" utility and docstring

2024-09-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49582:
-
Issue Type: Improvement  (was: Bug)

> Improve "dispatch_window_method" utility and docstring
> --
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> - Change "dispatch_window_method" so that, instead of assuming the Window 
> class from the environment, it dynamically checks the type of the first 
> argument.
> - Improve docstrings.






[jira] [Updated] (SPARK-49582) Fix "dispatch_window_method" utility and docstring

2024-09-10 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49582:
-
Summary: Fix "dispatch_window_method" utility and docstring  (was: Fix 
"dispatch_window_method" utility and documentation)

> Fix "dispatch_window_method" utility and docstring
> --
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> - Change "dispatch_window_method" so that, instead of always inferring the 
> Window class from the environment, it dynamically checks the type of the 
> first argument.
> - Improve docstrings.






[jira] [Created] (SPARK-49532) Improve documentation of "plotting.sample_ratio" option

2024-09-06 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49532:


 Summary: Improve documentation of "plotting.sample_ratio" option
 Key: SPARK-49532
 URL: https://issues.apache.org/jira/browse/SPARK-49532
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Xinrong Meng


The current documentation incorrectly suggests that "plotting.sample_ratio" 
defaults to "plotting.max_rows". In reality, if "plotting.sample_ratio" is not 
explicitly set, it is *derived* based on the ratio of "plotting.max_rows" to 
the dataset size.
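
A rough sketch of the derivation described above. The function name, its parameters, and the cap at 1.0 are assumptions made for illustration; the description only states that the ratio is derived from "plotting.max_rows" and the dataset size when "plotting.sample_ratio" is not set.

```python
from typing import Optional


def effective_sample_ratio(
    max_rows: int, dataset_size: int, sample_ratio: Optional[float] = None
) -> float:
    """Return the sampling fraction to use for plotting (hypothetical helper)."""
    if sample_ratio is not None:
        # An explicitly configured ratio is used as given.
        return sample_ratio
    if dataset_size <= 0:
        # Nothing to downsample; assumption: keep everything.
        return 1.0
    # Derived value: the fraction of rows that fits within max_rows,
    # capped at 1.0 (assumption) so small datasets are not "oversampled".
    return min(1.0, max_rows / dataset_size)
```

For example, with max_rows=1000 and a 10000-row dataset, the derived ratio is 0.1, whereas a 100-row dataset is kept whole.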






[jira] [Created] (SPARK-49530) PySpark Plotting

2024-09-05 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49530:


 Summary: PySpark Plotting
 Key: SPARK-49530
 URL: https://issues.apache.org/jira/browse/SPARK-49530
 Project: Spark
  Issue Type: Umbrella
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-49521) Remove the ambiguous term “constructor” from the documentation for logical plan nodes

2024-09-04 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49521:


 Summary: Remove the ambiguous term “constructor” from the 
documentation for logical plan nodes
 Key: SPARK-49521
 URL: https://issues.apache.org/jira/browse/SPARK-49521
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Xinrong Meng


There are three uses of "constructor" in the documentation for logical plan 
nodes, which are confusing due to their overlap in meaning with Scala 
constructors.






[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala

2024-09-04 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49487:
-
Issue Type: Test  (was: Improvement)

> ProtoToParsedPlanTestSuite should recognize rule in 
> BaseSessionStateBuilder.scala
> -
>
> Key: SPARK-49487
> URL: https://issues.apache.org/jira/browse/SPARK-49487
> Project: Spark
>  Issue Type: Test
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see 
> [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160].
> This is why it didn’t recognize the rule added in BaseSessionStateBuilder, 
> for example, 
> [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208].
> Overriding extendedResolutionRules in the new Analyzer of the test suite is 
> not feasible because the SparkSession is null at that point, leading to
>  
> {{[info]   java.lang.NullPointerException: Cannot invoke 
> "org.apache.spark.sql.SparkSession.sessionState()" because 
> "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession"
>  is null}}
>  
> [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182]
>  should be enabled after the test suite is fixed.






[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rules in BaseSessionStateBuilder.scala

2024-09-04 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49487:
-
Summary: ProtoToParsedPlanTestSuite should recognize rules in 
BaseSessionStateBuilder.scala  (was: ProtoToParsedPlanTestSuite should 
recognize rule in BaseSessionStateBuilder.scala)

> ProtoToParsedPlanTestSuite should recognize rules in 
> BaseSessionStateBuilder.scala
> --
>
> Key: SPARK-49487
> URL: https://issues.apache.org/jira/browse/SPARK-49487
> Project: Spark
>  Issue Type: Test
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see 
> [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160].
> This is why it didn’t recognize the rule added in BaseSessionStateBuilder, 
> for example, 
> [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208].
> Overriding extendedResolutionRules in the new Analyzer of the test suite is 
> not feasible because the SparkSession is null at that point, leading to
>  
> {{[info]   java.lang.NullPointerException: Cannot invoke 
> "org.apache.spark.sql.SparkSession.sessionState()" because 
> "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession"
>  is null}}
>  
> [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182]
>  should be enabled after the test suite is fixed.






[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala

2024-09-04 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49487:
-
Description: 
A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see 
[here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160].

This is why it didn’t recognize the rule added in BaseSessionStateBuilder, for 
example, 
[here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208].

Overriding extendedResolutionRules in the new Analyzer of the test suite is not 
feasible because the SparkSession is null at that point, leading to

 

{{[info]   java.lang.NullPointerException: Cannot invoke 
"org.apache.spark.sql.SparkSession.sessionState()" because 
"this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession"
 is null}}

 

[https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182]
 should be enabled after the test suite is fixed.

> ProtoToParsedPlanTestSuite should recognize rule in 
> BaseSessionStateBuilder.scala
> -
>
> Key: SPARK-49487
> URL: https://issues.apache.org/jira/browse/SPARK-49487
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see 
> [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160].
> This is why it didn’t recognize the rule added in BaseSessionStateBuilder, 
> for example, 
> [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208].
> Overriding extendedResolutionRules in the new Analyzer of the test suite is 
> not feasible because the SparkSession is null at that point, leading to
>  
> {{[info]   java.lang.NullPointerException: Cannot invoke 
> "org.apache.spark.sql.SparkSession.sessionState()" because 
> "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession"
>  is null}}
>  
> [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182]
>  should be enabled after the test suite is fixed.






[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala

2024-09-04 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49487:
-
Environment: (was: Please see [this 
comment|https://github.com/apache/spark/pull/47884#issuecomment-2323642874] for 
context.


My guess is that the SparkSession in the ProtoToParsedPlanTestSuite does not 
pick up [this rule|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50]
defined in BaseSessionStateBuilder.scala.

 )

> ProtoToParsedPlanTestSuite should recognize rule in 
> BaseSessionStateBuilder.scala
> -
>
> Key: SPARK-49487
> URL: https://issues.apache.org/jira/browse/SPARK-49487
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Created] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala

2024-09-01 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-49487:


 Summary: ProtoToParsedPlanTestSuite should recognize rule in 
BaseSessionStateBuilder.scala
 Key: SPARK-49487
 URL: https://issues.apache.org/jira/browse/SPARK-49487
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
 Environment: Please see [this 
comment|https://github.com/apache/spark/pull/47884#issuecomment-2323642874] for 
context.


My guess is that the SparkSession in the ProtoToParsedPlanTestSuite does not 
pick up [this rule|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50]
defined in BaseSessionStateBuilder.scala.

 
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-49383) Support Transpose DataFrame API

2024-08-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-49383:
-
Description: 
Support Transpose as Scala/Python DataFrame API in both Spark Connect and 
Classic Spark.

Transposing data is a crucial operation in data analysis, enabling the 
transformation of rows into columns. This operation is widely used in tools 
like pandas and numpy, allowing for more flexible data manipulation and 
visualization.

While Apache Spark supports unpivot and pivot operations, it currently lacks a 
built-in transpose function. Implementing a transpose operation in Spark would 
enhance its data processing capabilities, aligning it with the functionalities 
available in pandas and numpy, and further empowering users in their data 
analysis workflows.

Please see 
[https://docs.google.com/document/d/1QSmG81qQ-muab0UOeqgDAELqF7fJTH8GnxCJF4Ir-kA/edit]
 for a detailed design.

  was:
Support Transpose as Scala/Python DataFrame API in both Spark Connect and 
Classic Spark.

Transposing data is a crucial operation in data analysis, enabling the 
transformation of rows into columns. This operation is widely used in tools 
like pandas and numpy, allowing for more flexible data manipulation and 
visualization.

While Apache Spark supports unpivot and pivot operations, it currently lacks a 
built-in transpose function. Implementing a transpose operation in Spark would 
enhance its data processing capabilities, aligning it with the functionalities 
available in pandas and numpy, and further empowering users in their data 
analysis workflows.

 


> Support Transpose DataFrame API
> ---
>
> Key: SPARK-49383
> URL: https://issues.apache.org/jira/browse/SPARK-49383
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Support Transpose as Scala/Python DataFrame API in both Spark Connect and 
> Classic Spark.
> Transposing data is a crucial operation in data analysis, enabling the 
> transformation of rows into columns. This operation is widely used in tools 
> like pandas and numpy, allowing for more flexible data manipulation and 
> visualization.
> While Apache Spark supports unpivot and pivot operations, it currently lacks 
> a built-in transpose function. Implementing a transpose operation in Spark 
> would enhance its data processing capabilities, aligning it with the 
> functionalities available in pandas and numpy, and further empowering users 
> in their data analysis workflows.
> Please see 
> [https://docs.google.com/document/d/1QSmG81qQ-muab0UOeqgDAELqF7fJTH8GnxCJF4Ir-kA/edit]
>  for a detailed design.
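
To make "transformation of rows into columns" concrete, here is a plain-Python sketch that models rows as dictionaries. It is only an illustration of the operation, not the Spark API or its implementation; the function name, the index_column parameter, and the "key" output column are assumptions.

```python
from typing import Any, Dict, List


def transpose(rows: List[Dict[str, Any]], index_column: str) -> List[Dict[str, Any]]:
    """Turn each non-index column into a row, and each index value into a column."""
    if not rows:
        return []
    # Columns other than the index become the rows of the result.
    value_columns = [c for c in rows[0] if c != index_column]
    return [
        {"key": col, **{str(row[index_column]): row[col] for row in rows}}
        for col in value_columns
    ]


rows = [
    {"id": "a", "x": 1, "y": 2},
    {"id": "b", "x": 3, "y": 4},
]
transposed = transpose(rows, "id")
# transposed == [{"key": "x", "a": 1, "b": 3}, {"key": "y", "a": 2, "b": 4}]
```

The values of the index column ("a", "b") become the new column names, and the former column names ("x", "y") become row labels.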






[jira] [Created] (SPARK-48516) Turn on Arrow optimization for Python UDFs by default

2024-06-03 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-48516:


 Summary: Turn on Arrow optimization for Python UDFs by default
 Key: SPARK-48516
 URL: https://issues.apache.org/jira/browse/SPARK-48516
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Turn on Arrow optimization for Python UDFs by default






[jira] [Created] (SPARK-48515) Enable Arrow optimization for Python UDFs

2024-06-03 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-48515:


 Summary: Enable Arrow optimization for Python UDFs
 Key: SPARK-48515
 URL: https://issues.apache.org/jira/browse/SPARK-48515
 Project: Spark
  Issue Type: Umbrella
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Enable Arrow optimization for Python UDFs






[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47891:
-
Description: 
Improve docstring of mapInPandas
 * "using a Python native function that takes and outputs a pandas DataFrame" 
is confusing because the function takes and outputs an "ITERATOR of pandas 
DataFrames" instead.
 * "All columns are passed together as an iterator of pandas DataFrames" can 
easily mislead users into thinking the entire DataFrame will be passed 
together; "a batch of rows" is used instead.

  was:Improve docstring of mapInPandas


> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInPandas
>  * "using a Python native function that takes and outputs a pandas DataFrame" 
> is confusing because the function takes and outputs an "ITERATOR of pandas 
> DataFrames" instead.
>  * "All columns are passed together as an iterator of pandas DataFrames" can 
> easily mislead users into thinking the entire DataFrame will be passed 
> together; "a batch of rows" is used instead.
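
The iterator-in, iterator-out contract that the two points above describe can be sketched without Spark or pandas. In this illustration a plain list of dicts stands in for one pandas DataFrame holding a batch of rows; the function name and the "age" field are made up for the example.

```python
from typing import Dict, Iterator, List

# Stand-in for one pandas DataFrame that holds a batch of rows.
Batch = List[Dict[str, int]]


def keep_adults(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Consume an iterator of batches and yield an iterator of batches."""
    for batch in batches:
        # Each batch contains only some of the rows, never the whole dataset.
        yield [row for row in batch if row["age"] >= 18]


incoming = iter([[{"age": 10}, {"age": 20}], [{"age": 30}]])
result = list(keep_adults(incoming))
# result == [[{"age": 20}], [{"age": 30}]]
```

The function never sees the whole dataset at once: it receives one batch at a time and yields one result per batch.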






[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow

2024-04-16 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47876.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/46088

> Improve docstring of mapInArrow
> ---
>
> Key: SPARK-47876
> URL: https://issues.apache.org/jira/browse/SPARK-47876
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInArrow






[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow

2024-04-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47876:


 Summary: Improve docstring of mapInArrow
 Key: SPARK-47876
 URL: https://issues.apache.org/jira/browse/SPARK-47876
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Improve docstring of mapInArrow






[jira] [Updated] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect

2024-04-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47823:
-
Description: 
 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 

  was:
 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error{{{}{}}}

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 


> Improve appName and getOrCreate usage for Spark Connect
> ---
>
> Key: SPARK-47823
> URL: https://issues.apache.org/jira/browse/SPARK-47823
> Project: Spark
>  Issue Type: Story
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> In Spark Connect
> {code:java}
> spark = SparkSession.builder.appName("...").getOrCreate(){code}
>  
> raises error
>  
> {code:java}
> [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
> cannot be configured together: Spark master [...], Spark Connect [...]{code}
>  
> We should ban the usage of appName in Spark Connect
>  






[jira] [Created] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect

2024-04-11 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47823:


 Summary: Improve appName and getOrCreate usage for Spark Connect
 Key: SPARK-47823
 URL: https://issues.apache.org/jira/browse/SPARK-47823
 Project: Spark
  Issue Type: Story
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error{{{}{}}}

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 






[jira] [Updated] (SPARK-47677) Pandas circular import error in Python 3.10

2024-04-01 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47677:
-
Description: 
{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

Python 3.11, 3.12 and pypy3 don't have the issue.

 

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.

  was:
{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.


> Pandas circular import error in Python 3.10 
> 
>
> Key: SPARK-47677
> URL: https://issues.apache.org/jira/browse/SPARK-47677
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> {{AttributeError: partially initialized module 'pandas' has no attribute 
> '_pandas_datetime_CAPI' (most likely due to a circular import)}}
>  
> The above error appears in multiple tests with Python 3.10.
> Python 3.11, 3.12 and pypy3 don't have the issue.
>  
> See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
> for details.






[jira] [Created] (SPARK-47677) Pandas circular import error in Python 3.10

2024-04-01 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47677:


 Summary: Pandas circular import error in Python 3.10 
 Key: SPARK-47677
 URL: https://issues.apache.org/jira/browse/SPARK-47677
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng


{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.






[jira] [Resolved] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling

2024-03-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47276.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45378
[https://github.com/apache/spark/pull/45378]

> Introduce `spark.profile.clear` for SparkSession-based profiling
> 
>
> Key: SPARK-47276
> URL: https://issues.apache.org/jira/browse/SPARK-47276
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Introduce `spark.profile.clear` for SparkSession-based profiling






[jira] [Created] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling

2024-03-04 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47276:


 Summary: Introduce `spark.profile.clear` for SparkSession-based 
profiling
 Key: SPARK-47276
 URL: https://issues.apache.org/jira/browse/SPARK-47276
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Introduce `spark.profile.clear` for SparkSession-based profiling






[jira] [Resolved] (SPARK-46975) Support dedicated fallback methods

2024-02-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46975.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45026

> Support dedicated fallback methods
> --
>
> Key: SPARK-46975
> URL: https://issues.apache.org/jira/browse/SPARK-46975
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46975) Support dedicated fallback methods

2024-02-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46975:


Assignee: Ruifeng Zheng

> Support dedicated fallback methods
> --
>
> Key: SPARK-46975
> URL: https://issues.apache.org/jira/browse/SPARK-46975
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819779#comment-17819779
 ] 

Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM:
---

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

 

!image-2024-02-22-11-21-30-460.png!


was (Author: xinrongm):
[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png, 
> image-2024-02-22-11-21-30-460.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between a 
> row and a list of rows lies in whether n is supplied at all: if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197
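
The behaviour described in the issue can be mimicked with a small stand-alone function. This is a sketch of the semantics only, using tuples as stand-ins for Row objects and a list as a stand-in for the DataFrame; it is not the PySpark implementation.

```python
from typing import List, Optional, Tuple, Union

Row = Tuple  # stand-in for pyspark.sql.Row


def head(rows: List[Row], n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    """Mimic the head() / head(n) distinction described above."""
    if n is None:
        # No n supplied: return a single row (None here if there are no rows).
        first = rows[:1]
        return first[0] if first else None
    # n supplied, even n == 1: always return a list of rows.
    return rows[:n]


data = [(1, "a"), (2, "b")]
single = head(data)       # a single row: (1, "a")
as_list = head(data, 1)   # a list with one row: [(1, "a")]
```

What matters is whether n is passed at all, not its value: head(data, 1) still returns a list.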






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819779#comment-17819779
 ] 

Xinrong Meng commented on SPARK-47132:
--

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between a 
> row and a list of rows lies in whether n is supplied at all: if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819780#comment-17819780
 ] 

Xinrong Meng commented on SPARK-47132:
--

Resolved by https://github.com/apache/spark/pull/45197.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Attachment: image-2024-02-22-11-18-02-429.png

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Issue Type: Documentation  (was: Bug)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819777#comment-17819777
 ] 

Xinrong Meng commented on SPARK-47132:
--

I changed the issue type to Documentation (from Bug) and the affected version to 4.0.0 (from 3.5.0).

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers

2024-02-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47078:


 Summary: Documentation for SparkSession-based Profilers
 Key: SPARK-47078
 URL: https://issues.apache.org/jira/browse/SPARK-47078
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Assigned] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-47014:


Assignee: Xinrong Meng

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> -
>
> Key: SPARK-47014
> URL: https://issues.apache.org/jira/browse/SPARK-47014
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Resolved] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47014.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45073

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> -
>
> Key: SPARK-47014
> URL: https://issues.apache.org/jira/browse/SPARK-47014
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Created] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-08 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47014:


 Summary: Implement methods dumpPerfProfiles and dumpMemoryProfiles 
of SparkSession
 Key: SPARK-47014
 URL: https://issues.apache.org/jira/browse/SPARK-47014
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Assigned] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46690:


Assignee: Xinrong Meng

> Support profiling on FlatMapCoGroupsInBatchExec
> ---
>
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46690.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapCoGroupsInBatchExec
> ---
>
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46689.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapGroupsInBatchExec
> -
>
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46689:


Assignee: Xinrong Meng

> Support profiling on FlatMapGroupsInBatchExec
> -
>
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling

2024-01-30 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46925:


 Summary: Add a warning that instructs to install memory_profiler 
for memory profiling
 Key: SPARK-46925
 URL: https://issues.apache.org/jira/browse/SPARK-46925
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Add a warning that instructs users to install memory_profiler for memory profiling
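
A rough sketch of what such a warning could look like is shown below. This is not the change made in Spark: the function name `warn_if_memory_profiler_missing` and the message wording are hypothetical, and only standard-library calls are used to detect whether the package is importable.

```python
import importlib.util
import warnings


def warn_if_memory_profiler_missing() -> bool:
    """Return True if memory_profiler is importable; otherwise warn and return False.

    Hypothetical helper for illustration; not PySpark's actual implementation.
    """
    available = importlib.util.find_spec("memory_profiler") is not None
    if not available:
        warnings.warn(
            "Memory profiling requires the 'memory_profiler' package; "
            "install it with: pip install memory-profiler",
            UserWarning,
            stacklevel=2,
        )
    return available


if __name__ == "__main__":
    print("memory_profiler available:", warn_if_memory_profiler_missing())
```

Warning rather than raising keeps the session usable when the optional dependency is absent, at the cost of the profile simply not being collected.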






[jira] [Created] (SPARK-46880) Improve and test warning for Arrow-optimized Python UDF

2024-01-26 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46880:


 Summary: Improve and test warning for Arrow-optimized Python UDF
 Key: SPARK-46880
 URL: https://issues.apache.org/jira/browse/SPARK-46880
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Improve and test warning for Arrow-optimized Python UDF






[jira] [Resolved] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2024-01-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46467.
--
  Assignee: Xinrong Meng
Resolution: Not A Problem

We don't plan to migrate the Pandas API on Spark to the PySpark error framework; 
instead, it should follow the pandas standard, so no changes are proposed for now.

> Improve and test exceptions of TimedeltaIndex
> -
>
> Key: SPARK-46467
> URL: https://issues.apache.org/jira/browse/SPARK-46467
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46781) Test data source (pyspark.sql.datasource)

2024-01-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46781:


 Summary: Test data source (pyspark.sql.datasource)
 Key: SPARK-46781
 URL: https://issues.apache.org/jira/browse/SPARK-46781
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Test custom data source and input partition.






[jira] [Updated] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)

2024-01-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46781:
-
Summary: Test custom data source and input partition 
(pyspark.sql.datasource)  (was: Test data source (pyspark.sql.datasource))

> Test custom data source and input partition (pyspark.sql.datasource)
> 
>
> Key: SPARK-46781
> URL: https://issues.apache.org/jira/browse/SPARK-46781
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Test custom data source and input partition.






[jira] [Updated] (SPARK-42862) Review and fix issues in Core API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42862:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in Core API docs
> --
>
> Key: SPARK-42862
> URL: https://issues.apache.org/jira/browse/SPARK-42862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Yuanjian Li
>Priority: Major
>







[jira] [Updated] (SPARK-42863) Review and fix issues in PySpark API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42863:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in PySpark API docs
> -
>
> Key: SPARK-42863
> URL: https://issues.apache.org/jira/browse/SPARK-42863
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Updated] (SPARK-42864) Review and fix issues in MLlib API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42864:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42861) Review and fix issues in SQL API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42861:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in SQL API docs
> -
>
> Key: SPARK-42861
> URL: https://issues.apache.org/jira/browse/SPARK-42861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42866) Review and fix issues in Spark Connect - Scala API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42866:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in Spark Connect - Scala API docs
> ---
>
> Key: SPARK-42866
> URL: https://issues.apache.org/jira/browse/SPARK-42866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Updated] (SPARK-42693) API Auditing

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42693:
-
Parent: SPARK-42523
Issue Type: Sub-task  (was: Story)

> API Auditing
> 
>
> Key: SPARK-42693
> URL: https://issues.apache.org/jira/browse/SPARK-42693
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Blocker
>
> Audit the user-facing API of Spark 3.4. The main goal is to ensure the public 
> API docs are ready for release, for example, that no private classes/methods 
> are leaked and marked public.
> There are 3 common ways to audit the API:
>  * build the docs (into a local website) against branch-3.4 to check
>  * 'git diff' to check the source code differences between v3.3.2 and 
> branch-3.4
>  * [https://github.com/apache/spark-website/pull/443] shows most of the API 
> doc differences between v3.3.2 and the 3.4.0 RC4 (the latest RC); commits are 
> categorized by components






[jira] [Resolved] (SPARK-42523) Apache Spark 3.4 release

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-42523.
--
Resolution: Done

> Apache Spark 3.4 release
> 
>
> Key: SPARK-42523
> URL: https://issues.apache.org/jira/browse/SPARK-42523
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> An umbrella for Apache Spark 3.4 release






[jira] [Created] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2023-12-20 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46467:


 Summary: Improve and test exceptions of TimedeltaIndex
 Key: SPARK-46467
 URL: https://issues.apache.org/jira/browse/SPARK-46467
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46459) Fix bundler to 2.4.22 to unblock CI

2023-12-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46459:


 Summary: Fix bundler to 2.4.22 to unblock CI
 Key: SPARK-46459
 URL: https://issues.apache.org/jira/browse/SPARK-46459
 Project: Spark
  Issue Type: Story
  Components: Build, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Fix bundler to 2.4.22 to unblock CI






[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Summary: Improve assertions of observation (pyspark.sql.observation)  (was: 
Improve and test assertions of observation (pyspark.sql.observation))

> Improve assertions of observation (pyspark.sql.observation)
> ---
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Parent: (was: SPARK-46041)
Issue Type: Improvement  (was: Sub-task)

> Improve and test assertions of observation (pyspark.sql.observation)
> 
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Description: Validate returnType of Arrow Python UDF  (was: Check 
returnType of Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Validate returnType of Arrow Python UDF






[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Summary: Validate returnType of Arrow Python UDF  (was: Check returnType of 
Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Check returnType of Arrow Python UDF






[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46413:


 Summary: Check returnType of Arrow Python UDF
 Key: SPARK-46413
 URL: https://issues.apache.org/jira/browse/SPARK-46413
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Check returnType of Arrow Python UDF






[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)

2023-12-13 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46398:


 Summary: Test rangeBetween window function (pyspark.sql.window)
 Key: SPARK-46398
 URL: https://issues.apache.org/jira/browse/SPARK-46398
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)

2023-12-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46386:


 Summary: Improve and test assertions of observation 
(pyspark.sql.observation)
 Key: SPARK-46386
 URL: https://issues.apache.org/jira/browse/SPARK-46386
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)

2023-12-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46385:


 Summary: Test aggregate functions for groups (pyspark.sql.group)
 Key: SPARK-46385
 URL: https://issues.apache.org/jira/browse/SPARK-46385
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Resolved] (SPARK-46277) Validate startup urls with the config being set

2023-12-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46277.
--
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/44194

> Validate startup urls with the config being set
> ---
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Assigned] (SPARK-46277) Validate startup urls with the config being set

2023-12-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46277:


Assignee: Xinrong Meng

> Validate startup urls with the config being set
> ---
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Description: Testing migration from Koalas to the Spark repository, including 
setting up the testing environment, dependencies, and CI jobs.

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Testing migration from Koalas to the Spark repository, including setting up 
> the testing environment, dependencies, and CI jobs.






[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Summary: Koalas Testing Migration  (was: Testing migration)

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46291:


Assignee: Xinrong Meng

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46291.
--
Resolution: Done

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-34999) Consolidate PySpark testing utils

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34999:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Consolidate PySpark testing utils
> -
>
> Key: SPARK-34999
> URL: https://issues.apache.org/jira/browse/SPARK-34999
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> `python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and 
> `python/pyspark/testing` contains test utilities for PySpark. Consolidating 
> them makes the code cleaner and easier to maintain.






[jira] [Updated] (SPARK-35012) Port Koalas DataFrame related unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35012:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas DataFrame related unit tests into PySpark
> -
>
> Key: SPARK-35012
> URL: https://issues.apache.org/jira/browse/SPARK-35012
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas DataFrame related unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35300) Standardize module name in install.rst

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35300:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Standardize module name in install.rst
> --
>
> Key: SPARK-35300
> URL: https://issues.apache.org/jira/browse/SPARK-35300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> We should use the full names of modules in install.rst.






[jira] [Updated] (SPARK-35034) Port Koalas miscellaneous unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35034:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas miscellaneous unit tests into PySpark
> -
>
> Key: SPARK-35034
> URL: https://issues.apache.org/jira/browse/SPARK-35034
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas miscellaneous unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35035) Port Koalas internal implementation unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35035:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas internal implementation unit tests into PySpark
> ---
>
> Key: SPARK-35035
> URL: https://issues.apache.org/jira/browse/SPARK-35035
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas internal implementation related unit tests to 
> [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].





