[jira] [Resolved] (SPARK-49776) Support pie plots
[ https://issues.apache.org/jira/browse/SPARK-49776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng resolved SPARK-49776.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 48256
[https://github.com/apache/spark/pull/48256]

> Support pie plots
> -----------------
>
> Key: SPARK-49776
> URL: https://issues.apache.org/jira/browse/SPARK-49776
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49796) Support pie subplots with plotly backend
[ https://issues.apache.org/jira/browse/SPARK-49796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49796:
---------------------------------
Summary: Support pie subplots with plotly backend (was: Support pie subplots)

> Support pie subplots with plotly backend
> ----------------------------------------
>
> Key: SPARK-49796
> URL: https://issues.apache.org/jira/browse/SPARK-49796
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PS, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
[jira] [Created] (SPARK-49796) Support pie subplots
Xinrong Meng created SPARK-49796:
---------------------------------
Summary: Support pie subplots
Key: SPARK-49796
URL: https://issues.apache.org/jira/browse/SPARK-49796
Project: Spark
Issue Type: Sub-task
Components: Connect, PS, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Updated] (SPARK-49792) Upgrade to numpy 2 for building and testing Spark branches
[ https://issues.apache.org/jira/browse/SPARK-49792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49792:
---------------------------------
Summary: Upgrade to numpy 2 for building and testing Spark branches (was: Upgrade NumPy to 2.1.0)

> Upgrade to numpy 2 for building and testing Spark branches
> ----------------------------------------------------------
>
> Key: SPARK-49792
> URL: https://issues.apache.org/jira/browse/SPARK-49792
> Project: Spark
> Issue Type: Story
> Components: Build, PS
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-49793) Enable PredictBatchUDFTests.test_caching for NumPy 2
[ https://issues.apache.org/jira/browse/SPARK-49793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49793:
---------------------------------
Description:

{code:python}
import numpy as np
import pandas as pd
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import struct

data = np.arange(0, 36, dtype=np.float64).reshape(-1, 4)
pdf = pd.DataFrame(data, columns=["a", "b", "c", "d"])
df = spark.createDataFrame(pdf)

def make_predict_fn():
    fake_output = np.random.random()
    def predict(inputs):
        return np.array([fake_output for i in inputs])
    return predict

identity = predict_batch_udf(make_predict_fn, return_type=DoubleType(), batch_size=5)
df1 = df.withColumn("preds", identity(struct("a"))).toPandas()
df2 = df.withColumn("preds", identity(struct("a"))).toPandas()
{code}

With NumPy 2.1.0 the output is:

{code}
>>> df1
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.431752
1   4.0   5.0   6.0   7.0  0.912097
2   8.0   9.0  10.0  11.0  0.679628
3  12.0  13.0  14.0  15.0  0.853850
4  16.0  17.0  18.0  19.0  0.389971
5  20.0  21.0  22.0  23.0  0.654521
6  24.0  25.0  26.0  27.0  0.430569
7  28.0  29.0  30.0  31.0  0.331055
8  32.0  33.0  34.0  35.0  0.306073
>>> df2
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.679628
1   4.0   5.0   6.0   7.0  0.430569
2   8.0   9.0  10.0  11.0  0.853850
3  12.0  13.0  14.0  15.0  0.306073
4  16.0  17.0  18.0  19.0  0.654521
5  20.0  21.0  22.0  23.0  0.389971
6  24.0  25.0  26.0  27.0  0.507598
7  28.0  29.0  30.0  31.0  0.912097
8  32.0  33.0  34.0  35.0  0.431752
{code}

which should be:

{code}
>>> df1
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.685941
1   4.0   5.0   6.0   7.0  0.685941
2   8.0   9.0  10.0  11.0  0.685941
3  12.0  13.0  14.0  15.0  0.685941
4  16.0  17.0  18.0  19.0  0.685941
5  20.0  21.0  22.0  23.0  0.685941
6  24.0  25.0  26.0  27.0  0.685941
7  28.0  29.0  30.0  31.0  0.685941
8  32.0  33.0  34.0  35.0  0.685941
>>> df2
      a     b     c     d     preds
0   0.0   1.0   2.0   3.0  0.685941
1   4.0   5.0   6.0   7.0  0.685941
2   8.0   9.0  10.0  11.0  0.685941
3  12.0  13.0  14.0  15.0  0.685941
4  16.0  17.0  18.0  19.0  0.685941
5  20.0  21.0  22.0  23.0  0.685941
6  24.0  25.0  26.0  27.0  0.685941
7  28.0  29.0  30.0  31.0  0.685941
8  32.0  33.0  34.0  35.0  0.685941
{code}

> Enable PredictBatchUDFTests.test_caching for NumPy 2
> ----------------------------------------------------
>
> Key: SPARK-49793
> URL: https://issues.apache.org/jira/browse/SPARK-49793
> Project: Spark
> Issue Type: Story
> Components: ML, Tests
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
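The caching contract the test checks can be sketched without Spark or NumPy: when `predict_batch_udf` reuses one cached `predict` function, every batch sees the same `fake_output`, which is why the expected output above is a single constant column. A minimal pure-Python illustration of that contract (illustrative only, not Spark's implementation):

```python
import random

def make_predict_fn():
    # Stands in for an expensive model load; with caching it runs once,
    # so all batches share one fake_output value.
    fake_output = random.random()
    def predict(inputs):
        return [fake_output for _ in inputs]
    return predict

# Cached behavior: one predict fn serves every batch, outputs are constant.
predict = make_predict_fn()
batch1 = predict(range(5))
batch2 = predict(range(4))
assert len(set(batch1 + batch2)) == 1  # a single cached value everywhere

# The buggy behavior observed under NumPy 2.1.0 corresponds to calling
# make_predict_fn() again per batch, which draws a fresh random value.
```

The asserted invariant is exactly what `test_caching` expects from `df1` and `df2` above.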
[jira] [Created] (SPARK-49793) Enable PredictBatchUDFTests.test_caching for NumPy 2
Xinrong Meng created SPARK-49793:
---------------------------------
Summary: Enable PredictBatchUDFTests.test_caching for NumPy 2
Key: SPARK-49793
URL: https://issues.apache.org/jira/browse/SPARK-49793
Project: Spark
Issue Type: Story
Components: ML, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Created] (SPARK-49776) Support pie plots
Xinrong Meng created SPARK-49776:
---------------------------------
Summary: Support pie plots
Key: SPARK-49776
URL: https://issues.apache.org/jira/browse/SPARK-49776
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Assigned] (SPARK-49694) Support scatter plots
[ https://issues.apache.org/jira/browse/SPARK-49694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng reassigned SPARK-49694:
------------------------------------
Assignee: Xinrong Meng

> Support scatter plots
> ---------------------
>
> Key: SPARK-49694
> URL: https://issues.apache.org/jira/browse/SPARK-49694
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-49694) Support scatter plots
[ https://issues.apache.org/jira/browse/SPARK-49694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng resolved SPARK-49694.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 48219
[https://github.com/apache/spark/pull/48219]

> Support scatter plots
> ---------------------
>
> Key: SPARK-49694
> URL: https://issues.apache.org/jira/browse/SPARK-49694
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-49765) Adjust documentation of "spark.sql.pyspark.plotting.max_rows"
Xinrong Meng created SPARK-49765:
---------------------------------
Summary: Adjust documentation of "spark.sql.pyspark.plotting.max_rows"
Key: SPARK-49765
URL: https://issues.apache.org/jira/browse/SPARK-49765
Project: Spark
Issue Type: Sub-task
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Created] (SPARK-49764) Support area plots
Xinrong Meng created SPARK-49764:
---------------------------------
Summary: Support area plots
Key: SPARK-49764
URL: https://issues.apache.org/jira/browse/SPARK-49764
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Assigned] (SPARK-49626) Support horizontal and vertical bar plots
[ https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng reassigned SPARK-49626:
------------------------------------
Assignee: Xinrong Meng

> Support horizontal and vertical bar plots
> -----------------------------------------
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> Support horizontal and vertical bar plot
[jira] [Resolved] (SPARK-49626) Support horizontal and vertical bar plots
[ https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng resolved SPARK-49626.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 48100
[https://github.com/apache/spark/pull/48100]

> Support horizontal and vertical bar plots
> -----------------------------------------
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Support horizontal and vertical bar plot
[jira] [Created] (SPARK-49716) Improve documentation and test of barh plot
Xinrong Meng created SPARK-49716:
---------------------------------
Summary: Improve documentation and test of barh plot
Key: SPARK-49716
URL: https://issues.apache.org/jira/browse/SPARK-49716
Project: Spark
Issue Type: Sub-task
Components: PS
Affects Versions: 4.0.0
Reporter: Xinrong Meng

- Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.
- Test multiple columns as category axis.
[jira] [Updated] (SPARK-49716) Fix documentation and add test of barh plot
[ https://issues.apache.org/jira/browse/SPARK-49716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49716:
---------------------------------
Component/s: Documentation

> Fix documentation and add test of barh plot
> -------------------------------------------
>
> Key: SPARK-49716
> URL: https://issues.apache.org/jira/browse/SPARK-49716
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PS
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> - Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.
> - Test multiple columns as category axis.
[jira] [Updated] (SPARK-49716) Fix documentation and add test of barh plot
[ https://issues.apache.org/jira/browse/SPARK-49716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49716:
---------------------------------
Summary: Fix documentation and add test of barh plot (was: Imrpove documentation and test of barh plot)

> Fix documentation and add test of barh plot
> -------------------------------------------
>
> Key: SPARK-49716
> URL: https://issues.apache.org/jira/browse/SPARK-49716
> Project: Spark
> Issue Type: Sub-task
> Components: PS
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> - Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.
> - Test multiple columns as category axis.
[jira] [Created] (SPARK-49694) Support scatter plots
Xinrong Meng created SPARK-49694:
---------------------------------
Summary: Support scatter plots
Key: SPARK-49694
URL: https://issues.apache.org/jira/browse/SPARK-49694
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Updated] (SPARK-49626) Support horizontal and vertical bar plots
[ https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49626:
---------------------------------
Summary: Support horizontal and vertical bar plots (was: Support horizontal and vertical bar plot)

> Support horizontal and vertical bar plots
> -----------------------------------------
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
>
> Support horizontal and vertical bar plot
[jira] [Created] (SPARK-49626) Support horizontal and vertical bar plot
Xinrong Meng created SPARK-49626:
---------------------------------
Summary: Support horizontal and vertical bar plot
Key: SPARK-49626
URL: https://issues.apache.org/jira/browse/SPARK-49626
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng

Support horizontal and vertical bar plot
[jira] [Created] (SPARK-49607) Eliminate "spark.sql.pyspark.plotting.sample_ratio" config
Xinrong Meng created SPARK-49607:
---------------------------------
Summary: Eliminate "spark.sql.pyspark.plotting.sample_ratio" config
Key: SPARK-49607
URL: https://issues.apache.org/jira/browse/SPARK-49607
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng

We may eliminate the "spark.sql.pyspark.plotting.sample_ratio" config later with a better sampling approach.
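The ticket does not specify the replacement, but one common candidate for a "better sampling approach" that needs no user-tuned ratio is single-pass reservoir sampling, which keeps a bounded uniform sample without knowing the row count up front. A hedged, Spark-free sketch (the function name and shape are hypothetical, not PySpark's implementation):

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Algorithm R: keep a uniform random sample of up to k rows in one
    pass over an iterable, without a precomputed total row count."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

Because the sample size is capped directly, this would make a separate `sample_ratio` knob unnecessary; only a row cap (such as `plotting.max_rows`) would remain.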
[jira] [Created] (SPARK-49606) Improve documentation of Pandas on Spark plotting API
Xinrong Meng created SPARK-49606:
---------------------------------
Summary: Improve documentation of Pandas on Spark plotting API
Key: SPARK-49606
URL: https://issues.apache.org/jira/browse/SPARK-49606
Project: Spark
Issue Type: Sub-task
Components: Documentation, PS
Affects Versions: 4.0.0
Reporter: Xinrong Meng

Improve documentation of the Pandas on Spark plotting API following pandas 2.2 (stable); see https://pandas.pydata.org/docs/reference/frame.html.
[jira] [Updated] (SPARK-49531) Support line plot with plotly backend
[ https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49531:
---------------------------------
Description:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing] for the PySpark Plotting API Specification.

was:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit] for the PySpark Plotting API Specification.

> Support line plot with plotly backend
> -------------------------------------
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-49531) Support line plot with plotly backend
[ https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49531:
---------------------------------
Description:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit] for the PySpark Plotting API Specification.

was:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit]

> Support line plot with plotly backend
> -------------------------------------
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-49531) Support line plot with plotly backend
[ https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49531:
---------------------------------
Description:
While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit]

was:
Pandas on Spark DataFrame supports plotting, but PySpark DataFrame does not. Enabling line plot for PySpark DataFrame, with Plotly backend as default (initially), would improve data visualization capabilities.

> Support line plot with plotly backend
> -------------------------------------
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-49531) Support line plot with plotly backend
[ https://issues.apache.org/jira/browse/SPARK-49531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49531:
---------------------------------
Description:
Pandas on Spark DataFrame supports plotting, but PySpark DataFrame does not. Enabling line plot for PySpark DataFrame, with Plotly backend as default (initially), would improve data visualization capabilities.

> Support line plot with plotly backend
> -------------------------------------
>
> Key: SPARK-49531
> URL: https://issues.apache.org/jira/browse/SPARK-49531
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-49595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49595:
---------------------------------
Summary: Fix DataFrame.unpivot/melt in Spark Connect Scala Client (was: Fix DataFrame.unpivot/melt in Spark Connect)

> Fix DataFrame.unpivot/melt in Spark Connect Scala Client
> --------------------------------------------------------
>
> Key: SPARK-49595
> URL: https://issues.apache.org/jira/browse/SPARK-49595
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect
Xinrong Meng created SPARK-49595:
---------------------------------
Summary: Fix DataFrame.unpivot/melt in Spark Connect
Key: SPARK-49595
URL: https://issues.apache.org/jira/browse/SPARK-49595
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Updated] (SPARK-49582) Improve "dispatch_window_method" utility and docstring
[ https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49582:
---------------------------------
Summary: Improve "dispatch_window_method" utility and docstring (was: Fix "dispatch_window_method" utility and docstring)

> Improve "dispatch_window_method" utility and docstring
> ------------------------------------------------------
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> - Fix "dispatch_window_method" from always assuming the correct Window class based on the environment to dynamically checking the type of the first argument.
> - Improve docstrings.
[jira] [Updated] (SPARK-49582) Improve "dispatch_window_method" utility and docstring
[ https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49582:
---------------------------------
Issue Type: Improvement (was: Bug)

> Improve "dispatch_window_method" utility and docstring
> ------------------------------------------------------
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> - Fix "dispatch_window_method" from always assuming the correct Window class based on the environment to dynamically checking the type of the first argument.
> - Improve docstrings.
[jira] [Updated] (SPARK-49582) Fix "dispatch_window_method" utility and docstring
[ https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-49582:
---------------------------------
Summary: Fix "dispatch_window_method" utility and docstring (was: Fix "dispatch_window_method" utility and documentation)

> Fix "dispatch_window_method" utility and docstring
> --------------------------------------------------
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
>
> - Fix "dispatch_window_method" from always assuming the correct Window class based on the environment to dynamically checking the type of the first argument.
> - Improve docstrings.
[jira] [Created] (SPARK-49532) Improve documentation of "plotting.sample_ratio" option
Xinrong Meng created SPARK-49532:
---------------------------------
Summary: Improve documentation of "plotting.sample_ratio" option
Key: SPARK-49532
URL: https://issues.apache.org/jira/browse/SPARK-49532
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 4.0.0
Reporter: Xinrong Meng

The current documentation incorrectly suggests that "plotting.sample_ratio" defaults to "plotting.max_rows". In reality, if "plotting.sample_ratio" is not explicitly set, it is *derived* from the ratio of "plotting.max_rows" to the dataset size.
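The derivation described above can be sketched as a small helper (illustrative only, not the actual PySpark code; the function name is hypothetical):

```python
def derived_sample_ratio(max_rows, total_rows):
    """When plotting.sample_ratio is unset, derive it from the ratio of
    plotting.max_rows to the dataset size, capped at 1.0 (no sampling
    needed when the dataset already fits within max_rows)."""
    if total_rows <= max_rows:
        return 1.0
    return max_rows / total_rows
```

For example, with `max_rows=1000` and a 4000-row dataset the derived ratio would be 0.25, so roughly 1000 rows are sampled for the plot.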
[jira] [Created] (SPARK-49530) PySpark Plotting
Xinrong Meng created SPARK-49530:
---------------------------------
Summary: PySpark Plotting
Key: SPARK-49530
URL: https://issues.apache.org/jira/browse/SPARK-49530
Project: Spark
Issue Type: Umbrella
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng
[jira] [Created] (SPARK-49521) Remove the ambiguous term “constructor” from the documentation for logical plan nodes
Xinrong Meng created SPARK-49521: Summary: Remove the ambiguous term “constructor” from the documentation for logical plan nodes Key: SPARK-49521 URL: https://issues.apache.org/jira/browse/SPARK-49521 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Xinrong Meng There are three uses of "constructor" in the documentation for logical plan nodes, which are confusing due to their overlap in meaning with Scala constructors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala
[ https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-49487: - Issue Type: Test (was: Improvement) > ProtoToParsedPlanTestSuite should recognize rule in > BaseSessionStateBuilder.scala > - > > Key: SPARK-49487 > URL: https://issues.apache.org/jira/browse/SPARK-49487 > Project: Spark > Issue Type: Test > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see > [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160]. > This is why it didn’t recognize the rule added in BaseSessionStateBuilder, > for example, > [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208]. > Overriding extendedResolutionRules in the new Analyzer of the test suite is > not feasible because the SparkSession is null at that point, leading to > > {{[info] java.lang.NullPointerException: Cannot invoke > "org.apache.spark.sql.SparkSession.sessionState()" because > "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession" > is null}} > > [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182] > should be enabled after the test suite is fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rules in BaseSessionStateBuilder.scala
[ https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-49487: - Summary: ProtoToParsedPlanTestSuite should recognize rules in BaseSessionStateBuilder.scala (was: ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala) > ProtoToParsedPlanTestSuite should recognize rules in > BaseSessionStateBuilder.scala > -- > > Key: SPARK-49487 > URL: https://issues.apache.org/jira/browse/SPARK-49487 > Project: Spark > Issue Type: Test > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see > [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160]. > This is why it didn’t recognize the rule added in BaseSessionStateBuilder, > for example, > [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208]. > Overriding extendedResolutionRules in the new Analyzer of the test suite is > not feasible because the SparkSession is null at that point, leading to > > {{[info] java.lang.NullPointerException: Cannot invoke > "org.apache.spark.sql.SparkSession.sessionState()" because > "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession" > is null}} > > [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182] > should be enabled after the test suite is fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala
[ https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-49487: - Description: A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160]. This is why it didn’t recognize the rule added in BaseSessionStateBuilder, for example, [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208]. Overriding extendedResolutionRules in the new Analyzer of the test suite is not feasible because the SparkSession is null at that point, leading to {{[info] java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.SparkSession.sessionState()" because "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession" is null}} [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182] should be enabled after the test suite is fixed. > ProtoToParsedPlanTestSuite should recognize rule in > BaseSessionStateBuilder.scala > - > > Key: SPARK-49487 > URL: https://issues.apache.org/jira/browse/SPARK-49487 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > A new Analyzer is created in the test suite ProtoToParsedPlanTestSuite, see > [here|https://github.com/apache/spark/blob/master/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala#L160]. > This is why it didn’t recognize the rule added in BaseSessionStateBuilder, > for example, > [here|https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50R208]. 
> Overriding extendedResolutionRules in the new Analyzer of the test suite is > not feasible because the SparkSession is null at that point, leading to > > {{[info] java.lang.NullPointerException: Cannot invoke > "org.apache.spark.sql.SparkSession.sessionState()" because > "this.$outer.org$apache$spark$sql$catalyst$analysis$ResolveTranspose$$sparkSession" > is null}} > > [https://github.com/apache/spark/pull/47884/files#diff-18773e9500b5f13ce74b6f9c01bfee44b6b5a70fc3378997cfb691c503d87bdaR182] > should be enabled after the test suite is fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala
[ https://issues.apache.org/jira/browse/SPARK-49487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-49487: - Environment: (was: Please see [this comment|https://github.com/apache/spark/pull/47884#issuecomment-2323642874] for context. My guess is that the SparkSession in the ProtoToParsedPlanTestSuite does not pick up [this rule |https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50]defined at BaseSessionStateBuilder.scala. ) > ProtoToParsedPlanTestSuite should recognize rule in > BaseSessionStateBuilder.scala > - > > Key: SPARK-49487 > URL: https://issues.apache.org/jira/browse/SPARK-49487 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-49487) ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala
Xinrong Meng created SPARK-49487: Summary: ProtoToParsedPlanTestSuite should recognize rule in BaseSessionStateBuilder.scala Key: SPARK-49487 URL: https://issues.apache.org/jira/browse/SPARK-49487 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Environment: Please see [this comment|https://github.com/apache/spark/pull/47884#issuecomment-2323642874] for context. My guess is that the SparkSession in the ProtoToParsedPlanTestSuite does not pick up [this rule |https://github.com/apache/spark/pull/47884/files#diff-9806431743675ca892eb73a801af2f4c43086f87ecbc0c94900c8f18660f4a50]defined at BaseSessionStateBuilder.scala. Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49383) Support Transpose DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-49383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-49383: - Description: Support Transpose as Scala/Python DataFrame API in both Spark Connect and Classic Spark. Transposing data is a crucial operation in data analysis, enabling the transformation of rows into columns. This operation is widely used in tools like pandas and numpy, allowing for more flexible data manipulation and visualization. While Apache Spark supports unpivot and pivot operations, it currently lacks a built-in transpose function. Implementing a transpose operation in Spark would enhance its data processing capabilities, aligning it with the functionalities available in pandas and numpy, and further empowering users in their data analysis workflows. Please see [https://docs.google.com/document/d/1QSmG81qQ-muab0UOeqgDAELqF7fJTH8GnxCJF4Ir-kA/edit] for a detailed design. was: Support Transpose as Scala/Python DataFrame API in both Spark Connect and Classic Spark. Transposing data is a crucial operation in data analysis, enabling the transformation of rows into columns. This operation is widely used in tools like pandas and numpy, allowing for more flexible data manipulation and visualization. While Apache Spark supports unpivot and pivot operations, it currently lacks a built-in transpose function. Implementing a transpose operation in Spark would enhance its data processing capabilities, aligning it with the functionalities available in pandas and numpy, and further empowering users in their data analysis workflows. > Support Transpose DataFrame API > --- > > Key: SPARK-49383 > URL: https://issues.apache.org/jira/browse/SPARK-49383 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Support Transpose as Scala/Python DataFrame API in both Spark Connect and > Classic Spark. 
> Transposing data is a crucial operation in data analysis, enabling the > transformation of rows into columns. This operation is widely used in tools > like pandas and numpy, allowing for more flexible data manipulation and > visualization. > While Apache Spark supports unpivot and pivot operations, it currently lacks > a built-in transpose function. Implementing a transpose operation in Spark > would enhance its data processing capabilities, aligning it with the > functionalities available in pandas and numpy, and further empowering users > in their data analysis workflows. > Please see > [https://docs.google.com/document/d/1QSmG81qQ-muab0UOeqgDAELqF7fJTH8GnxCJF4Ir-kA/edit] > for a detailed design. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
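The transpose operation the SPARK-49383 description motivates — turning rows into columns — can be sketched in plain Python. This mirrors the pandas/numpy behavior the ticket cites; it is not the eventual Spark DataFrame API (see the linked design doc for that):

```python
# Sketch of a transpose on hypothetical example data: the i-th element
# of every row is grouped together, so columns become rows.

rows = [
    ("apple", 1, 2),
    ("banana", 3, 4),
]

# zip(*rows) pairs up the i-th element of each row across all rows.
transposed = list(zip(*rows))
# transposed == [("apple", "banana"), (1, 3), (2, 4)]
```

In a distributed setting the same idea requires collecting the values of each column together, which is why a built-in DataFrame-level operation (rather than ad hoc pivot/unpivot combinations) is valuable.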
[jira] [Created] (SPARK-48516) Turn on Arrow optimization for Python UDFs by default
Xinrong Meng created SPARK-48516: Summary: Turn on Arrow optimization for Python UDFs by default Key: SPARK-48516 URL: https://issues.apache.org/jira/browse/SPARK-48516 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Turn on Arrow optimization for Python UDFs by default -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
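For context, Arrow optimization for Python UDFs is already controlled by an existing Spark SQL configuration flag; this ticket proposes flipping its default to true. Assuming that flag, opting in per session today looks like the following (a configuration sketch that requires a running PySpark session; it is not runnable standalone):

```python
# Configuration sketch: enable Arrow-optimized Python UDFs explicitly.
# `spark` is an existing SparkSession; the flag below is the existing
# conf whose default this ticket proposes to change to "true".
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")
```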
[jira] [Created] (SPARK-48515) Enable Arrow optimization for Python UDFs
Xinrong Meng created SPARK-48515: Summary: Enable Arrow optimization for Python UDFs Key: SPARK-48515 URL: https://issues.apache.org/jira/browse/SPARK-48515 Project: Spark Issue Type: Umbrella Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Enable Arrow optimization for Python UDFs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas
[ https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47891: - Description: Improve docstring of mapInPandas * "using a Python native function that takes and outputs a pandas DataFrame" is confusing because the function takes and outputs an "ITERATOR of pandas DataFrames" instead. * "All columns are passed together as an iterator of pandas DataFrames" can easily mislead users into thinking the entire DataFrame will be passed together, so "a batch of rows" is used instead. was:Improve docstring of mapInPandas > Improve docstring of mapInPandas > > > Key: SPARK-47891 > URL: https://issues.apache.org/jira/browse/SPARK-47891 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInPandas > * "using a Python native function that takes and outputs a pandas DataFrame" > is confusing because the function takes and outputs an "ITERATOR of pandas > DataFrames" instead. > * "All columns are passed together as an iterator of pandas DataFrames" > can easily mislead users into thinking the entire DataFrame will be passed together, > so "a batch of rows" is used instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
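The iterator-of-batches semantics the corrected docstring describes can be simulated in plain Python. Lists of dicts stand in for pandas DataFrames here; this is an illustration of the calling convention, not PySpark's implementation:

```python
# Simulation: a mapInPandas-style function receives an ITERATOR of
# batches and yields batches back — it never sees one whole DataFrame.

def filter_func(iterator):
    # Each `batch` is one chunk of rows (a pandas DataFrame in PySpark;
    # a list of dicts in this sketch).
    for batch in iterator:
        yield [row for row in batch if row["id"] == 1]

batches = [[{"id": 1}, {"id": 2}], [{"id": 1}]]  # two batches of rows
result = [row for batch in filter_func(iter(batches)) for row in batch]
```

Because the data arrives batch by batch, the function must not assume it can see all rows at once — exactly the misunderstanding the docstring fix targets.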
[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47876. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/46088 > Improve docstring of mapInArrow > --- > > Key: SPARK-47876 > URL: https://issues.apache.org/jira/browse/SPARK-47876 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInArrow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow
Xinrong Meng created SPARK-47876: Summary: Improve docstring of mapInArrow Key: SPARK-47876 URL: https://issues.apache.org/jira/browse/SPARK-47876 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve docstring of mapInArrow -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-47823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47823: - Description: In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect was: In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect > Improve appName and getOrCreate usage for Spark Connect > --- > > Key: SPARK-47823 > URL: https://issues.apache.org/jira/browse/SPARK-47823 > Project: Spark > Issue Type: Story > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > > In Spark Connect > {code:java} > spark = SparkSession.builder.appName("...").getOrCreate(){code} > > raises error > > {code:java} > [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master > cannot be configured together: Spark master [...], Spark Connect [...]{code} > > We should ban the usage of appName in Spark Connect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect
Xinrong Meng created SPARK-47823: Summary: Improve appName and getOrCreate usage for Spark Connect Key: SPARK-47823 URL: https://issues.apache.org/jira/browse/SPARK-47823 Project: Spark Issue Type: Story Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47677) Pandas circular import error in Python 3.10
[ https://issues.apache.org/jira/browse/SPARK-47677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47677: - Description: {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. Python 3.11, 3.12 and pypy3 don't have the issue. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details. was: {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details. > Pandas circular import error in Python 3.10 > > > Key: SPARK-47677 > URL: https://issues.apache.org/jira/browse/SPARK-47677 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > {{AttributeError: partially initialized module 'pandas' has no attribute > '_pandas_datetime_CAPI' (most likely due to a circular import)}} > > The above error appears in multiple tests with Python 3.10. > Python 3.11, 3.12 and pypy3 don't have the issue. > > See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] > for details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47677) Pandas circular import error in Python 3.10
Xinrong Meng created SPARK-47677: Summary: Pandas circular import error in Python 3.10 Key: SPARK-47677 URL: https://issues.apache.org/jira/browse/SPARK-47677 Project: Spark Issue Type: Test Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling
[ https://issues.apache.org/jira/browse/SPARK-47276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47276. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45378 [https://github.com/apache/spark/pull/45378] > Introduce `spark.profile.clear` for SparkSession-based profiling > > > Key: SPARK-47276 > URL: https://issues.apache.org/jira/browse/SPARK-47276 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Introduce `spark.profile.clear` for SparkSession-based profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling
Xinrong Meng created SPARK-47276: Summary: Introduce `spark.profile.clear` for SparkSession-based profiling Key: SPARK-47276 URL: https://issues.apache.org/jira/browse/SPARK-47276 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Introduce `spark.profile.clear` for SparkSession-based profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46975) Support dedicated fallback methods
[ https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46975. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45026 > Support dedicated fallback methods > -- > > Key: SPARK-46975 > URL: https://issues.apache.org/jira/browse/SPARK-46975 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46975) Support dedicated fallback methods
[ https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46975: Assignee: Ruifeng Zheng > Support dedicated fallback methods > -- > > Key: SPARK-46975 > URL: https://issues.apache.org/jira/browse/SPARK-46975 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819779#comment-17819779 ] Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM: --- [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. !image-2024-02-22-11-21-30-460.png! was (Author: xinrongm): [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png, > image-2024-02-22-11-21-30-460.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819779#comment-17819779 ] Xinrong Meng commented on SPARK-47132: -- [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819780#comment-17819780 ] Xinrong Meng commented on SPARK-47132: -- Resolved by https://github.com/apache/spark/pull/45197. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Attachment: image-2024-02-22-11-18-02-429.png > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Issue Type: Documentation (was: Bug) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Affects Version/s: 4.0.0 (was: 3.5.0) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819777#comment-17819777 ] Xinrong Meng commented on SPARK-47132: -- I modified the ticket to Documentation (from Bug) and 4.0.0 (from 3.5.0). > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
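The return-type rule the ticket describes can be sketched as a small standalone model in plain Python. The function below is hypothetical: it only mirrors the dispatch behavior the docstring fix documents, not PySpark's actual implementation, and it uses plain dicts in place of Row objects.

```python
from typing import List, Optional, Union

def head(rows: List[dict], n: Optional[int] = None) -> Union[dict, List[dict], None]:
    """Model of the documented DataFrame.head() dispatch rule.

    The return type depends on whether ``n`` is supplied at all,
    not on whether ``n == 1``:
      - head()  -> a single row (or None if there are no rows)
      - head(n) -> a list of up to n rows, even when n == 1
    """
    if n is None:
        return rows[0] if rows else None
    return rows[:n]

rows = [{"id": 0}, {"id": 1}, {"id": 2}]
print(head(rows))     # a single row: {'id': 0}
print(head(rows, 1))  # a one-element list: [{'id': 0}]
```

Against a real DataFrame the same distinction holds: `df.head()` yields a single `Row`, while `df.head(1)` yields a one-element list, which is exactly what the proposed docstring fix clarifies.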
[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers
Xinrong Meng created SPARK-47078: Summary: Documentation for SparkSession-based Profilers Key: SPARK-47078 URL: https://issues.apache.org/jira/browse/SPARK-47078 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[ https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-47014: Assignee: Xinrong Meng > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession > - > > Key: SPARK-47014 > URL: https://issues.apache.org/jira/browse/SPARK-47014 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[ https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47014. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45073 > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession > - > > Key: SPARK-47014 > URL: https://issues.apache.org/jira/browse/SPARK-47014 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
Xinrong Meng created SPARK-47014: Summary: Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession Key: SPARK-47014 URL: https://issues.apache.org/jira/browse/SPARK-47014 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46690: Assignee: Xinrong Meng > Support profiling on FlatMapCoGroupsInBatchExec > --- > > Key: SPARK-46690 > URL: https://issues.apache.org/jira/browse/SPARK-46690 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46690. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45050 > Support profiling on FlatMapCoGroupsInBatchExec > --- > > Key: SPARK-46690 > URL: https://issues.apache.org/jira/browse/SPARK-46690 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46689. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45050 > Support profiling on FlatMapGroupsInBatchExec > - > > Key: SPARK-46689 > URL: https://issues.apache.org/jira/browse/SPARK-46689 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46689: Assignee: Xinrong Meng > Support profiling on FlatMapGroupsInBatchExec > - > > Key: SPARK-46689 > URL: https://issues.apache.org/jira/browse/SPARK-46689 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling
Xinrong Meng created SPARK-46925: Summary: Add a warning that instructs to install memory_profiler for memory profiling Key: SPARK-46925 URL: https://issues.apache.org/jira/browse/SPARK-46925 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Add a warning that instructs to install memory_profiler for memory profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46880) Improve and test warning for Arrow-optimized Python UDF
Xinrong Meng created SPARK-46880: Summary: Improve and test warning for Arrow-optimized Python UDF Key: SPARK-46880 URL: https://issues.apache.org/jira/browse/SPARK-46880 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve and test warning for Arrow-optimized Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46467) Improve and test exceptions of TimedeltaIndex
[ https://issues.apache.org/jira/browse/SPARK-46467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46467. -- Assignee: Xinrong Meng Resolution: Not A Problem We don't plan to migrate Pandas API on Spark to the PySpark error framework; instead, it should follow the pandas standard. So no changes are proposed for now. > Improve and test exceptions of TimedeltaIndex > - > > Key: SPARK-46467 > URL: https://issues.apache.org/jira/browse/SPARK-46467 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46781) Test data source (pyspark.sql.datasource)
Xinrong Meng created SPARK-46781: Summary: Test data source (pyspark.sql.datasource) Key: SPARK-46781 URL: https://issues.apache.org/jira/browse/SPARK-46781 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Test custom data source and input partition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)
[ https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46781: - Summary: Test custom data source and input partition (pyspark.sql.datasource) (was: Test data source (pyspark.sql.datasource)) > Test custom data source and input partition (pyspark.sql.datasource) > > > Key: SPARK-46781 > URL: https://issues.apache.org/jira/browse/SPARK-46781 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Test custom data source and input partition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42862) Review and fix issues in Core API docs
[ https://issues.apache.org/jira/browse/SPARK-42862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42862: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in Core API docs > -- > > Key: SPARK-42862 > URL: https://issues.apache.org/jira/browse/SPARK-42862 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Yuanjian Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42863) Review and fix issues in PySpark API docs
[ https://issues.apache.org/jira/browse/SPARK-42863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42863: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in PySpark API docs > - > > Key: SPARK-42863 > URL: https://issues.apache.org/jira/browse/SPARK-42863 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42864: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42861) Review and fix issues in SQL API docs
[ https://issues.apache.org/jira/browse/SPARK-42861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42861: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in SQL API docs > - > > Key: SPARK-42861 > URL: https://issues.apache.org/jira/browse/SPARK-42861 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42866) Review and fix issues in Spark Connect - Scala API docs
[ https://issues.apache.org/jira/browse/SPARK-42866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42866: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in Spark Connect - Scala API docs > --- > > Key: SPARK-42866 > URL: https://issues.apache.org/jira/browse/SPARK-42866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42693) API Auditing
[ https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42693: - Parent: SPARK-42523 Issue Type: Sub-task (was: Story) > API Auditing > > > Key: SPARK-42693 > URL: https://issues.apache.org/jira/browse/SPARK-42693 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Spark Core, SQL, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Blocker > > Audit the user-facing API of Spark 3.4. The main goal is to ensure the public API > docs are ready for release, for example, that no private classes/methods are > leaked or marked public. > There are 3 common ways to audit API: > * build docs (into a local website) against branch-3.4 to check > * 'git diff' to check the source code differences between v3.3.2 and > branch-3.4 > * [https://github.com/apache/spark-website/pull/443] shows most of the API > doc differences between v3.3.2 and the 3.4.0 RC4 (the latest RC); commits are > categorized by components -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42523) Apache Spark 3.4 release
[ https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-42523. -- Resolution: Done > Apache Spark 3.4 release > > > Key: SPARK-42523 > URL: https://issues.apache.org/jira/browse/SPARK-42523 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > An umbrella for Apache Spark 3.4 release -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46467) Improve and test exceptions of TimedeltaIndex
Xinrong Meng created SPARK-46467: Summary: Improve and test exceptions of TimedeltaIndex Key: SPARK-46467 URL: https://issues.apache.org/jira/browse/SPARK-46467 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46459) Fix bundler to 2.4.22 to unlock CI
Xinrong Meng created SPARK-46459: Summary: Fix bundler to 2.4.22 to unlock CI Key: SPARK-46459 URL: https://issues.apache.org/jira/browse/SPARK-46459 Project: Spark Issue Type: Story Components: Build, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Fix bundler to 2.4.22 to unlock CI -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Summary: Improve assertions of observation (pyspark.sql.observation) (was: Improve and test assertions of observation (pyspark.sql.observation)) > Improve assertions of observation (pyspark.sql.observation) > --- > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Parent: (was: SPARK-46041) Issue Type: Improvement (was: Sub-task) > Improve and test assertions of observation (pyspark.sql.observation) > > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Description: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Validate returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Summary: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF
Xinrong Meng created SPARK-46413: Summary: Check returnType of Arrow Python UDF Key: SPARK-46413 URL: https://issues.apache.org/jira/browse/SPARK-46413 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
Xinrong Meng created SPARK-46398: Summary: Test rangeBetween window function (pyspark.sql.window) Key: SPARK-46398 URL: https://issues.apache.org/jira/browse/SPARK-46398 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)
Xinrong Meng created SPARK-46386: Summary: Improve and test assertions of observation (pyspark.sql.observation) Key: SPARK-46386 URL: https://issues.apache.org/jira/browse/SPARK-46386 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
Xinrong Meng created SPARK-46385: Summary: Test aggregate functions for groups (pyspark.sql.group) Key: SPARK-46385 URL: https://issues.apache.org/jira/browse/SPARK-46385 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46277) Validate startup urls with the config being set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46277. -- Resolution: Fixed Resolved by https://github.com/apache/spark/pull/44194 > Validate startup urls with the config being set > --- > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46277) Validate startup urls with the config being set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46277: Assignee: Xinrong Meng > Validate startup urls with the config being set > --- > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46291) Koalas Testing Migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46291: - Description: Test migration from Koalas to the Spark repository, including setting up the testing environment, dependencies, and CI jobs. > Koalas Testing Migration > > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Test migration from Koalas to the Spark repository, including setting up the > testing environment, dependencies, and CI jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46291) Koalas Testing Migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46291: - Summary: Koalas Testing Migration (was: Testing migration) > Koalas Testing Migration > > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46291) Testing migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46291: Assignee: Xinrong Meng > Testing migration > - > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46291) Testing migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46291. -- Resolution: Done > Testing migration > - > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34999) Consolidate PySpark testing utils
[ https://issues.apache.org/jira/browse/SPARK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-34999: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Consolidate PySpark testing utils > - > > Key: SPARK-34999 > URL: https://issues.apache.org/jira/browse/SPARK-34999 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > `python/pyspark/pandas/testing` holds test utilities for pandas-on-spark, and > `python/pyspark/testing` contains test utilities for pyspark. Consolidating > them makes code cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35012) Port Koalas DataFrame related unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35012: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas DataFrame related unit tests into PySpark > - > > Key: SPARK-35012 > URL: https://issues.apache.org/jira/browse/SPARK-35012 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas DataFrame related unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35300) Standardize module name in install.rst
[ https://issues.apache.org/jira/browse/SPARK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35300: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Standardize module name in install.rst > -- > > Key: SPARK-35300 > URL: https://issues.apache.org/jira/browse/SPARK-35300 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > We should use the full names of modules in install.rst. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35034) Port Koalas miscellaneous unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35034: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas miscellaneous unit tests into PySpark > - > > Key: SPARK-35034 > URL: https://issues.apache.org/jira/browse/SPARK-35034 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas miscellaneous unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35035) Port Koalas internal implementation unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35035: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas internal implementation unit tests into PySpark > --- > > Key: SPARK-35035 > URL: https://issues.apache.org/jira/browse/SPARK-35035 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas internal implementation related unit tests to > [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org