[jira] [Created] (SPARK-41202) Update ORC to 1.7.7

2022-11-18 Thread William Hyun (Jira)
William Hyun created SPARK-41202:


 Summary: Update ORC to 1.7.7
 Key: SPARK-41202
 URL: https://issues.apache.org/jira/browse/SPARK-41202
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.3.2
Reporter: William Hyun









[jira] [Resolved] (SPARK-41175) Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41175.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38696
[https://github.com/apache/spark/pull/38696]

> Assign a name to the error class _LEGACY_ERROR_TEMP_1078
> 
>
> Key: SPARK-41175
> URL: https://issues.apache.org/jira/browse/SPARK-41175
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.4.0
>
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1078 and make it 
> visible to users.






[jira] [Assigned] (SPARK-41175) Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41175:


Assignee: BingKun Pan

> Assign a name to the error class _LEGACY_ERROR_TEMP_1078
> 
>
> Key: SPARK-41175
> URL: https://issues.apache.org/jira/browse/SPARK-41175
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: BingKun Pan
>Priority: Major
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1078 and make it 
> visible to users.






[jira] [Assigned] (SPARK-41201) Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41201:


Assignee: (was: Apache Spark)

> Implement `DataFrame.SelectExpr` in Python client
> -
>
> Key: SPARK-41201
> URL: https://issues.apache.org/jira/browse/SPARK-41201
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41201) Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636109#comment-17636109
 ] 

Apache Spark commented on SPARK-41201:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38723

> Implement `DataFrame.SelectExpr` in Python client
> -
>
> Key: SPARK-41201
> URL: https://issues.apache.org/jira/browse/SPARK-41201
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41201) Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41201:


Assignee: Apache Spark

> Implement `DataFrame.SelectExpr` in Python client
> -
>
> Key: SPARK-41201
> URL: https://issues.apache.org/jira/browse/SPARK-41201
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-41201) Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread Rui Wang (Jira)
Rui Wang created SPARK-41201:


 Summary: Implement `DataFrame.SelectExpr` in Python client
 Key: SPARK-41201
 URL: https://issues.apache.org/jira/browse/SPARK-41201
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
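
For reference, the Spark Connect Python client is expected to mirror the
existing PySpark API here. A minimal sketch of the target behavior (the
session setup is illustrative, not the Connect implementation):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

    # selectExpr takes SQL expression strings instead of Column objects.
    df.selectExpr("age * 2 AS doubled", "upper(name) AS name").show()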









[jira] [Commented] (SPARK-41200) BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636106#comment-17636106
 ] 

Apache Spark commented on SPARK-41200:
--

User 'WangGuangxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/38722

> BytesToBytesMap's longArray size can be up to MAX_CAPACITY
> --
>
> Key: SPARK-41200
> URL: https://issues.apache.org/jira/browse/SPARK-41200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Priority: Minor
>







[jira] [Assigned] (SPARK-41200) BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41200:


Assignee: Apache Spark

> BytesToBytesMap's longArray size can be up to MAX_CAPACITY
> --
>
> Key: SPARK-41200
> URL: https://issues.apache.org/jira/browse/SPARK-41200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-41200) BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636105#comment-17636105
 ] 

Apache Spark commented on SPARK-41200:
--

User 'WangGuangxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/38722

> BytesToBytesMap's longArray size can be up to MAX_CAPACITY
> --
>
> Key: SPARK-41200
> URL: https://issues.apache.org/jira/browse/SPARK-41200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Priority: Minor
>







[jira] [Assigned] (SPARK-41200) BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41200:


Assignee: (was: Apache Spark)

> BytesToBytesMap's longArray size can be up to MAX_CAPACITY
> --
>
> Key: SPARK-41200
> URL: https://issues.apache.org/jira/browse/SPARK-41200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: EdisonWang
>Priority: Minor
>







[jira] [Created] (SPARK-41200) BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread EdisonWang (Jira)
EdisonWang created SPARK-41200:
--

 Summary: BytesToBytesMap's longArray size can be up to MAX_CAPACITY
 Key: SPARK-41200
 URL: https://issues.apache.org/jira/browse/SPARK-41200
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: EdisonWang
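
The issue description is empty, so as a rough illustration only: the point is
that the map's backing long array should be able to grow all the way up to
MAX_CAPACITY. A simplified, hypothetical sketch of such a growth policy (the
constant matches BytesToBytesMap.MAX_CAPACITY; the function is not Spark's
actual code):

    # Capacity doubles on growth but is clamped so it can reach, and never
    # exceed, MAX_CAPACITY.
    MAX_CAPACITY = 1 << 29  # BytesToBytesMap.MAX_CAPACITY

    def next_capacity(current: int) -> int:
        assert 0 < current <= MAX_CAPACITY
        return min(current * 2, MAX_CAPACITY)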









[jira] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread BingKun Pan (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41172 ]


BingKun Pan deleted comment on SPARK-41172:
-

was (Author: panbingkun):
I work on it.

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372
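
For context, a minimal PySpark snippet (assuming a local session) that
triggers the ambiguous-reference error this ticket migrates to an error class:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df1 = spark.createDataFrame([(1, "a")], ["id", "v"])
    df2 = spark.createDataFrame([(1, "b")], ["id", "w"])

    # Both join inputs expose a column named `id`, so resolving the bare name
    # fails with the ambiguous-reference error raised at package.scala#L372.
    df1.join(df2, df1["id"] == df2["id"]).select("id")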






[jira] [Commented] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636098#comment-17636098
 ] 

Apache Spark commented on SPARK-41172:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38721

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372






[jira] [Assigned] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41172:


Assignee: (was: Apache Spark)

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372






[jira] [Commented] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636097#comment-17636097
 ] 

Apache Spark commented on SPARK-41172:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38721

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372






[jira] [Assigned] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41172:


Assignee: Apache Spark

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372






[jira] [Resolved] (SPARK-41186) Fix doctest for new mlflow version

2022-11-18 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-41186.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38698
[https://github.com/apache/spark/pull/38698]

> Fix doctest for new mlflow version
> -
>
> Key: SPARK-41186
> URL: https://issues.apache.org/jira/browse/SPARK-41186
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
>   
>   
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 168, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> run_info = client.list_run_infos(exp_id)[-1]
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> run_info = client.list_run_infos(exp_id)[-1]
> AttributeError: 'MlflowClient' object has no attribute 'list_run_infos'
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 169, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> model = 
> load_model("runs:/{run_id}/model".format(run_id=run_info.run_uuid))
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> model = 
> load_model("runs:/{run_id}/model".format(run_id=run_info.run_uuid))
> NameError: name 'run_info' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 171, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> prediction_df["prediction"] = model.predict(prediction_df)
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> prediction_df["prediction"] = model.predict(prediction_df)
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> prediction_df
> Expected:
> x1   x2  prediction
> 0  2.0  4.0        1.31
> Got:
> x1   x2
> 0  2.0  4.0
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 178, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> model.predict(prediction_df[["x1", "x2"]].to_pandas())
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> model.predict(prediction_df[["x1", "x2"]].to_pandas())
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 189, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> y = model.predict(features)
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> y = model.predict(features)
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 198, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> features['y'] = y
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> features['y'] = y
> NameError: name 'y' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 200, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> everything
> Expected:
> x1   x2  z y
> 0  2.0  3.0 -1  1.376932
> Got:
> x1   x2  z
> 0  2.0  3.0 -1
> 

[jira] [Assigned] (SPARK-41186) Fix doctest for new mlflow version

2022-11-18 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-41186:
---

Assignee: Yikun Jiang

> Fix doctest for new mlflow version
> -
>
> Key: SPARK-41186
> URL: https://issues.apache.org/jira/browse/SPARK-41186
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
>   
>   
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 168, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> run_info = client.list_run_infos(exp_id)[-1]
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> run_info = client.list_run_infos(exp_id)[-1]
> AttributeError: 'MlflowClient' object has no attribute 'list_run_infos'
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 169, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> model = 
> load_model("runs:/{run_id}/model".format(run_id=run_info.run_uuid))
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> model = 
> load_model("runs:/{run_id}/model".format(run_id=run_info.run_uuid))
> NameError: name 'run_info' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 171, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> prediction_df["prediction"] = model.predict(prediction_df)
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> prediction_df["prediction"] = model.predict(prediction_df)
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> prediction_df
> Expected:
> x1   x2  prediction
> 0  2.0  4.0        1.31
> Got:
> x1   x2
> 0  2.0  4.0
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 178, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> model.predict(prediction_df[["x1", "x2"]].to_pandas())
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> model.predict(prediction_df[["x1", "x2"]].to_pandas())
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 189, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> y = model.predict(features)
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> y = model.predict(features)
> NameError: name 'model' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 198, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> features['y'] = y
> Exception raised:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/doctest.py", line 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 1, in 
> 
> features['y'] = y
> NameError: name 'y' is not defined
> **
> File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 200, in 
> pyspark.pandas.mlflow.load_model
> Failed example:
> everything
> Expected:
> x1   x2  z y
> 0  2.0  3.0 -1  1.376932
> Got:
> x1   x2  z
> 0  2.0  3.0 -1
> **
> 8 of 26 in pyspark.pandas.mlflow.load_model
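
MLflow removed MlflowClient.list_run_infos in 2.0, which is what breaks the
first example above. A plausible doctest migration (a sketch assuming
MLflow >= 2.0; client, exp_id, and load_model are defined earlier in the
doctest):

    # search_runs replaces list_run_infos and returns Run objects, so the
    # run id lives under run.info.run_id rather than run_uuid.
    run = client.search_runs([exp_id])[-1]
    model = load_model("runs:/{run_id}/model".format(run_id=run.info.run_id))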




[jira] [Commented] (SPARK-41172) Migrate the ambiguous ref error to an error class

2022-11-18 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636083#comment-17636083
 ] 

BingKun Pan commented on SPARK-41172:
-

I work on it.

> Migrate the ambiguous ref error to an error class
> -
>
> Key: SPARK-41172
> URL: https://issues.apache.org/jira/browse/SPARK-41172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Use an error class in 
> https://github.com/apache/spark/blob/99ae1d9a897909990881f14c5ea70a0d1a0bf456/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L372






[jira] [Commented] (SPARK-41184) Fill NA tests are flaky

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636082#comment-17636082
 ] 

Apache Spark commented on SPARK-41184:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/38720

> Fill NA tests are flaky
> ---
>
> Key: SPARK-41184
> URL: https://issues.apache.org/jira/browse/SPARK-41184
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>
> Connect's fill.na tests for Python are flaky. We need to disable them and
> investigate what is going on with the typing.
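
As a reminder of the typing subtlety involved: fillna's effect depends on the
fill value's type relative to each column's type, which the Connect client is
expected to mirror. A plain PySpark sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(None, None)], "i INT, s STRING")

    # The fill value's type selects which columns are touched:
    df.na.fill(0).show()    # fills only the numeric column `i`
    df.na.fill("x").show()  # fills only the string column `s`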






[jira] [Commented] (SPARK-41184) Fill NA tests are flaky

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636081#comment-17636081
 ] 

Apache Spark commented on SPARK-41184:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/38720

> Fill NA tests are flaky
> ---
>
> Key: SPARK-41184
> URL: https://issues.apache.org/jira/browse/SPARK-41184
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>
> Connect's fill.na tests for Python are flaky. We need to disable them and
> investigate what is going on with the typing.






[jira] [Commented] (SPARK-41165) Arrow collect should factor in failures

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636080#comment-17636080
 ] 

Apache Spark commented on SPARK-41165:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/38720

> Arrow collect should factor in failures
> ---
>
> Key: SPARK-41165
> URL: https://issues.apache.org/jira/browse/SPARK-41165
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>
> Connect's Arrow collect path does not factor in failures. If a failure
> occurs, the collect code path will hang.
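
A generic sketch of the hang (a hypothetical queue-based consumer, not
Connect's actual code): if the producer fails without signalling, the
consumer blocks forever waiting for the next batch:

    import queue

    batches: queue.Queue = queue.Queue()

    def consume():
        while True:
            item = batches.get()  # blocks forever if the producer died silently
            if item is None:      # producer must enqueue a sentinel (or the
                break             # error itself) so consumers observe failures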






[jira] [Assigned] (SPARK-41199) Streaming query metrics is broken with mixed-up usage of DSv1 streaming source and DSv2 streaming source

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41199:


Assignee: Apache Spark

> Streaming query metrics is broken with mixed-up usage of DSv1 streaming 
> source and DSv2 streaming source
> 
>
> Key: SPARK-41199
> URL: https://issues.apache.org/jira/browse/SPARK-41199
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> (It seems like a long-standing issue. It probably applies to 2.x as well; I
> just marked the version lines we have not EOLed.)
> If a streaming query contains both DSv1 and DSv2 streaming sources, it only
> collects metrics properly for the DSv1 sources.
>  






[jira] [Assigned] (SPARK-41199) Streaming query metrics is broken with mixed-up usage of DSv1 streaming source and DSv2 streaming source

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41199:


Assignee: (was: Apache Spark)

> Streaming query metrics is broken with mixed-up usage of DSv1 streaming 
> source and DSv2 streaming source
> 
>
> Key: SPARK-41199
> URL: https://issues.apache.org/jira/browse/SPARK-41199
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Priority: Major
>
> (It seems like a long-standing issue. It probably applies to 2.x as well; I
> just marked the version lines we have not EOLed.)
> If a streaming query contains both DSv1 and DSv2 streaming sources, it only
> collects metrics properly for the DSv1 sources.
>  






[jira] [Commented] (SPARK-41199) Streaming query metrics is broken with mixed-up usage of DSv1 streaming source and DSv2 streaming source

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636041#comment-17636041
 ] 

Apache Spark commented on SPARK-41199:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38719

> Streaming query metrics is broken with mixed-up usage of DSv1 streaming 
> source and DSv2 streaming source
> 
>
> Key: SPARK-41199
> URL: https://issues.apache.org/jira/browse/SPARK-41199
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Priority: Major
>
> (It seems like a long-standing issue. It probably applies to 2.x as well; I
> just marked the version lines we have not EOLed.)
> If a streaming query contains both DSv1 and DSv2 streaming sources, it only
> collects metrics properly for the DSv1 sources.
>  






[jira] [Commented] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636034#comment-17636034
 ] 

Apache Spark commented on SPARK-41196:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38718

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Homogenize the protobuf version across server and client






[jira] [Commented] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636033#comment-17636033
 ] 

Apache Spark commented on SPARK-41196:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38718

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Homogenize the protobuf version across server and client






[jira] [Created] (SPARK-41199) Streaming query metrics is broken with mixed-up usage of DSv1 streaming source and DSv2 streaming source

2022-11-18 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-41199:


 Summary: Streaming query metrics is broken with mixed-up usage of 
DSv1 streaming source and DSv2 streaming source
 Key: SPARK-41199
 URL: https://issues.apache.org/jira/browse/SPARK-41199
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.3.1, 3.2.2, 3.4.0
Reporter: Jungtaek Lim


(It seems like a long-standing issue. It probably applies to 2.x as well; I
just marked the version lines we have not EOLed.)

If a streaming query contains both DSv1 and DSv2 streaming sources, it only
collects metrics properly for the DSv1 sources.
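
A minimal reproduction sketch (hypothetical; assumes `rate` resolves to a
DSv2 source and `text` to a DSv1 file source on the affected versions):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").getOrCreate()

    v1 = (spark.readStream.format("text").load("/tmp/in")
          .selectExpr("CAST(length(value) AS LONG) AS value"))
    v2 = spark.readStream.format("rate").load().select("value")

    query = v1.union(v2).writeStream.format("noop").start()
    # On affected versions, query.lastProgress["sources"] reports input
    # metrics properly only for the DSv1 entry.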

 






[jira] [Commented] (SPARK-41198) Streaming query metrics is broken with CTE

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636024#comment-17636024
 ] 

Apache Spark commented on SPARK-41198:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38717

> Streaming query metrics is broken with CTE
> --
>
> Key: SPARK-41198
> URL: https://issues.apache.org/jira/browse/SPARK-41198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Priority: Major
>
> We have observed a case where the metrics are not available for a streaming
> query that contains a CTE.
> It looks like CTEs were inlined in the analysis phase in Spark 3.1.x, and
> this was changed to the optimization phase in Spark 3.2.x. ProgressReporter
> depends on the analyzed plan, so the change makes ProgressReporter see CTE
> nodes, which results in a different number of leaf nodes between the
> analyzed plan and the executed plan.






[jira] [Commented] (SPARK-41198) Streaming query metrics is broken with CTE

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636026#comment-17636026
 ] 

Apache Spark commented on SPARK-41198:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38717

> Streaming query metrics is broken with CTE
> --
>
> Key: SPARK-41198
> URL: https://issues.apache.org/jira/browse/SPARK-41198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Priority: Major
>
> We have observed a case where the metrics are not available for a streaming
> query that contains a CTE.
> It looks like CTEs were inlined in the analysis phase in Spark 3.1.x, and
> this was changed to the optimization phase in Spark 3.2.x. ProgressReporter
> depends on the analyzed plan, so the change makes ProgressReporter see CTE
> nodes, which results in a different number of leaf nodes between the
> analyzed plan and the executed plan.






[jira] [Assigned] (SPARK-41198) Streaming query metrics is broken with CTE

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41198:


Assignee: (was: Apache Spark)

> Streaming query metrics is broken with CTE
> --
>
> Key: SPARK-41198
> URL: https://issues.apache.org/jira/browse/SPARK-41198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Priority: Major
>
> We have observed a case where the metrics are not available for a streaming
> query that contains a CTE.
> It looks like CTEs were inlined in the analysis phase in Spark 3.1.x, and
> this was changed to the optimization phase in Spark 3.2.x. ProgressReporter
> depends on the analyzed plan, so the change makes ProgressReporter see CTE
> nodes, which results in a different number of leaf nodes between the
> analyzed plan and the executed plan.






[jira] [Assigned] (SPARK-41198) Streaming query metrics is broken with CTE

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41198:


Assignee: Apache Spark

> Streaming query metrics is broken with CTE
> --
>
> Key: SPARK-41198
> URL: https://issues.apache.org/jira/browse/SPARK-41198
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> We have observed a case where the metrics are not available for a streaming
> query that contains a CTE.
> It looks like CTEs were inlined in the analysis phase in Spark 3.1.x, and
> this was changed to the optimization phase in Spark 3.2.x. ProgressReporter
> depends on the analyzed plan, so the change makes ProgressReporter see CTE
> nodes, which results in a different number of leaf nodes between the
> analyzed plan and the executed plan.






[jira] [Assigned] (SPARK-41173) Move `require()` out from the constructors of string expressions

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41173:


Assignee: Yang Jie

> Move `require()` out from the constructors of string expressions
> 
>
> Key: SPARK-41173
> URL: https://issues.apache.org/jira/browse/SPARK-41173
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Yang Jie
>Priority: Major
>
> 1. ConcatWs:
> https://github.com/apache/spark/blob/fabea7101ea55db991590ca2fbe1d4dfd25e5b28/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L70
> 2. FormatString
> https://github.com/apache/spark/blob/fabea7101ea55db991590ca2fbe1d4dfd25e5b28/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L1665






[jira] [Resolved] (SPARK-41173) Move `require()` out from the constructors of string expressions

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41173.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38705
[https://github.com/apache/spark/pull/38705]

> Move `require()` out from the constructors of string expressions
> 
>
> Key: SPARK-41173
> URL: https://issues.apache.org/jira/browse/SPARK-41173
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> 1. ConcatWs:
> https://github.com/apache/spark/blob/fabea7101ea55db991590ca2fbe1d4dfd25e5b28/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L70
> 2. FormatString
> https://github.com/apache/spark/blob/fabea7101ea55db991590ca2fbe1d4dfd25e5b28/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L1665






[jira] [Created] (SPARK-41198) Streaming query metrics is broken with CTE

2022-11-18 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-41198:


 Summary: Streaming query metrics is broken with CTE
 Key: SPARK-41198
 URL: https://issues.apache.org/jira/browse/SPARK-41198
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.3.1, 3.2.2, 3.4.0
Reporter: Jungtaek Lim


We have observed a case where the metrics are not available for a streaming
query that contains a CTE.

It looks like CTEs were inlined in the analysis phase in Spark 3.1.x, and this
was changed to the optimization phase in Spark 3.2.x. ProgressReporter depends
on the analyzed plan, so the change makes ProgressReporter see CTE nodes, which
results in a different number of leaf nodes between the analyzed plan and the
executed plan.
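
A minimal sketch of a query shape that hits this (hypothetical; any streaming
source would do):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").getOrCreate()
    spark.readStream.format("rate").load().createOrReplaceTempView("src")

    # The analyzed plan still contains the CTE node while the executed plan
    # inlines it, so ProgressReporter's leaf-node matching fails and the
    # source metrics go missing from query.lastProgress.
    query = spark.sql(
        "WITH doubled AS (SELECT value * 2 AS value FROM src) "
        "SELECT value FROM doubled"
    ).writeStream.format("noop").start()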






[jira] [Commented] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635992#comment-17635992
 ] 

Apache Spark commented on SPARK-41197:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/38715

> Upgrade Kafka version to 3.3 release
> 
>
> Key: SPARK-41197
> URL: https://issues.apache.org/jira/browse/SPARK-41197
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.1
>Reporter: Ted Yu
>Priority: Minor
>
> Kafka 3.3 has been released.
> This issue upgrades the Kafka dependency to the 3.3 release.






[jira] [Assigned] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41197:


Assignee: Apache Spark

> Upgrade Kafka version to 3.3 release
> 
>
> Key: SPARK-41197
> URL: https://issues.apache.org/jira/browse/SPARK-41197
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Minor
>
> Kafka 3.3 has been released.
> This issue upgrades the Kafka dependency to the 3.3 release.






[jira] [Commented] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635991#comment-17635991
 ] 

Apache Spark commented on SPARK-41197:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/38715

> Upgrade Kafka version to 3.3 release
> 
>
> Key: SPARK-41197
> URL: https://issues.apache.org/jira/browse/SPARK-41197
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.1
>Reporter: Ted Yu
>Priority: Minor
>
> Kafka 3.3 has been released.
> This issue upgrades the Kafka dependency to the 3.3 release.






[jira] [Assigned] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41197:


Assignee: (was: Apache Spark)

> Upgrade Kafka version to 3.3 release
> 
>
> Key: SPARK-41197
> URL: https://issues.apache.org/jira/browse/SPARK-41197
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.1
>Reporter: Ted Yu
>Priority: Minor
>
> Kafka 3.3 has been released.
> This issue upgrades the Kafka dependency to the 3.3 release.






[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635989#comment-17635989
 ] 

Asif commented on SPARK-41141:
--

Opened the following PR:

[SPARK-41141-PR|https://github.com/apache/spark/pull/38714/files]

 

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Minor
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an
> aggregate expression in the outer query avoid introducing a new aggregate
> only for a single-level aggregate function; they introduce a new aggregate
> expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if
> the outer projection involving the aggregate function is exactly the same as
> the one used in the subquery, or if the outer query's projection involving
> the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following two cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it is possible to avoid it in this case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = sum(a))
>
> This Jira is also needed for another issue where subquery data source v2
> projects columns that are not needed. (No Jira has been filed for that yet;
> will do so.)
>
> Will be opening a PR for this soon.






[jira] [Assigned] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41141:


Assignee: (was: Apache Spark)

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Minor
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an
> aggregate expression in the outer query avoid introducing a new aggregate
> only for a single-level aggregate function; they introduce a new aggregate
> expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if
> the outer projection involving the aggregate function is exactly the same as
> the one used in the subquery, or if the outer query's projection involving
> the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following two cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it is possible to avoid it in this case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = sum(a))
>
> This Jira is also needed for another issue where subquery data source v2
> projects columns that are not needed. (No Jira has been filed for that yet;
> will do so.)
>
> Will be opening a PR for this soon.






[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635988#comment-17635988
 ] 

Apache Spark commented on SPARK-41141:
--

User 'ahshahid' has created a pull request for this issue:
https://github.com/apache/spark/pull/38714

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Minor
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an
> aggregate expression in the outer query avoid introducing a new aggregate
> only for a single-level aggregate function; they introduce a new aggregate
> expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if
> the outer projection involving the aggregate function is exactly the same as
> the one used in the subquery, or if the outer query's projection involving
> the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following two cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it is possible to avoid it in this case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = sum(a))
>
> This Jira is also needed for another issue where subquery data source v2
> projects columns that are not needed. (No Jira has been filed for that yet;
> will do so.)
>
> Will be opening a PR for this soon.






[jira] [Assigned] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41141:


Assignee: Apache Spark

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Assignee: Apache Spark
>Priority: Minor
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an
> aggregate expression in the outer query avoid introducing a new aggregate
> only for a single-level aggregate function; they introduce a new aggregate
> expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if
> the outer projection involving the aggregate function is exactly the same as
> the one used in the subquery, or if the outer query's projection involving
> the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following two cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it is possible to avoid it in this case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = sum(a))
>
> This Jira is also needed for another issue where subquery data source v2
> projects columns that are not needed. (No Jira has been filed for that yet;
> will do so.)
>
> Will be opening a PR for this soon.






[jira] [Updated] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Asif (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-41141:
-
Priority: Minor  (was: Major)

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Minor
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an
> aggregate expression in the outer query avoid introducing a new aggregate
> only for a single-level aggregate function; they introduce a new aggregate
> expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if
> the outer projection involving the aggregate function is exactly the same as
> the one used in the subquery, or if the outer query's projection involving
> the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following two cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it is possible to avoid it in this case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2
> where y = sum(a))
>
> This Jira is also needed for another issue where subquery data source v2
> projects columns that are not needed. (No Jira has been filed for that yet;
> will do so.)
>
> Will be opening a PR for this soon.






[jira] [Resolved] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Herman van Hövell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-41196.
---
Fix Version/s: 3.4.0
 Assignee: Martin Grund
   Resolution: Fixed

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> Homogenize the protobuf version across server and client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Ted Yu (Jira)
Ted Yu created SPARK-41197:
--

 Summary: Upgrade Kafka version to 3.3 release
 Key: SPARK-41197
 URL: https://issues.apache.org/jira/browse/SPARK-41197
 Project: Spark
  Issue Type: Improvement
  Components: Java API
Affects Versions: 3.3.1
Reporter: Ted Yu


Kafka 3.3 has been released.

This issue upgrades the Kafka dependency to the 3.3 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41196:


Assignee: (was: Apache Spark)

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Homogenize the protobuf version across server and client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41196:


Assignee: Apache Spark

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> Homogenize the protobuf version across server and client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635973#comment-17635973
 ] 

Apache Spark commented on SPARK-41196:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38693

> Homogenize the protobuf version across server and client
> 
>
> Key: SPARK-41196
> URL: https://issues.apache.org/jira/browse/SPARK-41196
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Homogenize the protobuf version across server and client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41196) Homogenize the protobuf version across server and client

2022-11-18 Thread Martin Grund (Jira)
Martin Grund created SPARK-41196:


 Summary: Homogenize the protobuf version across server and client
 Key: SPARK-41196
 URL: https://issues.apache.org/jira/browse/SPARK-41196
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


Homogenize the protobuf version across server and client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41195) Support PIVOT/UNPIVOT with join children

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41195:


Assignee: (was: Apache Spark)

> Support PIVOT/UNPIVOT with join children
> 
>
> Key: SPARK-41195
> URL: https://issues.apache.org/jira/browse/SPARK-41195
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41195) Support PIVOT/UNPIVOT with join children

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635916#comment-17635916
 ] 

Apache Spark commented on SPARK-41195:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38713

> Support PIVOT/UNPIVOT with join children
> 
>
> Key: SPARK-41195
> URL: https://issues.apache.org/jira/browse/SPARK-41195
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41195) Support PIVOT/UNPIVOT with join children

2022-11-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41195:


Assignee: Apache Spark

> Support PIVOT/UNPIVOT with join children
> 
>
> Key: SPARK-41195
> URL: https://issues.apache.org/jira/browse/SPARK-41195
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37648) Spark catalog and Delta tables

2022-11-18 Thread Michael F (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635913#comment-17635913
 ] 

Michael F commented on SPARK-37648:
---

Any update here? This continues to be an issue in 3.3.1.

> Spark catalog and Delta tables
> --
>
> Key: SPARK-37648
> URL: https://issues.apache.org/jira/browse/SPARK-37648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: Spark version 3.1.2
> Scala version 2.12.10
> Hive version 2.3.7
> Delta version 1.0.0
>Reporter: Hanna Liashchuk
>Priority: Major
>
> I'm using Spark with Delta tables; the tables are created, but there are no 
> columns in them.
> Steps to reproduce:
> 1. Start spark-shell 
> {code:java}
> spark-shell --conf 
> "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf 
> "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
>  --conf "spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY"{code}
> 2. Create delta table
> {code:java}
> spark.range(10).write.format("delta").option("path", 
> "tmp/delta").saveAsTable("delta"){code}
> 3. Make sure the table exists 
> {code:java}
> spark.catalog.listTables.show{code}
> 4. Find out that the columns are not there
> {code:java}
> spark.catalog.listColumns("delta").show{code}
> This is critical for Delta integration with BI tools such as Power 
> BI or Tableau, as they query the Spark catalog for metadata and we 
> get errors that no columns are found. 
> Discussion can be found in the Delta repository: 
> https://github.com/delta-io/delta/issues/695



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41195) Support PIVOT/UNPIVOT with join children

2022-11-18 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-41195:
---

 Summary: Support PIVOT/UNPIVOT with join children
 Key: SPARK-41195
 URL: https://issues.apache.org/jira/browse/SPARK-41195
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40999) Hints on subqueries are not properly propagated

2022-11-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40999:
---

Assignee: Fredrik Klauß

> Hints on subqueries are not properly propagated
> ---
>
> Key: SPARK-40999
> URL: https://issues.apache.org/jira/browse/SPARK-40999
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1
>Reporter: Fredrik Klauß
>Assignee: Fredrik Klauß
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, if a user tries to specify a query like the following, the hints 
> on the subquery will be lost. 
> {code:java}
> SELECT * FROM target t WHERE EXISTS
> (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
> This happens because hints are removed from the plan and pulled into joins at 
> the beginning of the optimization stage, while subqueries are only turned into 
> joins during optimization. As we remove any hints that are not below a join, 
> we end up removing hints that are below a subquery. 
>  
> This worked prior to a refactoring that added hints as a field on joins 
> (SPARK-26065), so it can cause a regression for anyone who made use of hints 
> on subqueries before.
>  
> To resolve this, we add a hint field to SubqueryExpression into which any 
> hints inside a subquery's plan can be pulled during EliminateResolvedHint, and 
> then pass this hint on when the subquery is turned into a join.
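
As a quick way to check the behaviour described above (a hedged spark-shell sketch; the tables target(key) and source(key) are hypothetical), the optimized plan shows whether the hint survived the subquery-to-join rewrite:

{code:scala}
// Hypothetical tables; in spark-shell, spark.implicits._ is pre-imported.
Seq(1, 2, 3).toDF("key").createOrReplaceTempView("target")
Seq(2, 3, 4).toDF("key").createOrReplaceTempView("source")

// With the fix, the BROADCAST hint inside the EXISTS subquery should be
// reflected in the rewritten left-semi join (e.g. a BroadcastHashJoin);
// without it, the hint is silently dropped.
spark.sql("""
  SELECT * FROM target t WHERE EXISTS
  (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key)
""").explain(true)
{code}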



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40999) Hints on subqueries are not properly propagated

2022-11-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40999.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38497
[https://github.com/apache/spark/pull/38497]

> Hints on subqueries are not properly propagated
> ---
>
> Key: SPARK-40999
> URL: https://issues.apache.org/jira/browse/SPARK-40999
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1
>Reporter: Fredrik Klauß
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, if a user tries to specify a query like the following, the hints 
> on the subquery will be lost. 
> {code:java}
> SELECT * FROM target t WHERE EXISTS
> (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
> This happens because hints are removed from the plan and pulled into joins at 
> the beginning of the optimization stage, while subqueries are only turned into 
> joins during optimization. As we remove any hints that are not below a join, 
> we end up removing hints that are below a subquery. 
>  
> This worked prior to a refactoring that added hints as a field on joins 
> (SPARK-26065), so it can cause a regression for anyone who made use of hints 
> on subqueries before.
>  
> To resolve this, we add a hint field to SubqueryExpression into which any 
> hints inside a subquery's plan can be pulled during EliminateResolvedHint, and 
> then pass this hint on when the subquery is turned into a join.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41161) Upgrade `scala-parser-combinators` to 2.1.1

2022-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41161:


Assignee: Yang Jie

> Upgrade `scala-parser-combinators` to 2.1.1
> ---
>
> Key: SPARK-41161
> URL: https://issues.apache.org/jira/browse/SPARK-41161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41161) Upgrade `scala-parser-combinators` to 2.1.1

2022-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41161.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38675
[https://github.com/apache/spark/pull/38675]

> Upgrade `scala-parser-combinators` to 2.1.1
> ---
>
> Key: SPARK-41161
> URL: https://issues.apache.org/jira/browse/SPARK-41161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41166) Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41166.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38688
[https://github.com/apache/spark/pull/38688]

> Check errorSubClass of DataTypeMismatch in *ExpressionSuites
> 
>
> Key: SPARK-41166
> URL: https://issues.apache.org/jira/browse/SPARK-41166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41166) Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41166:


Assignee: BingKun Pan

> Check errorSubClass of DataTypeMismatch in *ExpressionSuites
> 
>
> Key: SPARK-41166
> URL: https://issues.apache.org/jira/browse/SPARK-41166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38093) Set shuffleMergeAllowed to false for a determinate stage after the stage is finalized

2022-11-18 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635786#comment-17635786
 ] 

Mars commented on SPARK-38093:
--

See the review comment: https://github.com/apache/spark/pull/34122#discussion_r796929787

> Set shuffleMergeAllowed to false for a determinate stage after the stage is 
> finalized
> -
>
> Key: SPARK-38093
> URL: https://issues.apache.org/jira/browse/SPARK-38093
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.2.1
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Currently we set shuffleMergeAllowed to false before 
> prepareShuffleServicesForShuffleMapStage if the shuffle dependency is already 
> finalized. Ideally it would be better to do this right after the shuffle 
> dependency is finalized for a determinate stage. cc [~mridulm80]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41130) Rename OUT_OF_DECIMAL_TYPE_RANGE to NUMERIC_OUT_OF_SUPPORTED_RANGE

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41130:


Assignee: Haejoon Lee

> Rename OUT_OF_DECIMAL_TYPE_RANGE to NUMERIC_OUT_OF_SUPPORTED_RANGE
> --
>
> Key: SPARK-41130
> URL: https://issues.apache.org/jira/browse/SPARK-41130
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should use a proper name for the error class and a clear error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41130) Rename OUT_OF_DECIMAL_TYPE_RANGE to NUMERIC_OUT_OF_SUPPORTED_RANGE

2022-11-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41130.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38644
[https://github.com/apache/spark/pull/38644]

> Rename OUT_OF_DECIMAL_TYPE_RANGE to NUMERIC_OUT_OF_SUPPORTED_RANGE
> --
>
> Key: SPARK-41130
> URL: https://issues.apache.org/jira/browse/SPARK-41130
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We should use a proper name for the error class and a clear error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635763#comment-17635763
 ] 

Apache Spark commented on SPARK-41192:
--

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/38711

> Task finished before speculative task scheduled leads to holding idle 
> executors
> ---
>
> Key: SPARK-41192
> URL: https://issues.apache.org/jira/browse/SPARK-41192
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Yazhi Wang
>Priority: Minor
>  Labels: dynamic_allocation
> Attachments: dynamic-executors, dynamic-log
>
>
> When a task finishes before its speculative copy has been scheduled by the 
> DAGScheduler, the speculative task is still considered pending and counts 
> towards the calculation of the number of needed executors, which leads to 
> requesting more executors than needed.
> h2. Background & Reproduce
> In one of our production jobs, we found that ExecutorAllocationManager was 
> holding more executors than needed. 
> We found it difficult to reproduce in the test environment. In order to 
> stably reproduce and debug, we temporarily commented out the scheduling code 
> of speculative tasks in TaskSetManager:363 to ensure that the task completes 
> before the speculative task is scheduled.
> {code:java}
> // Original code
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, 
> Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then 
> schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false).orElse(
>     dequeueTaskHelper(execId, host, maxLocality, true))
> } 
> // Speculative task will never be scheduled
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, 
> Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then 
> schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false)
> }  {code}
> Referring to the examples in SPARK-30511:
> You will see that when running the last tasks we hold 38 executors (see 
> attachment), which is exactly ceil((149 + 1) / 4) = 38. But actually only 
> 2 tasks are running, which needs just Math.max(20, ceil(2 / 4)) = 20 
> executors (20 being the configured minimum).
> {code:java}
> ./bin/spark-shell --master yarn --conf spark.speculation=true --conf 
> spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf 
> spark.dynamicAllocation.minExecutors=20 --conf 
> spark.dynamicAllocation.maxExecutors=1000 {code}
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
>   if (index > 3998) {
>     Thread.sleep(1000 * 1000)
>   } else if (index > 3850) {
>     Thread.sleep(50 * 1000) // Fake running tasks
>   } else {
>     Thread.sleep(100)
>   }
>   Array.fill[Int](1)(1).iterator
> }).collect() // close the lambda and add an action so the job actually runs{code}
>  
> I will have a PR ready to fix this issue
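
For reference, a small sketch of the arithmetic quoted above (the real computation lives in ExecutorAllocationManager; this only reproduces the numbers, assuming 4 cores per executor and 1 core per task):

{code:scala}
val tasksPerExecutor = 4

// 149 speculative tasks still counted as pending, plus 1 running task,
// rounded up: the 38 executors actually held.
val held = math.ceil((149 + 1).toDouble / tasksPerExecutor).toInt  // 38

// Only 2 tasks are really running, so the target falls back to the
// configured spark.dynamicAllocation.minExecutors of 20.
val needed = math.max(20, math.ceil(2.0 / tasksPerExecutor).toInt) // 20
{code}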



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-11-18 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635756#comment-17635756
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

Upgrading from the V1 to the V2 AWS SDK is likely to introduce a breaking 
change to the interface of the client factory, since, for starters, we will be 
changing the Java interface returned by the factory.

This means the factory method signatures will need to be updated, and, given 
that the V2 SDK has both a sync and an async client, the factory may need a 
second method (with the same headers attached again).

> Override S3 Client in Spark Write/Read calls
> 
>
> Key: SPARK-38958
> URL: https://issues.apache.org/jira/browse/SPARK-38958
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Hershal
>Priority: Major
>
> Hello,
> I have been working to use spark to read and write data to S3. Unfortunately, 
> there are a few S3 headers that I need to add to my spark read/write calls. 
> After much looking, I have not found a way to replace the S3 client that 
> spark uses to make the read/write calls. I also have not found a 
> configuration that allows me to pass in S3 headers. Here is an example of 
> some common S3 request headers 
> ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html]).
>  Does there already exist functionality to add S3 headers to spark read/write 
> calls or pass in a custom client that would pass these headers on every 
> read/write request? Appreciate the help and feedback
>  
> Thanks,



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-11-18 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635755#comment-17635755
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

I had someone reach out to me with a similar request - static headers on all S3 
requests for a given S3A file system.

If static headers per file system were to be added as a config-driven feature, 
do we have any idea what the configuration might look like? I.e. how do we 
model a list of key-value pairs in the Hadoop configuration? The best I see is 
"getStrings", in which case we need to check that the list is even (the right 
number of k,v entries), or perhaps have each k,v pair be one string joined by 
an equals sign.

Also, are there any reasons not to have such a configuration, or any better way 
to design it?
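
For discussion, one hedged sketch of the parsing, assuming a hypothetical key fs.s3a.custom.headers whose value is a comma-separated list of name=value entries (neither the key nor the format exists in S3A today):

{code:scala}
import org.apache.hadoop.conf.Configuration

// Hypothetical configuration key; not an existing S3A option.
val CustomHeadersKey = "fs.s3a.custom.headers"

// Parse "name=value,name=value" into a Map. Configuration.getStrings splits
// the raw value on commas; splitting each entry on the first '=' lets header
// values themselves contain '='.
def parseHeaders(conf: Configuration): Map[String, String] = {
  Option(conf.getStrings(CustomHeadersKey))
    .getOrElse(Array.empty[String])
    .map { entry =>
      val eq = entry.indexOf('=')
      require(eq > 0, s"Malformed header entry: $entry")
      entry.substring(0, eq).trim -> entry.substring(eq + 1).trim
    }.toMap
}
{code}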

> Override S3 Client in Spark Write/Read calls
> 
>
> Key: SPARK-38958
> URL: https://issues.apache.org/jira/browse/SPARK-38958
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Hershal
>Priority: Major
>
> Hello,
> I have been working to use spark to read and write data to S3. Unfortunately, 
> there are a few S3 headers that I need to add to my spark read/write calls. 
> After much looking, I have not found a way to replace the S3 client that 
> spark uses to make the read/write calls. I also have not found a 
> configuration that allows me to pass in S3 headers. Here is an example of 
> some common S3 request headers 
> ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html]).
>  Does there already exist functionality to add S3 headers to spark read/write 
> calls or pass in a custom client that would pass these headers on every 
> read/write request? Appreciate the help and feedback
>  
> Thanks,



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41107) Install memory-profiler in the CI

2022-11-18 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-41107:
---

Assignee: Xinrong Meng

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> The PySpark memory profiler depends on 
> [memory-profiler|https://pypi.org/project/memory-profiler/].
> This ticket proposes installing memory-profiler in the CI to enable the 
> related tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41107) Install memory-profiler in the CI

2022-11-18 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-41107.
-
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/38611

> Install memory-profiler in the CI
> -
>
> Key: SPARK-41107
> URL: https://issues.apache.org/jira/browse/SPARK-41107
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> The PySpark memory profiler depends on 
> [memory-profiler|https://pypi.org/project/memory-profiler/].
> This ticket proposes installing memory-profiler in the CI to enable the 
> related tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41189) Add an environment to switch on and off namedtuple hack

2022-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41189.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38700
[https://github.com/apache/spark/pull/38700]

> Add an environment to switch on and off namedtuple hack 
> 
>
> Key: SPARK-41189
> URL: https://issues.apache.org/jira/browse/SPARK-41189
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> SPARK-32079 removed the namedtuple hack, but there are still bugs being 
> fixed upstream in cloudpickle. This JIRA aims to add a switch to turn the 
> hack on and off.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41189) Add an environment to switch on and off namedtuple hack

2022-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41189:


Assignee: Hyukjin Kwon

> Add an environment to switch on and off namedtuple hack 
> 
>
> Key: SPARK-41189
> URL: https://issues.apache.org/jira/browse/SPARK-41189
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> SPARK-32079 removed the namedtuple hack, but there are still bugs being 
> fixed upstream in cloudpickle. This JIRA aims to add a switch to turn the 
> hack on and off.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org