[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 3.12

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46149:
-
Description: 
{code}
======================================================================
ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally
    output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run
    return self._run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run
    output = self._run_local_training(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training
    output = TorchDistributor._get_output_from_framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper
    return framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function
    raise RuntimeError(
RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message.

======================================================================
ERROR [14.850s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally
    output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run
    return self._run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run
    output = self._run_local_training(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training
    output = TorchDistributor._get_output_from_framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper
    return framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function
    raise RuntimeError(
RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message.

----------------------------------------------------------------------
{code}

https://github.com/apache/spark/actions/runs/7020654429/job/19100964890
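For context, a minimal sketch of the kind of version-based skip the ticket title suggests, assuming the test class uses the standard unittest framework (the class and method names are taken from the test name above; the decorator placement is an assumption for illustration, not the actual patch):

{code}
import sys
import unittest

class TorchDistributorLocalUnitTests(unittest.TestCase):
    # Hypothetical sketch: skip this end-to-end test on Python 3.12+,
    # since TorchDistributor local training currently fails there.
    @unittest.skipIf(
        sys.version_info >= (3, 12),
        "TorchDistributor end-to-end local run fails with Python 3.12 (SPARK-46149)",
    )
    def test_end_to_end_run_locally(self):
        ...
{code}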

  was:
{code}
======================================================================
ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally
    output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run
    return self._run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run
    output = self._run_local_training(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training
    output = TorchDistributor._get_output_from_framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper
    return framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function

[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 3.12

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46149:
-
Summary: Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` 
with Python 3.12  (was: Skip 
`TorchDistributorLocalUnitTests.test_end_to_end_run_locally`)

> Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 
> 3.12
> --
>
> Key: SPARK-46149
> URL: https://issues.apache.org/jira/browse/SPARK-46149
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> ==
> ERROR [12.635s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 403, in test_end_to_end_run_locally
> output = TorchDistributor(num_processes=2, local_mode=True, 
> use_gpu=False).run(
>  
> ^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, 
> in run
> return self._run(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, 
> in _run
> output = self._run_local_training(
>  ^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, 
> in _run_local_training
> output = TorchDistributor._get_output_from_framework_wrapper(
>  
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, 
> in _get_output_from_framework_wrapper
> return framework_wrapper(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, 
> in _run_training_on_pytorch_function
> raise RuntimeError(
> RuntimeError: TorchDistributor failed during training.View stdout logs for 
> detailed error message.
> ==
> ERROR [14.850s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 403, in test_end_to_end_run_locally
> output = TorchDistributor(num_processes=2, local_mode=True, 
> use_gpu=False).run(
>  
> ^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, 
> in run
> return self._run(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, 
> in _run
> output = self._run_local_training(
>  ^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, 
> in _run_local_training
> output = TorchDistributor._get_output_from_framework_wrapper(
>  
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, 
> in _get_output_from_framework_wrapper
> return framework_wrapper(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, 
> in _run_training_on_pytorch_function
> raise RuntimeError(
> RuntimeError: TorchDistributor failed during training.View stdout logs for 
> detailed error message.
> --
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46149:
-
Priority: Minor  (was: Major)

> Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`
> -
>
> Key: SPARK-46149
> URL: https://issues.apache.org/jira/browse/SPARK-46149
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> ==
> ERROR [12.635s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 403, in test_end_to_end_run_locally
> output = TorchDistributor(num_processes=2, local_mode=True, 
> use_gpu=False).run(
>  
> ^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, 
> in run
> return self._run(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, 
> in _run
> output = self._run_local_training(
>  ^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, 
> in _run_local_training
> output = TorchDistributor._get_output_from_framework_wrapper(
>  
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, 
> in _get_output_from_framework_wrapper
> return framework_wrapper(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, 
> in _run_training_on_pytorch_function
> raise RuntimeError(
> RuntimeError: TorchDistributor failed during training.View stdout logs for 
> detailed error message.
> ==
> ERROR [14.850s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 403, in test_end_to_end_run_locally
> output = TorchDistributor(num_processes=2, local_mode=True, 
> use_gpu=False).run(
>  
> ^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, 
> in run
> return self._run(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, 
> in _run
> output = self._run_local_training(
>  ^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, 
> in _run_local_training
> output = TorchDistributor._get_output_from_framework_wrapper(
>  
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, 
> in _get_output_from_framework_wrapper
> return framework_wrapper(
>^^
>   File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, 
> in _run_training_on_pytorch_function
> raise RuntimeError(
> RuntimeError: TorchDistributor failed during training.View stdout logs for 
> detailed error message.
> --
> {code}






[jira] [Created] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`

2023-11-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46149:


 Summary: Skip 
`TorchDistributorLocalUnitTests.test_end_to_end_run_locally`
 Key: SPARK-46149
 URL: https://issues.apache.org/jira/browse/SPARK-46149
 Project: Spark
  Issue Type: Sub-task
  Components: ML, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
======================================================================
ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally
    output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run
    return self._run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run
    output = self._run_local_training(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training
    output = TorchDistributor._get_output_from_framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper
    return framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function
    raise RuntimeError(
RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message.

======================================================================
ERROR [14.850s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally
    output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run
    return self._run(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run
    output = self._run_local_training(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training
    output = TorchDistributor._get_output_from_framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper
    return framework_wrapper(
  File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function
    raise RuntimeError(
RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message.

----------------------------------------------------------------------
{code}






[jira] [Updated] (SPARK-46148) Fix pyspark.pandas.mlflow.load_model test (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46148:
-
Description: 
{code}
**
File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in 
pyspark.pandas.mlflow.load_model
Failed example:
prediction_df
Exception raised:
Traceback (most recent call last):
  File "/usr/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, in 
prediction_df
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13291, in 
__repr__
pdf = cast("DataFrame", 
self._get_or_create_repr_pandas_cache(max_display_count))
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13282, in 
_get_or_create_repr_pandas_cache
self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13277, in 
_to_internal_pandas
return self._internal.to_pandas_frame
  File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 599, in 
wrapped_lazy_property
setattr(self, attr_name, fn(self))
  File "/__w/spark/spark/python/pyspark/pandas/internal.py", line 1110, in 
to_pandas_frame
pdf = sdf.toPandas()
  File "/__w/spark/spark/python/pyspark/sql/pandas/conversion.py", line 
213, in toPandas
rows = self.collect()
  File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 1369, in 
collect
sock_info = self._jdf.collectToPython()
  File 
"/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 
1322, in __call__
return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", 
line 188, in deco
raise converted from None
pyspark.errors.exceptions.captured.PythonException: 
  An exception was thrown from the Python worker. Please see the stack 
trace below.
Traceback (most recent call last):
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1523, in main
process()
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1515, in process
serializer.dump_stream(out_iter, outfile)
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 485, in dump_stream
return ArrowStreamSerializer.dump_stream(self, 
init_stream_yield_batches(), stream)
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 101, in dump_stream
for batch in iterator:
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 478, in init_stream_yield_batches
for series in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1284, in func
for result_batch, result_type in result_iter:
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1619, in udf
yield _predict_row_batch(batch_predict_fn, row_batch_args)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1383, in _predict_row_batch
result = predict_fn(pdf, params)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1601, in batch_predict_fn
return loaded_model.predict(pdf, params=params)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 491, in predict
return _predict()
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 477, in _predict
return self._predict_fn(data, params=params)
  File 
"/usr/local/lib/python3.10/dist-packages/mlflow/sklearn/__init__.py", line 517, 
in predict
return self.sklearn_model.predict(data)
  File 
"/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 
386, in predict
return self._decision_function(X)
  File 
"/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 
369, in _decision_function
X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], 
reset=False)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 580, 
in _validate_data
self._check_feature_names(X, reset=reset)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 507, 
in _check_feature_names
raise ValueError(message)
ValueError: The feature names should match those that were passed during 
fit.
Feature names unseen at fit time:
- 0
- 1
Feature names seen at fit time, yet now missing:
- x1
- x2



JVM stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 
in stage 1.0 failed 1 times, most recent 

[jira] [Created] (SPARK-46148) Fix pyspark.pandas.mlflow.load_model test (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46148:


 Summary: Fix pyspark.pandas.mlflow.load_model test (Python 3.12)
 Key: SPARK-46148
 URL: https://issues.apache.org/jira/browse/SPARK-46148
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
**
File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in 
pyspark.pandas.mlflow.load_model
Failed example:
prediction_df
Exception raised:
Traceback (most recent call last):
  File "/usr/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, in 
prediction_df
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13291, in 
__repr__
pdf = cast("DataFrame", 
self._get_or_create_repr_pandas_cache(max_display_count))
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13282, in 
_get_or_create_repr_pandas_cache
self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
  File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13277, in 
_to_internal_pandas
return self._internal.to_pandas_frame
  File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 599, in 
wrapped_lazy_property
setattr(self, attr_name, fn(self))
  File "/__w/spark/spark/python/pyspark/pandas/internal.py", line 1110, in 
to_pandas_frame
pdf = sdf.toPandas()
  File "/__w/spark/spark/python/pyspark/sql/pandas/conversion.py", line 
213, in toPandas
rows = self.collect()
  File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 1369, in 
collect
sock_info = self._jdf.collectToPython()
  File 
"/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 
1322, in __call__
return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", 
line 188, in deco
raise converted from None
pyspark.errors.exceptions.captured.PythonException: 
  An exception was thrown from the Python worker. Please see the stack 
trace below.
Traceback (most recent call last):
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1523, in main
process()
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1515, in process
serializer.dump_stream(out_iter, outfile)
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 485, in dump_stream
return ArrowStreamSerializer.dump_stream(self, 
init_stream_yield_batches(), stream)
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 101, in dump_stream
for batch in iterator:
  File 
"/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 478, in init_stream_yield_batches
for series in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1284, in func
for result_batch, result_type in result_iter:
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1619, in udf
yield _predict_row_batch(batch_predict_fn, row_batch_args)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1383, in _predict_row_batch
result = predict_fn(pdf, params)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 1601, in batch_predict_fn
return loaded_model.predict(pdf, params=params)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 491, in predict
return _predict()
  File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", 
line 477, in _predict
return self._predict_fn(data, params=params)
  File 
"/usr/local/lib/python3.10/dist-packages/mlflow/sklearn/__init__.py", line 517, 
in predict
return self.sklearn_model.predict(data)
  File 
"/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 
386, in predict
return self._decision_function(X)
  File 
"/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 
369, in _decision_function
X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], 
reset=False)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 580, 
in _validate_data
self._check_feature_names(X, reset=reset)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 507, 
in _check_feature_names
raise ValueError(message)
ValueError: The feature names should match those that were passed during 
fit.
Feature names unseen at fit time:
- 0
- 1
Feature names seen 

[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46147:
-
Fix Version/s: (was: 4.0.0)

> Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
> -
>
> Key: SPARK-46147
> URL: https://issues.apache.org/jira/browse/SPARK-46147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> {code}
> File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
> pyspark.pandas.frame.DataFrame.to_dict
> Failed example:
> df.to_dict(into=OrderedDict)
> Expected:
> OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
> OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
> Got:
> OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
> OrderedDict({'row1': 0.5, 'row2': 0.75})})
> {code}






[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46147:
-
Labels:   (was: pull-request-available)

> Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
> -
>
> Key: SPARK-46147
> URL: https://issues.apache.org/jira/browse/SPARK-46147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 4.0.0
>
>
> {code}
> File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
> pyspark.pandas.frame.DataFrame.to_dict
> Failed example:
> df.to_dict(into=OrderedDict)
> Expected:
> OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
> OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
> Got:
> OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
> OrderedDict({'row1': 0.5, 'row2': 0.75})})
> {code}






[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46147:
-
Description: 
{code}
File "/__w/spark/spark/python/pyspark/pandas/series.py", line 1633, in 
pyspark.pandas.series.Series.to_dict
Failed example:
s.to_dict(OrderedDict)
Expected:
OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
Got:
OrderedDict({0: 1, 1: 2, 2: 3, 3: 4})
{code}
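The failure comes from the repr change visible in the Expected/Got output above: on the Python 3.12 CI, OrderedDict prints in dict style rather than the older list-of-pairs style. A small illustration, plus one version-agnostic way such a doctest could be written by comparing values instead of repr; this is a sketch, not the actual fix applied:

{code}
from collections import OrderedDict

d = OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])

# repr(d) on Python <= 3.11: OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
# repr(d) on the Python 3.12 CI above: OrderedDict({0: 1, 1: 2, 2: 3, 3: 4})
print(repr(d))

# Comparing the value instead of its repr gives the same result on both versions:
assert d == OrderedDict({0: 1, 1: 2, 2: 3, 3: 4})
{code}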

  was:
{code}
File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
pyspark.pandas.frame.DataFrame.to_dict
Failed example:
df.to_dict(into=OrderedDict)
Expected:
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Got:
OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
OrderedDict({'row1': 0.5, 'row2': 0.75})})
{code}


> Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
> -
>
> Key: SPARK-46147
> URL: https://issues.apache.org/jira/browse/SPARK-46147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> {code}
> File "/__w/spark/spark/python/pyspark/pandas/series.py", line 1633, in 
> pyspark.pandas.series.Series.to_dict
> Failed example:
> s.to_dict(OrderedDict)
> Expected:
> OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
> Got:
> OrderedDict({0: 1, 1: 2, 2: 3, 3: 4})
> {code}






[jira] [Created] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)

2023-11-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46147:


 Summary: Fix the doctest in pyspark.pandas.series.Series.to_dict 
(Python 3.12)
 Key: SPARK-46147
 URL: https://issues.apache.org/jira/browse/SPARK-46147
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
Assignee: Hyukjin Kwon
 Fix For: 4.0.0


{code}
File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
pyspark.pandas.frame.DataFrame.to_dict
Failed example:
df.to_dict(into=OrderedDict)
Expected:
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Got:
OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
OrderedDict({'row1': 0.5, 'row2': 0.75})})
{code}






[jira] [Deleted] (SPARK-46136) Do not combine adjacent Python UDFs if pythonExec is different

2023-11-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46136:
-


> Do not combine adjacent Python UDFs if pythonExec is different
> --
>
> Key: SPARK-46136
> URL: https://issues.apache.org/jira/browse/SPARK-46136
> Project: Spark
>  Issue Type: Improvement
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In {{ExtractPythonUDFs}}, adjacent Python UDFs are combined and run in one Python worker; this should not happen when their pythonExec differs.






[jira] [Created] (SPARK-46136) Do not combine adjacent Python UDFs if pythonExec is different

2023-11-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46136:


 Summary: Do not combine adjacent Python UDFs if pythonExec is 
different
 Key: SPARK-46136
 URL: https://issues.apache.org/jira/browse/SPARK-46136
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


In {{ExtractPythonUDFs}}, adjacent Python UDFs are combined and run in one Python worker; this should not happen when their pythonExec differs.
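As a rough conceptual illustration only (the real rule is the Scala-side ExtractPythonUDFs; the UDF names and interpreter paths below are made up), combining should be limited to runs of adjacent UDFs that share the same pythonExec:

{code}
from itertools import groupby

# Each entry is (udf_name, pythonExec); values are illustrative only.
udfs = [
    ("f", "/usr/bin/python3.10"),
    ("g", "/usr/bin/python3.10"),
    ("h", "/usr/bin/python3.12"),
]

# Only adjacent UDFs with the same pythonExec end up in the same batch
# (i.e. the same Python worker); a different pythonExec starts a new batch.
batches = [list(group) for _, group in groupby(udfs, key=lambda u: u[1])]
print(batches)
# [[('f', '/usr/bin/python3.10'), ('g', '/usr/bin/python3.10')],
#  [('h', '/usr/bin/python3.12')]]
{code}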






[jira] [Resolved] (SPARK-32407) Remove upperbound of Sphinx version

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32407.
--
Fix Version/s: 4.0.0
 Assignee: Ruifeng Zheng
   Resolution: Fixed

https://github.com/apache/spark/pull/44046

> Remove upperbound of Sphinx version
> ---
>
> Key: SPARK-32407
> URL: https://issues.apache.org/jira/browse/SPARK-32407
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Sphinx 3.1+ does not correctly index nested classes. See also 
> https://github.com/sphinx-doc/sphinx/issues/7551. We should remove the 
> upper bound on the Sphinx version once that issue is fixed in Sphinx.






[jira] [Assigned] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46127:


Assignee: Hyukjin Kwon

> Flaky 
> `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` 
> with Python 3.12
> ---
>
> Key: SPARK-46127
> URL: https://issues.apache.org/jira/browse/SPARK-46127
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in 
> test_python_segfault
> self.sc.parallelize([1]).map(lambda x: f()).count()
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count
> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>^^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum
> return self.mapPartitions(lambda x: [sum(x)]).fold(  # type: 
> ignore[return-value]
>
> ^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold
> vals = self.mapPartitions(func).collect()
>^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect
> sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> ^
>   File 
> "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
> return_value = get_return_value(
>^
>   File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", 
> line 326, in get_return_value
> raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0) (localhost executor driver): org.apache.spark.SparkException: Python 
> worker exited unexpectedly (crashed)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.mutable.Growable.addAll(Growable.scala:61)
>   at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
>   at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly 
> (crashed)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at 

[jira] [Resolved] (SPARK-46131) Install torchvision for Python 3.12 build

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46131.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44045
[https://github.com/apache/spark/pull/44045]

> Install torchvision for Python 3.12 build
> -
>
> Key: SPARK-46131
> URL: https://issues.apache.org/jira/browse/SPARK-46131
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> ==
> ERROR [0.001s]: test_end_to_end_run_distributedly 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 495, in test_end_to_end_run_distributedly
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> ==
> ERROR [0.001s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 402, in test_end_to_end_run_locally
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> ==
> ERROR [0.001s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 402, in test_end_to_end_run_locally
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> --
> Ran 23 tests in 50.860s
> {code}






[jira] [Resolved] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46127.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44044
[https://github.com/apache/spark/pull/44044]

> Flaky 
> `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` 
> with Python 3.12
> ---
>
> Key: SPARK-46127
> URL: https://issues.apache.org/jira/browse/SPARK-46127
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in 
> test_python_segfault
> self.sc.parallelize([1]).map(lambda x: f()).count()
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count
> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>^^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum
> return self.mapPartitions(lambda x: [sum(x)]).fold(  # type: 
> ignore[return-value]
>
> ^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold
> vals = self.mapPartitions(func).collect()
>^^
>   File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect
> sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> ^
>   File 
> "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
> return_value = get_return_value(
>^
>   File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", 
> line 326, in get_return_value
> raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0) (localhost executor driver): org.apache.spark.SparkException: Python 
> worker exited unexpectedly (crashed)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.mutable.Growable.addAll(Growable.scala:61)
>   at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
>   at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>   at py4j.Gateway.invoke(Gateway.java:282)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at 
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>   at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly 
> (crashed)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
>   at 
> 

[jira] [Assigned] (SPARK-46131) Install torchvision for Python 3.12 build

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46131:


Assignee: Hyukjin Kwon

> Install torchvision for Python 3.12 build
> -
>
> Key: SPARK-46131
> URL: https://issues.apache.org/jira/browse/SPARK-46131
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> ==
> ERROR [0.001s]: test_end_to_end_run_distributedly 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 495, in test_end_to_end_run_distributedly
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> ==
> ERROR [0.001s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 402, in test_end_to_end_run_locally
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> ==
> ERROR [0.001s]: test_end_to_end_run_locally 
> (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 402, in test_end_to_end_run_locally
> train_fn = create_training_function(self.mnist_dir_path)
>^
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 60, in create_training_function
> from torchvision import transforms, datasets
> ModuleNotFoundError: No module named 'torchvision'
> --
> Ran 23 tests in 50.860s
> {code}






[jira] [Updated] (SPARK-46131) Install torchvision for Python 3.12 build

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46131:
-
Description: 
{code}
======================================================================
ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

----------------------------------------------------------------------
Ran 23 tests in 50.860s
{code}

  was:
{code}
======================================================================
ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)

[jira] [Created] (SPARK-46131) Install torchvision for Python 3.12 build

2023-11-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46131:


 Summary: Install torchvision for Python 3.12 build
 Key: SPARK-46131
 URL: https://issues.apache.org/jira/browse/SPARK-46131
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
======================================================================
ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

======================================================================
ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally
    train_fn = create_training_function(self.mnist_dir_path)
  File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function
    from torchvision import transforms, datasets
ModuleNotFoundError: No module named 'torchvision'

----------------------------------------------------------------------
Ran 23 tests in 50.860s
{code}






[jira] [Updated] (SPARK-46130) Reenable `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46130:
-
Parent: SPARK-45981
Issue Type: Sub-task  (was: Test)

> Reenable 
> `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` 
> with Python 3.12
> --
>
> Key: SPARK-46130
> URL: https://issues.apache.org/jira/browse/SPARK-46130
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Reenable the test disabled in https://issues.apache.org/jira/browse/SPARK-46127.






[jira] [Created] (SPARK-46130) Reenable `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12

2023-11-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46130:


 Summary: Reenable 
`pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` 
with Python 3.12
 Key: SPARK-46130
 URL: https://issues.apache.org/jira/browse/SPARK-46130
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Reenable the test disabled in https://issues.apache.org/jira/browse/SPARK-46127.






[jira] [Assigned] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46126:


Assignee: Hyukjin Kwon

> Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)
> ---
>
> Key: SPARK-46126
> URL: https://issues.apache.org/jira/browse/SPARK-46126
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
> pyspark.pandas.frame.DataFrame.to_dict
> Failed example:
> df.to_dict(into=OrderedDict)
> Expected:
> OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
> OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
> Got:
> OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
> OrderedDict({'row1': 0.5, 'row2': 0.75})})
> {code}






[jira] [Resolved] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46126.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44042
[https://github.com/apache/spark/pull/44042]

> Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)
> ---
>
> Key: SPARK-46126
> URL: https://issues.apache.org/jira/browse/SPARK-46126
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
> pyspark.pandas.frame.DataFrame.to_dict
> Failed example:
> df.to_dict(into=OrderedDict)
> Expected:
> OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
> OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
> Got:
> OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
> OrderedDict({'row1': 0.5, 'row2': 0.75})})
> {code}






[jira] [Resolved] (SPARK-46111) Add copyright to the PySpark official documentation.

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46111.
--
Fix Version/s: 4.0.0
 Assignee: Haejoon Lee
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/44026

> Add copyright to the PySpark official documentation.
> 
>
> Key: SPARK-46111
> URL: https://issues.apache.org/jira/browse/SPARK-46111
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add copyright to the PySpark official documentation by using a Sphinx extension.
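For reference, the standard Sphinx way to surface a copyright line is the copyright value in conf.py; whether the actual change uses this setting or a dedicated extension hook is an assumption in the sketch below.

{code:python}
# Hedged sketch of a typical Sphinx copyright setup (python/docs/source/conf.py);
# the actual patch may rely on a different extension hook.
project = "PySpark"
author = "Apache Software Foundation"
copyright = "2023, Apache Software Foundation"  # rendered in the generated page footer

# Controls whether the copyright footer is shown at all.
html_show_copyright = True
{code}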



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12

2023-11-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46127:


 Summary: Flaky 
`pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` 
with Python 3.12
 Key: SPARK-46127
 URL: https://issues.apache.org/jira/browse/SPARK-46127
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in 
test_python_segfault
self.sc.parallelize([1]).map(lambda x: f()).count()
  File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
   ^^^
  File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(  # type: 
ignore[return-value]
   
^^
  File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold
vals = self.mapPartitions(func).collect()
   ^^
  File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
^
  File 
"/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 
1322, in __call__
return_value = get_return_value(
   ^
  File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", 
line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
0) (localhost executor driver): org.apache.spark.SparkException: Python worker 
exited unexpectedly (crashed)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.mutable.Growable.addAll(Growable.scala:61)
at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly 
(crashed)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863)
at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.mutable.Growable.addAll(Growable.scala:61)
at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67)
at scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1346)
at scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1339)
at 
org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at 
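For context, a worker-side crash of the kind this test exercises can be forced from Python roughly as follows; whether test_python_segfault uses exactly this mechanism is an assumption.

{code:python}
# Hedged sketch: forcing a native crash (SIGSEGV) from Python, similar in spirit to
# what the flaky test drives through a Spark worker. Not necessarily the test's code.
import ctypes

def crash():
    ctypes.string_at(0)  # read from a NULL pointer -> segmentation fault

# Inside the test, something along these lines runs on an executor:
#   sc.parallelize([1]).map(lambda x: crash()).count()
# and Spark is expected to surface "Python worker exited unexpectedly (crashed)".
{code}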

[jira] [Created] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)

2023-11-27 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46126:


 Summary: Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict 
(Python 3.12)
 Key: SPARK-46126
 URL: https://issues.apache.org/jira/browse/SPARK-46126
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in 
pyspark.pandas.frame.DataFrame.to_dict
Failed example:
df.to_dict(into=OrderedDict)
Expected:
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', 
OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Got:
OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': 
OrderedDict({'row1': 0.5, 'row2': 0.75})})
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46121) Refine docstring of `concat/array_position/element_at/try_element_at`

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46121.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44039
[https://github.com/apache/spark/pull/44039]

> Refine docstring of `concat/array_position/element_at/try_element_at`
> -
>
> Key: SPARK-46121
> URL: https://issues.apache.org/jira/browse/SPARK-46121
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46121) Refine docstring of `concat/array_position/element_at/try_element_at`

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46121:


Assignee: Yang Jie

> Refine docstring of `concat/array_position/element_at/try_element_at`
> -
>
> Key: SPARK-46121
> URL: https://issues.apache.org/jira/browse/SPARK-46121
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46115) Restrict charsets in encode()

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46115.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44020
[https://github.com/apache/spark/pull/44020]

> Restrict charsets in encode()
> -
>
> Key: SPARK-46115
> URL: https://issues.apache.org/jira/browse/SPARK-46115
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently the list of supported charsets in encode() is not stable and fully 
> depends on the JDK version in use. So user code might sometimes stop working 
> because a DevOps engineer changed the Java version in the Spark cluster. The 
> ticket aims to restrict the list of supported charsets to:
> {code}
> 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'
> {code}
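As an illustration (not from the ticket): encoding with one of the six listed charsets keeps working regardless of the JDK, while charsets outside the list are expected to fail consistently after this change.

{code:python}
# Hedged illustration of the user-facing effect of restricting encode() charsets.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# UTF-8 is in the allowed list, so this keeps working on any JDK.
spark.range(1).select(F.encode(F.lit("Spark"), "UTF-8").alias("bytes")).show()

# A JDK-specific charset such as "IBM01140" used to work or fail depending on the
# Java version; after this change it is expected to raise an error consistently.
{code}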



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45957) SQL on streaming Temp view fails

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45957:


Assignee: Jean-Francois Desjeans Gauthier

> SQL on streaming Temp view fails
> 
>
> Key: SPARK-45957
> URL: https://issues.apache.org/jira/browse/SPARK-45957
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Raghu Angadi
>Assignee: Jean-Francois Desjeans Gauthier
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code fails in the last step with Spark Connect.
> The root cause is that Connect server triggers physical plan on a streaming 
> Dataframe [in 
> SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
>  Better to avoid that entirely, but at least for streaming it should be 
> avoided since it cannot be done with a batch execution engine.
> {code:java}
> df = spark.readStream.format("rate").option("numPartitions", "1").load()
> df.createOrReplaceTempView("temp_view")
> view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45957) SQL on streaming Temp view fails

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45957.
--
Resolution: Fixed

Issue resolved by pull request 43851
[https://github.com/apache/spark/pull/43851]

> SQL on streaming Temp view fails
> 
>
> Key: SPARK-45957
> URL: https://issues.apache.org/jira/browse/SPARK-45957
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Raghu Angadi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code fails in the last step with Spark Connect.
> The root cause is that Connect server triggers physical plan on a streaming 
> Dataframe [in 
> SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591].
>  Better to avoid that entirely, but at least for streaming it should be 
> avoided since it cannot be done with a batch execution engine.
> {code:java}
> df = spark.readStream.format("rate").option("numPartitions", "1").load()
> df.createOrReplaceTempView("temp_view")
> view_df = spark.sql("SELECT * FROM temp_view")  # FAILS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46110:


Assignee: Hyukjin Kwon

> Use error classes in catalog, conf, connect, observation, pandas modules
> 
>
> Key: SPARK-46110
> URL: https://issues.apache.org/jira/browse/SPARK-46110
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46110.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44024
[https://github.com/apache/spark/pull/44024]

> Use error classes in catalog, conf, connect, observation, pandas modules
> 
>
> Key: SPARK-46110
> URL: https://issues.apache.org/jira/browse/SPARK-46110
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46120) Remove helper function DataFrame.withPlan

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46120:


Assignee: Ruifeng Zheng

> Remove helper function DataFrame.withPlan
> -
>
> Key: SPARK-46120
> URL: https://issues.apache.org/jira/browse/SPARK-46120
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46120) Remove helper function DataFrame.withPlan

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46120.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44037
[https://github.com/apache/spark/pull/44037]

> Remove helper function DataFrame.withPlan
> -
>
> Key: SPARK-46120
> URL: https://issues.apache.org/jira/browse/SPARK-46120
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46123) Using brighter color for document title for better visibility

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46123:


Assignee: Haejoon Lee

> Using brighter color for document title for better visibility
> -
>
> Key: SPARK-46123
> URL: https://issues.apache.org/jira/browse/SPARK-46123
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> With the increasing popularity of dark mode for its eye comfort and 
> energy-saving benefits, it's important to ensure that our documentation is 
> easily readable in both light and dark settings. The current title font color 
> in dark mode is not optimal for readability, which can hinder user 
> experience. By adjusting the color, we aim to enhance the overall 
> accessibility and readability of the PySpark documentation in dark mode. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46123) Using brighter color for document title for better visibility

2023-11-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46123.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44040
[https://github.com/apache/spark/pull/44040]

> Using brighter color for document title for better visibility
> -
>
> Key: SPARK-46123
> URL: https://issues.apache.org/jira/browse/SPARK-46123
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> With the increasing popularity of dark mode for its eye comfort and 
> energy-saving benefits, it's important to ensure that our documentation is 
> easily readable in both light and dark settings. The current title font color 
> in dark mode is not optimal for readability, which can hinder user 
> experience. By adjusting the color, we aim to enhance the overall 
> accessibility and readability of the PySpark documentation in dark mode. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46114) Define IndexError for PySpark error framework

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46114:


Assignee: Hyukjin Kwon

> Define IndexError for PySpark error framework
> -
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46114) Define IndexError for PySpark error framework

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46114.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44028
[https://github.com/apache/spark/pull/44028]

> Define IndexError for PySpark error framework
> -
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46114) Define IndexError for PySpark error framework

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46114:


 Summary: Define IndexError for PySpark error framework
 Key: SPARK-46114
 URL: https://issues.apache.org/jira/browse/SPARK-46114
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46110:


 Summary: Use error classes in catalog, conf, connect, observation, 
pandas modules
 Key: SPARK-46110
 URL: https://issues.apache.org/jira/browse/SPARK-46110
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46109:
-


> Migrate to error classes in PySpark
> ---
>
> Key: SPARK-46109
> URL: https://issues.apache.org/jira/browse/SPARK-46109
> Project: Spark
>  Issue Type: Umbrella
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46109:
-

> Migrate to error classes in PySpark
> ---
>
> Key: SPARK-46109
> URL: https://issues.apache.org/jira/browse/SPARK-46109
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46109:


 Summary: Migrate to error classes in PySpark
 Key: SPARK-46109
 URL: https://issues.apache.org/jira/browse/SPARK-46109
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods

2023-11-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789874#comment-17789874
 ] 

Hyukjin Kwon commented on SPARK-32933:
--

Here the PR and JIRA: https://github.com/apache/spark/pull/44023 
https://issues.apache.org/jira/browse/SPARK-46107

> Use keyword-only syntax for keyword_only methods
> 
>
> Key: SPARK-32933
> URL: https://issues.apache.org/jira/browse/SPARK-32933
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 3.1.0
>
>
> Since 3.0, Python provides syntax for indicating keyword-only arguments ([PEP 
> 3102|https://www.python.org/dev/peps/pep-3102/]).
> It is not a full replacement for our current usage of {{keyword_only}}, but 
> it would allow us to make our expectations explicit:
> {code:python}
> @keyword_only
> def __init__(self, degree=2, inputCol=None, outputCol=None):
> {code}
> {code:python}
> @keyword_only
> def __init__(self, *, degree=2, inputCol=None, outputCol=None):
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46107) Deprecate pyspark.keyword_only API

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46107:


 Summary: Deprecate pyspark.keyword_only API
 Key: SPARK-46107
 URL: https://issues.apache.org/jira/browse/SPARK-46107
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See https://issues.apache.org/jira/browse/SPARK-32933. We don't need this 
anymore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46074.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43983
[https://github.com/apache/spark/pull/43983]

> [CONNECT][SCALA] Insufficient details in error when a UDF fails
> ---
>
> Key: SPARK-46074
> URL: https://issues.apache.org/jira/browse/SPARK-46074
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, when a UDF fails, the Connect client does not receive the actual 
> error that caused the failure. 
> As an example, the error message looks like this:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: 
> grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to 
> stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
> task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
> org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
> defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). 
> SQLSTATE: 39000 {code}
> In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46074:


Assignee: Niranjan Jayakar

> [CONNECT][SCALA] Insufficient details in error when a UDF fails
> ---
>
> Key: SPARK-46074
> URL: https://issues.apache.org/jira/browse/SPARK-46074
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when a UDF fails, the Connect client does not receive the actual 
> error that caused the failure. 
> As an example, the error message looks like this:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: 
> grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to 
> stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
> task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
> org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
> defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). 
> SQLSTATE: 39000 {code}
> In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45922) Multiple policies follow-up (Python)

2023-11-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45922.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43800
[https://github.com/apache/spark/pull/43800]

> Multiple policies follow-up (Python)
> 
>
> Key: SPARK-45922
> URL: https://issues.apache.org/jira/browse/SPARK-45922
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Minor further improvements for multiple policies work



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46016) Fix pandas API support list properly

2023-11-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46016.
--
Fix Version/s: 3.4.2
   4.0.0
   3.5.1
 Assignee: Haejoon Lee
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43996

> Fix pandas API support list properly
> 
>
> Key: SPARK-46016
> URL: https://issues.apache.org/jira/browse/SPARK-46016
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> Currently the supported pandas API list is not generated properly, so we should fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46082:


Assignee: Hyukjin Kwon

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should include the functions.
> {code}
> 
>   
> {code}
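For context, the relation names in the expected output above appear to have been lost, leaving only indentation. A fuller, hedged reproduction sketch (the sc://localhost connection string and the written-out user function are assumptions) looks like this:

{code:python}
# Hedged reproduction sketch; the Spark Connect URL is assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

def identity(batches):
    # mapInPandas receives an iterator of pandas DataFrames and must yield DataFrames.
    for batch in batches:
        yield batch

df = spark.range(1)
# _plan is an internal Spark Connect attribute; it is only inspected here because the
# ticket uses it to show that the function is missing from the plan's debug string.
df.mapInPandas(identity, df.schema)._plan.print()
{code}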



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46082.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43991
[https://github.com/apache/spark/pull/43991]

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should include the functions.
> {code}
> 
>   
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46085:


 Summary: Dataset.groupingSets in Scala Spark Connect client
 Key: SPARK-46085
 URL: https://issues.apache.org/jira/browse/SPARK-46085
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Scala Spark Connect client for SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46083:


 Summary: Make SparkNoSuchElementException as a canonical error API
 Key: SPARK-46083
 URL: https://issues.apache.org/jira/browse/SPARK-46083
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
It should be a canonical error API, documented properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46082:


 Summary: Fix protobuf string representation for Pandas Functions 
API with Spark Connect
 Key: SPARK-46082
 URL: https://issues.apache.org/jira/browse/SPARK-46082
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
df = spark.range(1)
df.mapInPandas(lambda x: x, df.schema)._plan.print()
{code}

prints as below. It should include the functions.

{code}

  
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46080:


 Summary: Upgrade Cloudpickle to 3.0.0
 Key: SPARK-46080
 URL: https://issues.apache.org/jira/browse/SPARK-46080
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


It includes official support for Python 3.12 
(https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46076.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43986
[https://github.com/apache/spark/pull/43986]

> Remove `unittest` deprecated alias usage for Python 3.12
> 
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46065.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43970
[https://github.com/apache/spark/pull/43970]

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We can accept a Column object for Column.__getitem__ on a remote session, so we 
> can optimize the existing factorize implementation.
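As a sketch of the idea (not the actual patch): a code-to-label lookup can be expressed as a map literal indexed by another Column, which is what create_map plus Column.__getitem__ enables.

{code:python}
# Hedged sketch of the create_map lookup idea behind the refactor; not the real patch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Map integer codes back to labels, then index the map literal with a Column.
code_to_label = F.create_map(F.lit(0), F.lit("a"), F.lit(1), F.lit("b"))
spark.range(2).select(code_to_label[F.col("id")].alias("label")).show()
{code}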



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46065:


Assignee: Haejoon Lee

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We can accept a Column object for Column.__getitem__ on a remote session, so we 
> can optimize the existing factorize implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46049:
-


> Support groupingSets operation in PySpark (Spark Connect)
> -
>
> Key: SPARK-46049
> URL: https://issues.apache.org/jira/browse/SPARK-46049
> Project: Spark
>  Issue Type: New Feature
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Connect version of SPARK-46048



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cube, rollup, groupby, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46063:
-
Description: 
{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

{code}
>>> help(spark.range(1).cube)
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
{code}

it has to be {{cols}}, not {{cube}}.

  was:
{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

```
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
```

it has to be {{cols}}, not {{cube}}.


> Improve error messages related to argument types in cube, rollup, groupby, 
> and pivot
> 
>
> Key: SPARK-46063
> URL: https://issues.apache.org/jira/browse/SPARK-46063
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> >>> spark.range(1).cube(cols=1.2)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
> raise PySparkTypeError(
> pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
> `cube` should be a Column or str, got float.
> {code}
> {code}
> >>> help(spark.range(1).cube)
> Help on method cube in module pyspark.sql.connect.dataframe:
> cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
> pyspark.sql.connect.dataframe.DataFrame instance
> Create a multi-dimensional cube for the current :class:`DataFrame` using
> the specified columns, allowing aggregations to be performed on them.
> .. versionadded:: 1.4.0
> .. versionchanged:: 3.4.0
> {code}
> it has to be {{cols}}, not {{cube}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cube, rollup, groupby, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46063:
-
Summary: Improve error messages related to argument types in cube, rollup, 
groupby, and pivot  (was: Improve error messages related to argument types in 
cube, rollup, and pivot)

> Improve error messages related to argument types in cube, rollup, groupby, 
> and pivot
> 
>
> Key: SPARK-46063
> URL: https://issues.apache.org/jira/browse/SPARK-46063
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> >>> spark.range(1).cube(cols=1.2)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
> raise PySparkTypeError(
> pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
> `cube` should be a Column or str, got float.
> {code}
> ```
> Help on method cube in module pyspark.sql.connect.dataframe:
> cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
> pyspark.sql.connect.dataframe.DataFrame instance
> Create a multi-dimensional cube for the current :class:`DataFrame` using
> the specified columns, allowing aggregations to be performed on them.
> .. versionadded:: 1.4.0
> .. versionchanged:: 3.4.0
> ```
> it has to be {{cols}}, not {{cube}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46063) Improve error messages related to argument types in cube, rollup, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46063:


 Summary: Improve error messages related to argument types in cube, 
rollup, and pivot
 Key: SPARK-46063
 URL: https://issues.apache.org/jira/browse/SPARK-46063
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

```
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
```

it has to be {{cols}}, not {{cube}}.
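A hypothetical sketch of the kind of change being asked for (not the actual patch): report the user-facing parameter name cols rather than the method name when raising the type error.

{code:python}
# Hypothetical sketch of the requested fix; the surrounding helper is invented here.
from pyspark.errors import PySparkTypeError
from pyspark.sql import Column

def _check_cols(cols):
    for c in cols:
        if not isinstance(c, (str, Column)):
            raise PySparkTypeError(
                error_class="NOT_COLUMN_OR_STR",
                message_parameters={"arg_name": "cols", "arg_type": type(c).__name__},
            )
{code}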



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46061) Add the test parity for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46061:


Assignee: Hyukjin Kwon

> Add the test parity for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46061) Add the test parity for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46061.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43965
[https://github.com/apache/spark/pull/43965]

> Add the test parity for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45600) Make Python data source registration session level

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45600.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43742
[https://github.com/apache/spark/pull/43742]

> Make Python data source registration session level
> --
>
> Key: SPARK-45600
> URL: https://issues.apache.org/jira/browse/SPARK-45600
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, registered data sources are stored in `sharedState` and can be 
> accessed across multiple sessions. This, however, will not work with Spark 
> Connect. We should make this registration session level, and support static 
> registration (e.g. using pip install) in the future.
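As a hedged sketch of what session-level registration means for a user (API names follow the Python data source work; the example source itself is invented): register a source on one SparkSession and its short name becomes resolvable from that session only.

{code:python}
# Hedged sketch; FixedRowsDataSource is an invented example source.
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader

class FixedRowsReader(DataSourceReader):
    def read(self, partition):
        # Yield rows matching the declared schema.
        yield (1,)
        yield (2,)

class FixedRowsDataSource(DataSource):
    @classmethod
    def name(cls):
        return "fixedrows"

    def schema(self):
        return "id INT"

    def reader(self, schema):
        return FixedRowsReader()

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(FixedRowsDataSource)   # visible to this session only
spark.read.format("fixedrows").load().show()
{code}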



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45600) Make Python data source registration session level

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45600:


Assignee: Allison Wang

> Make Python data source registration session level
> --
>
> Key: SPARK-45600
> URL: https://issues.apache.org/jira/browse/SPARK-45600
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, registered data sources are stored in `sharedState` and can be 
> accessed across multiple sessions. This, however, will not work with Spark 
> Connect. We should make this registration session level, and support static 
> registration (e.g. using pip install) in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46061) Add the test parity for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46061:


 Summary: Add the test parity for reattach test case
 Key: SPARK-46061
 URL: https://issues.apache.org/jira/browse/SPARK-46061
 Project: Spark
  Issue Type: New Feature
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We need the same test "ReleaseSession releases all queries and does not allow 
more requests in the session" added in SPARK-45798 to identify an issue like 
SPARK-46042.

This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46061) Add the test parity for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46061:
-
Issue Type: Test  (was: New Feature)

> Add the test parity for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46048.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43951
[https://github.com/apache/spark/pull/43951]

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46048:


Assignee: Hyukjin Kwon

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46048:
-
Issue Type: New Feature  (was: Bug)

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46048:


 Summary: Support groupingSets operation in PySpark
 Key: SPARK-46048
 URL: https://issues.apache.org/jira/browse/SPARK-46048
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Python version of SPARK-45929
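For illustration, a hedged sketch of the expected PySpark usage, mirroring Dataset.groupingSets from SPARK-45929; the exact Python signature is an assumption.

{code:python}
# Hedged usage sketch; the signature is assumed to mirror the Scala API
# (grouping sets first, then the grouping columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)], ["key", "sub", "value"]
)

# Aggregate over two explicit grouping sets: (key) and the grand total ().
(df.groupingSets([["key"], []], "key")
   .agg(F.sum("value").alias("total"))
   .show())
{code}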



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)

2023-11-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46049:


 Summary: Support groupingSets operation in PySpark (Spark Connect)
 Key: SPARK-46049
 URL: https://issues.apache.org/jira/browse/SPARK-46049
 Project: Spark
  Issue Type: New Feature
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Connect version of SPARK-46048



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46017:


Assignee: Haejoon Lee

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> The PySpark doc build works on GitHub CI but fails in a local macOS environment
> for an unknown reason. We should investigate and fix it.






[jira] [Resolved] (SPARK-46022) Remove deprecated functions APIs from documents

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46022.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43932
[https://github.com/apache/spark/pull/43932]

> Remove deprecated functions APIs from documents
> ---
>
> Key: SPARK-46022
> URL: https://issues.apache.org/jira/browse/SPARK-46022
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should not expose deprecated APIs in the official documentation.






[jira] [Assigned] (SPARK-46022) Remove deprecated functions APIs from documents

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46022:


Assignee: Haejoon Lee

> Remove deprecated functions APIs from documents
> ---
>
> Key: SPARK-46022
> URL: https://issues.apache.org/jira/browse/SPARK-46022
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should not expose deprecated APIs in the official documentation.






[jira] [Assigned] (SPARK-46013) Improve basic datasource examples

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46013:


Assignee: Allison Wang

> Improve basic datasource examples
> -
>
> Key: SPARK-46013
> URL: https://issues.apache.org/jira/browse/SPARK-46013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> We should improve the Python examples on this page: 
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  (basic_datasource_examples.py)
>  
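(Editorial note: the page above documents the generic load/save functions. A minimal sketch of the pattern those Python examples cover, with placeholder paths taken from the bundled Spark example resources:)

{code:python}
# Minimal sketch of the generic load/save pattern documented on that page.
# Paths are placeholders; adjust to your environment.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load with an explicit format
users = spark.read.format("parquet").load("examples/src/main/resources/users.parquet")

# Save selected columns, choosing format and mode explicitly
(users.select("name", "favorite_color")
      .write.format("parquet")
      .mode("overwrite")
      .save("namesAndFavColors.parquet"))
{code}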






[jira] [Resolved] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46017.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43932
[https://github.com/apache/spark/pull/43932]

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> The PySpark doc build works on GitHub CI but fails in a local macOS environment
> for an unknown reason. We should investigate and fix it.






[jira] [Resolved] (SPARK-46013) Improve basic datasource examples

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46013.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43917
[https://github.com/apache/spark/pull/43917]

> Improve basic datasource examples
> -
>
> Key: SPARK-46013
> URL: https://issues.apache.org/jira/browse/SPARK-46013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should improve the Python examples on this page: 
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  (basic_datasource_examples.py)
>  






[jira] [Updated] (SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46042:
-
Description: 
https://github.com/apache/spark/pull/43942#issuecomment-1821896165

> Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite
> 
>
> Key: SPARK-46042
> URL: https://issues.apache.org/jira/browse/SPARK-46042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> https://github.com/apache/spark/pull/43942#issuecomment-1821896165






[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788430#comment-17788430
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

Are the executors using the same versions too? The error most likely comes from a 
mismatch in JDK or Scala versions. I can't reproduce it locally, so sharing the 
full specifications of the server and the client would be very helpful.

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark 
> standalone cluster in which the master and the only worker run on the same node.
>  
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the web UI that the standalone cluster, the Connect server, 
> and the Spark driver have all started.
>  
> Finally, I ran a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at 

[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788381#comment-17788381
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

Also, can you run the same job without Spark Connect? Given the error messages, it 
seems a regular Spark shell would fail as well.
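(A sketch of that check, assuming the master URL from the report: submitting the same query through a plain, non-Connect PySpark session isolates whether the ClassCastException is Connect-specific.)

{code:python}
# Sketch: run the failing query without Spark Connect to see whether the
# SerializedLambda ClassCastException also occurs with a plain session.
# The master URL is the one from the report; replace it with your own.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://10.19.183.93:7077")
    .appName("spark-46032-repro")
    .getOrCreate()
)

print(spark.range(100).filter("id > 3").collect())
{code}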

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark 
> standalone cluster in which the master and the only worker run on the same node.
>  
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the web UI that the standalone cluster, the Connect server, 
> and the Spark driver have all started.
>  
> Finally, I ran a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at 

[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

What's your Scala version [~wbo4958]?

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark 
> standalone cluster in which the master and the only worker run on the same node.
>  
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the web UI that the standalone cluster, the Connect server, 
> and the Spark driver have all started.
>  
> Finally, I ran a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at 

[jira] [Comment Edited] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380
 ] 

Hyukjin Kwon edited comment on SPARK-46032 at 11/21/23 11:34 AM:
-

What's your Scala and JDK versions [~wbo4958]?


was (Author: gurwls223):
What's your Scala version [~wbo4958]?

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark 
> standalone cluster in which the master and the only worker run on the same node.
>  
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the web UI that the standalone cluster, the Connect server, 
> and the Spark driver have all started.
>  
> Finally, I ran a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_

[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46032:
-
Priority: Major  (was: Blocker)

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website and started a Spark 
> standalone cluster in which the master and the only worker run on the same node.
>  
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can confirm from the web UI that the standalone cluster, the Connect server, 
> and the Spark driver have all started.
>  
> Finally, I ran a very simple Spark job 
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client using 
> pyspark, but got the error below.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)_
> _at 

[jira] [Assigned] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46023:


Assignee: Hyukjin Kwon

> Annotate parameters at docstrings in pyspark.sql module
> ---
>
> Key: SPARK-46023
> URL: https://issues.apache.org/jira/browse/SPARK-46023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See PR
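(Editorial note: the PR fills in numpydoc-style Parameters / Returns sections across pyspark.sql docstrings. The helper below is made up purely to illustrate the annotation style; it is not part of the change.)

{code:python}
# Hypothetical function illustrating the numpydoc sections this ticket adds
# to pyspark.sql docstrings; the helper itself is invented for the example.
from pyspark.sql import DataFrame, functions as F


def with_doubled_column(df: DataFrame, col_name: str) -> DataFrame:
    """
    Return ``df`` with an extra column holding twice the value of ``col_name``.

    Parameters
    ----------
    df : :class:`DataFrame`
        Input DataFrame.
    col_name : str
        Name of a numeric column in ``df``.

    Returns
    -------
    :class:`DataFrame`
        The input with an additional column named ``col_name + "_x2"``.
    """
    return df.withColumn(col_name + "_x2", F.col(col_name) * 2)
{code}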






[jira] [Resolved] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46023.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43925
[https://github.com/apache/spark/pull/43925]

> Annotate parameters at docstrings in pyspark.sql module
> ---
>
> Key: SPARK-46023
> URL: https://issues.apache.org/jira/browse/SPARK-46023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See PR






[jira] [Assigned] (SPARK-46026) Refine docstring of UDTF

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46026:


Assignee: Hyukjin Kwon

> Refine docstring of UDTF
> 
>
> Key: SPARK-46026
> URL: https://issues.apache.org/jira/browse/SPARK-46026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
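(Editorial note: the issue body is empty. For context, the docstring being refined covers Python user-defined table functions; a minimal UDTF of the kind it documents, assuming Spark 3.5+ where `pyspark.sql.functions.udtf` is available:)

{code:python}
# Minimal Python UDTF sketch of the kind the refined docstring documents.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, udtf

spark = SparkSession.builder.getOrCreate()


@udtf(returnType="num: int, squared: int")
class SquareNumbers:
    def eval(self, start: int, end: int):
        # Emit one row per number in [start, end].
        for num in range(start, end + 1):
            yield (num, num * num)


# Invoke the UDTF directly with literal arguments.
SquareNumbers(lit(1), lit(3)).show()
{code}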







[jira] [Resolved] (SPARK-46026) Refine docstring of UDTF

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46026.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43928
[https://github.com/apache/spark/pull/43928]

> Refine docstring of UDTF
> 
>
> Key: SPARK-46026
> URL: https://issues.apache.org/jira/browse/SPARK-46026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46024:


Assignee: Hyukjin Kwon

> Document parameters and examples for RuntimeConf get, set and unset
> ---
>
> Key: SPARK-46024
> URL: https://issues.apache.org/jira/browse/SPARK-46024
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
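(Editorial note: for reference, these are the three RuntimeConf operations whose docstrings gain parameters and examples; a short sketch using an arbitrary SQL option:)

{code:python}
# Sketch of RuntimeConf usage: set, get (with and without a default), and unset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.shuffle.partitions", "50")      # set a runtime option
print(spark.conf.get("spark.sql.shuffle.partitions"))     # '50'
print(spark.conf.get("my.nonexistent.key", "fallback"))   # default when the key is unset
spark.conf.unset("spark.sql.shuffle.partitions")          # revert to the session default
{code}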







[jira] [Resolved] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46024.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43927
[https://github.com/apache/spark/pull/43927]

> Document parameters and examples for RuntimeConf get, set and unset
> ---
>
> Key: SPARK-46024
> URL: https://issues.apache.org/jira/browse/SPARK-46024
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46027.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43929
[https://github.com/apache/spark/pull/43929]

> Add `Python 3.12` to the Daily Python Github Action job
> ---
>
> Key: SPARK-46027
> URL: https://issues.apache.org/jira/browse/SPARK-46027
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46027:


 Summary: Add `Python 3.12` to the Daily Python Github Action job
 Key: SPARK-46027
 URL: https://issues.apache.org/jira/browse/SPARK-46027
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Assigned] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46004:


Assignee: BingKun Pan

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>
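(Editorial note: for reference, the three DataFrame APIs whose docstrings this ticket refines; a small sketch with made-up data:)

{code:python}
# Sketch of DataFrame.dropna / fillna / replace with toy data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, None, "a"), (2, 3.0, None), (None, 4.0, "c")],
    ["id", "score", "label"],
)

df.dropna(subset=["id"]).show()                       # drop rows where "id" is null
df.fillna({"score": 0.0, "label": "unknown"}).show()  # fill nulls per column
df.replace("a", "A", subset=["label"]).show()         # replace matching string values
{code}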







[jira] [Resolved] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46004.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43907
[https://github.com/apache/spark/pull/43907]

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46026) Refine docstring of UDTF

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46026:


 Summary: Refine docstring of UDTF
 Key: SPARK-46026
 URL: https://issues.apache.org/jira/browse/SPARK-46026
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Deleted] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46025:
-


> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 has been released. We should make sure the tests pass and mark it 
> as supported in setup.py.






[jira] [Updated] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46025:
-
Issue Type: Improvement  (was: Bug)

> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 has been released. We should make sure the tests pass and mark it 
> as supported in setup.py.






[jira] [Created] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46025:


 Summary: Support Python 3.12 in PySpark
 Key: SPARK-46025
 URL: https://issues.apache.org/jira/browse/SPARK-46025
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Python 3.12 has been released. We should make sure the tests pass and mark it 
as supported in setup.py.
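(Editorial note: concretely, "mark it as supported in setup.py" means adding the new trove classifier. A sketch of the relevant fragment, not the actual contents of python/setup.py in apache/spark:)

{code:python}
# Sketch of the classifier addition implied by "mark it as supported in setup.py".
# The surrounding list in apache/spark's python/setup.py may differ.
classifiers = [
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",  # newly added for Python 3.12 support
]
{code}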





