[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46149: - Description: {code} == ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run return self._run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run output = self._run_local_training( ^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training output = TorchDistributor._get_output_from_framework_wrapper( File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper return framework_wrapper( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function raise RuntimeError( RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message. == ERROR [14.850s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run return self._run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run output = self._run_local_training( ^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training output = TorchDistributor._get_output_from_framework_wrapper( File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper return framework_wrapper( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function raise RuntimeError( RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message. 
-- {code} https://github.com/apache/spark/actions/runs/7020654429/job/19100964890 was: {code} == ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run return self._run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run output = self._run_local_training( ^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training output = TorchDistributor._get_output_from_framework_wrapper( File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper return framework_wrapper( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function
[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46149: - Summary: Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python 3.12 (was: Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`) > Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` with Python > 3.12 > -- > > Key: SPARK-46149 > URL: https://issues.apache.org/jira/browse/SPARK-46149 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > == > ERROR [12.635s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 403, in test_end_to_end_run_locally > output = TorchDistributor(num_processes=2, local_mode=True, > use_gpu=False).run( > > ^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, > in run > return self._run( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, > in _run > output = self._run_local_training( > ^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, > in _run_local_training > output = TorchDistributor._get_output_from_framework_wrapper( > > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, > in _get_output_from_framework_wrapper > return framework_wrapper( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, > in _run_training_on_pytorch_function > raise RuntimeError( > RuntimeError: TorchDistributor failed during training.View stdout logs for > detailed error message. > == > ERROR [14.850s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 403, in test_end_to_end_run_locally > output = TorchDistributor(num_processes=2, local_mode=True, > use_gpu=False).run( > > ^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, > in run > return self._run( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, > in _run > output = self._run_local_training( > ^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, > in _run_local_training > output = TorchDistributor._get_output_from_framework_wrapper( > > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, > in _get_output_from_framework_wrapper > return framework_wrapper( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, > in _run_training_on_pytorch_function > raise RuntimeError( > RuntimeError: TorchDistributor failed during training.View stdout logs for > detailed error message. > -- > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
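For context: a common way to gate a flaky test on the interpreter version is a `unittest.skipIf` decorator keyed on `sys.version_info`. The sketch below only illustrates that pattern under the class/method names from the traceback above; it is not necessarily the exact change made for SPARK-46149.

{code}
import sys
import unittest


class TorchDistributorLocalUnitTests(unittest.TestCase):
    @unittest.skipIf(
        sys.version_info >= (3, 12),
        "TorchDistributor local training fails with Python 3.12; see SPARK-46149",
    )
    def test_end_to_end_run_locally(self):
        # The real test constructs TorchDistributor(num_processes=2,
        # local_mode=True, use_gpu=False) and runs a small training function.
        ...
{code}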
[jira] [Updated] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`
[ https://issues.apache.org/jira/browse/SPARK-46149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46149: - Priority: Minor (was: Major) > Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` > - > > Key: SPARK-46149 > URL: https://issues.apache.org/jira/browse/SPARK-46149 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > == > ERROR [12.635s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 403, in test_end_to_end_run_locally > output = TorchDistributor(num_processes=2, local_mode=True, > use_gpu=False).run( > > ^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, > in run > return self._run( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, > in _run > output = self._run_local_training( > ^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, > in _run_local_training > output = TorchDistributor._get_output_from_framework_wrapper( > > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, > in _get_output_from_framework_wrapper > return framework_wrapper( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, > in _run_training_on_pytorch_function > raise RuntimeError( > RuntimeError: TorchDistributor failed during training.View stdout logs for > detailed error message. > == > ERROR [14.850s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 403, in test_end_to_end_run_locally > output = TorchDistributor(num_processes=2, local_mode=True, > use_gpu=False).run( > > ^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, > in run > return self._run( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, > in _run > output = self._run_local_training( > ^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, > in _run_local_training > output = TorchDistributor._get_output_from_framework_wrapper( > > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, > in _get_output_from_framework_wrapper > return framework_wrapper( >^^ > File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, > in _run_training_on_pytorch_function > raise RuntimeError( > RuntimeError: TorchDistributor failed during training.View stdout logs for > detailed error message. > -- > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46149) Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally`
Hyukjin Kwon created SPARK-46149: Summary: Skip `TorchDistributorLocalUnitTests.test_end_to_end_run_locally` Key: SPARK-46149 URL: https://issues.apache.org/jira/browse/SPARK-46149 Project: Spark Issue Type: Sub-task Components: ML, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [12.635s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run return self._run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run output = self._run_local_training( ^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training output = TorchDistributor._get_output_from_framework_wrapper( File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper return framework_wrapper( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function raise RuntimeError( RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message. == ERROR [14.850s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 403, in test_end_to_end_run_locally output = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False).run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 969, in run return self._run( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 985, in _run output = self._run_local_training( ^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 593, in _run_local_training output = TorchDistributor._get_output_from_framework_wrapper( File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 567, in _get_output_from_framework_wrapper return framework_wrapper( ^^ File "/__w/spark/spark/python/pyspark/ml/torch/distributor.py", line 908, in _run_training_on_pytorch_function raise RuntimeError( RuntimeError: TorchDistributor failed during training.View stdout logs for detailed error message. -- {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46148) Fix pyspark.pandas.mlflow.load_model test (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46148: - Description: {code} ** File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in pyspark.pandas.mlflow.load_model Failed example: prediction_df Exception raised: Traceback (most recent call last): File "/usr/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in prediction_df File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13291, in __repr__ pdf = cast("DataFrame", self._get_or_create_repr_pandas_cache(max_display_count)) File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13282, in _get_or_create_repr_pandas_cache self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()} File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13277, in _to_internal_pandas return self._internal.to_pandas_frame File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 599, in wrapped_lazy_property setattr(self, attr_name, fn(self)) File "/__w/spark/spark/python/pyspark/pandas/internal.py", line 1110, in to_pandas_frame pdf = sdf.toPandas() File "/__w/spark/spark/python/pyspark/sql/pandas/conversion.py", line 213, in toPandas rows = self.collect() File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 1369, in collect sock_info = self._jdf.collectToPython() File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ return_value = get_return_value( File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 188, in deco raise converted from None pyspark.errors.exceptions.captured.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. 
Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1523, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1515, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 485, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 101, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 478, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1284, in func for result_batch, result_type in result_iter: File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1619, in udf yield _predict_row_batch(batch_predict_fn, row_batch_args) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1383, in _predict_row_batch result = predict_fn(pdf, params) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1601, in batch_predict_fn return loaded_model.predict(pdf, params=params) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 491, in predict return _predict() File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 477, in _predict return self._predict_fn(data, params=params) File "/usr/local/lib/python3.10/dist-packages/mlflow/sklearn/__init__.py", line 517, in predict return self.sklearn_model.predict(data) File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 386, in predict return self._decision_function(X) File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 369, in _decision_function X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False) File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 580, in _validate_data self._check_feature_names(X, reset=reset) File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 507, in _check_feature_names raise ValueError(message) ValueError: The feature names should match those that were passed during fit. Feature names unseen at fit time: - 0 - 1 Feature names seen at fit time, yet now missing: - x1 - x2 JVM stacktrace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 1 times, most recent
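The ValueError at the bottom of the worker traceback is raised by scikit-learn's feature-name check: the model was fit on a pandas DataFrame with columns `x1` and `x2`, but at prediction time it received columns named `0` and `1`. A minimal reproduction outside Spark/MLflow, assuming a recent scikit-learn (1.2+), is sketched below; the data and model are made up for illustration.

{code}
import pandas as pd
from sklearn.linear_model import LinearRegression

train = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [0.0, 1.0, 2.0]})
model = LinearRegression().fit(train, [1.0, 2.0, 3.0])

# Column names "0"/"1" do not match the names seen at fit time, so recent
# scikit-learn raises "The feature names should match those that were
# passed during fit."
bad_input = pd.DataFrame([[4.0, 3.0]], columns=["0", "1"])
# model.predict(bad_input)  # ValueError

# Keeping the fitted feature names on the prediction input avoids the error.
good_input = pd.DataFrame([[4.0, 3.0]], columns=["x1", "x2"])
print(model.predict(good_input))
{code}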
[jira] [Created] (SPARK-46148) Fix pyspark.pandas.mlflow.load_model test (Python 3.12)
Hyukjin Kwon created SPARK-46148: Summary: Fix pyspark.pandas.mlflow.load_model test (Python 3.12) Key: SPARK-46148 URL: https://issues.apache.org/jira/browse/SPARK-46148 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} ** File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in pyspark.pandas.mlflow.load_model Failed example: prediction_df Exception raised: Traceback (most recent call last): File "/usr/lib/python3.10/doctest.py", line 1350, in __run exec(compile(example.source, filename, "single", File "", line 1, in prediction_df File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13291, in __repr__ pdf = cast("DataFrame", self._get_or_create_repr_pandas_cache(max_display_count)) File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13282, in _get_or_create_repr_pandas_cache self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()} File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13277, in _to_internal_pandas return self._internal.to_pandas_frame File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 599, in wrapped_lazy_property setattr(self, attr_name, fn(self)) File "/__w/spark/spark/python/pyspark/pandas/internal.py", line 1110, in to_pandas_frame pdf = sdf.toPandas() File "/__w/spark/spark/python/pyspark/sql/pandas/conversion.py", line 213, in toPandas rows = self.collect() File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 1369, in collect sock_info = self._jdf.collectToPython() File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ return_value = get_return_value( File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 188, in deco raise converted from None pyspark.errors.exceptions.captured.PythonException: An exception was thrown from the Python worker. Please see the stack trace below. 
Traceback (most recent call last): File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1523, in main process() File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1515, in process serializer.dump_stream(out_iter, outfile) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 485, in dump_stream return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream) File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 101, in dump_stream for batch in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 478, in init_stream_yield_batches for series in iterator: File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1284, in func for result_batch, result_type in result_iter: File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1619, in udf yield _predict_row_batch(batch_predict_fn, row_batch_args) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1383, in _predict_row_batch result = predict_fn(pdf, params) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 1601, in batch_predict_fn return loaded_model.predict(pdf, params=params) File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 491, in predict return _predict() File "/usr/local/lib/python3.10/dist-packages/mlflow/pyfunc/__init__.py", line 477, in _predict return self._predict_fn(data, params=params) File "/usr/local/lib/python3.10/dist-packages/mlflow/sklearn/__init__.py", line 517, in predict return self.sklearn_model.predict(data) File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 386, in predict return self._decision_function(X) File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 369, in _decision_function X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False) File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 580, in _validate_data self._check_feature_names(X, reset=reset) File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 507, in _check_feature_names raise ValueError(message) ValueError: The feature names should match those that were passed during fit. Feature names unseen at fit time: - 0 - 1 Feature names seen
[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46147: - Fix Version/s: (was: 4.0.0) > Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12) > - > > Key: SPARK-46147 > URL: https://issues.apache.org/jira/browse/SPARK-46147 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > {code} > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in > pyspark.pandas.frame.DataFrame.to_dict > Failed example: > df.to_dict(into=OrderedDict) > Expected: > OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', > OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) > Got: > OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': > OrderedDict({'row1': 0.5, 'row2': 0.75})}) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46147: - Labels: (was: pull-request-available) > Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12) > - > > Key: SPARK-46147 > URL: https://issues.apache.org/jira/browse/SPARK-46147 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 4.0.0 > > > {code} > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in > pyspark.pandas.frame.DataFrame.to_dict > Failed example: > df.to_dict(into=OrderedDict) > Expected: > OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', > OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) > Got: > OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': > OrderedDict({'row1': 0.5, 'row2': 0.75})}) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46147: - Description: {code} File "/__w/spark/spark/python/pyspark/pandas/series.py", line 1633, in pyspark.pandas.series.Series.to_dict Failed example: s.to_dict(OrderedDict) Expected: OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)]) Got: OrderedDict({0: 1, 1: 2, 2: 3, 3: 4}) {code} was: {code} File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in pyspark.pandas.frame.DataFrame.to_dict Failed example: df.to_dict(into=OrderedDict) Expected: OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) Got: OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})}) {code} > Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12) > - > > Key: SPARK-46147 > URL: https://issues.apache.org/jira/browse/SPARK-46147 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > {code} > File "/__w/spark/spark/python/pyspark/pandas/series.py", line 1633, in > pyspark.pandas.series.Series.to_dict > Failed example: > s.to_dict(OrderedDict) > Expected: > OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)]) > Got: > OrderedDict({0: 1, 1: 2, 2: 3, 3: 4}) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
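The underlying cause is a CPython change rather than a pandas-on-Spark regression: since Python 3.12, `repr(OrderedDict)` uses a dict-literal form (`OrderedDict({...})`) instead of a list of pairs (`OrderedDict([...])`), so doctests that match the repr verbatim break. The snippet below illustrates the difference and one version-agnostic way to write such a doctest; it is a generic sketch, not the exact patch applied to `Series.to_dict`.

{code}
from collections import OrderedDict

d = OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])

# Python 3.11 and earlier: OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
# Python 3.12 and later:   OrderedDict({0: 1, 1: 2, 2: 3, 3: 4})
print(repr(d))

# A doctest can sidestep the repr change by comparing plain structures, e.g.:
# >>> sorted(s.to_dict(OrderedDict).items())
# [(0, 1), (1, 2), (2, 3), (3, 4)]
{code}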
[jira] [Created] (SPARK-46147) Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12)
Hyukjin Kwon created SPARK-46147: Summary: Fix the doctest in pyspark.pandas.series.Series.to_dict (Python 3.12) Key: SPARK-46147 URL: https://issues.apache.org/jira/browse/SPARK-46147 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Assignee: Hyukjin Kwon Fix For: 4.0.0 {code} File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in pyspark.pandas.frame.DataFrame.to_dict Failed example: df.to_dict(into=OrderedDict) Expected: OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) Got: OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})}) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46136) Do not combine adjacent Python UDFs if pythonExec is different
[ https://issues.apache.org/jira/browse/SPARK-46136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46136: - > Do not combine adjacent Python UDFs if pythonExec is different > -- > > Key: SPARK-46136 > URL: https://issues.apache.org/jira/browse/SPARK-46136 > Project: Spark > Issue Type: Improvement >Reporter: Hyukjin Kwon >Priority: Major > > In {{ExtractPythonUDFs}}, we are combining the adjacent Python UDFs and run > them in one Python worker which should not happen if pythonExec is different. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46136) Do not combine adjacent Python UDFs if pythonExec is different
Hyukjin Kwon created SPARK-46136: Summary: Do not combine adjacent Python UDFs if pythonExec is different Key: SPARK-46136 URL: https://issues.apache.org/jira/browse/SPARK-46136 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon In {{ExtractPythonUDFs}}, we are combining the adjacent Python UDFs and run them in one Python worker which should not happen if pythonExec is different. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
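`ExtractPythonUDFs` is a Catalyst rule written in Scala, so the following is only a conceptual Python sketch of the idea in the description: adjacent UDFs may be batched into a single worker only while they share the same `pythonExec`, i.e. they should be grouped by interpreter before being chained together. All names here are illustrative, not Spark internals.

{code}
from itertools import groupby

# Hypothetical (udf_name, python_exec) pairs in plan order.
udfs = [
    ("f1", "/usr/bin/python3.10"),
    ("f2", "/usr/bin/python3.10"),
    ("f3", "/usr/bin/python3.12"),  # different interpreter: must not be merged with f1/f2
    ("f4", "/usr/bin/python3.12"),
]

# Only adjacent UDFs with the same interpreter land in the same batch/worker.
batches = [
    [name for name, _ in group]
    for _, group in groupby(udfs, key=lambda pair: pair[1])
]
print(batches)  # [['f1', 'f2'], ['f3', 'f4']]
{code}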
[jira] [Resolved] (SPARK-32407) Remove upperbound of Sphinx version
[ https://issues.apache.org/jira/browse/SPARK-32407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32407. -- Fix Version/s: 4.0.0 Assignee: Ruifeng Zheng Resolution: Fixed https://github.com/apache/spark/pull/44046 > Remove upperbound of Sphinx version > --- > > Key: SPARK-32407 > URL: https://issues.apache.org/jira/browse/SPARK-32407 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Sphinx 3.1+ does not correctly index nested classes. See also > https://github.com/sphinx-doc/sphinx/issues/7551. We should remove the > upperbound of sphinx version once that issue is fixed in Sphinx. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46127: Assignee: Hyukjin Kwon > Flaky > `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` > with Python 3.12 > --- > > Key: SPARK-46127 > URL: https://issues.apache.org/jira/browse/SPARK-46127 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in > test_python_segfault > self.sc.parallelize([1]).map(lambda x: f()).count() > File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count > return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() >^^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum > return self.mapPartitions(lambda x: [sum(x)]).fold( # type: > ignore[return-value] > > ^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold > vals = self.mapPartitions(func).collect() >^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect > sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) > ^ > File > "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > return_value = get_return_value( >^ > File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", > line 326, in get_return_value > raise Py4JJavaError( > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 > (TID 0) (localhost executor driver): org.apache.spark.SparkException: Python > worker exited unexpectedly (crashed) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.mutable.Growable.addAll(Growable.scala:61) > at scala.collection.mutable.Growable.addAll$(Growable.scala:57) > at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > at py4j.Gateway.invoke(Gateway.java:282) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at > py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) > at py4j.ClientServerConnection.run(ClientServerConnection.java:106) > at java.base/java.lang.Thread.run(Thread.java:840) > Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly > (crashed) > at > 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at
[jira] [Resolved] (SPARK-46131) Install torchvision for Python 3.12 build
[ https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46131. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44045 [https://github.com/apache/spark/pull/44045] > Install torchvision for Python 3.12 build > - > > Key: SPARK-46131 > URL: https://issues.apache.org/jira/browse/SPARK-46131 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > == > ERROR [0.001s]: test_end_to_end_run_distributedly > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 495, in test_end_to_end_run_distributedly > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > == > ERROR [0.001s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 402, in test_end_to_end_run_locally > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > == > ERROR [0.001s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 402, in test_end_to_end_run_locally > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > -- > Ran 23 tests in 50.860s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
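Beyond installing torchvision in the CI image, PySpark's ML test suites commonly guard optional dependencies so a missing module leads to a skip rather than a hard error. A hedged sketch of that pattern (not the actual test code) is:

{code}
import unittest

try:
    import torchvision  # noqa: F401

    have_torchvision = True
except ImportError:
    have_torchvision = False


@unittest.skipIf(not have_torchvision, "torchvision is not installed")
class TorchDistributorEndToEndTests(unittest.TestCase):
    def test_end_to_end_run_locally(self):
        # Would import transforms/datasets from torchvision and run a small
        # MNIST training loop through TorchDistributor.
        ...
{code}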
[jira] [Resolved] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46127. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44044 [https://github.com/apache/spark/pull/44044] > Flaky > `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` > with Python 3.12 > --- > > Key: SPARK-46127 > URL: https://issues.apache.org/jira/browse/SPARK-46127 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in > test_python_segfault > self.sc.parallelize([1]).map(lambda x: f()).count() > File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count > return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() >^^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum > return self.mapPartitions(lambda x: [sum(x)]).fold( # type: > ignore[return-value] > > ^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold > vals = self.mapPartitions(func).collect() >^^ > File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect > sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) > ^ > File > "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > return_value = get_return_value( >^ > File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", > line 326, in get_return_value > raise Py4JJavaError( > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 > (TID 0) (localhost executor driver): org.apache.spark.SparkException: Python > worker exited unexpectedly (crashed) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.mutable.Growable.addAll(Growable.scala:61) > at scala.collection.mutable.Growable.addAll$(Growable.scala:57) > at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) > at py4j.Gateway.invoke(Gateway.java:282) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at > py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) > at py4j.ClientServerConnection.run(ClientServerConnection.java:106) > at java.base/java.lang.Thread.run(Thread.java:840) > Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly > (crashed) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) > at >
[jira] [Assigned] (SPARK-46131) Install torchvision for Python 3.12 build
[ https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46131: Assignee: Hyukjin Kwon > Install torchvision for Python 3.12 build > - > > Key: SPARK-46131 > URL: https://issues.apache.org/jira/browse/SPARK-46131 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > ERROR [0.001s]: test_end_to_end_run_distributedly > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 495, in test_end_to_end_run_distributedly > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > == > ERROR [0.001s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 402, in test_end_to_end_run_locally > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > == > ERROR [0.001s]: test_end_to_end_run_locally > (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 402, in test_end_to_end_run_locally > train_fn = create_training_function(self.mnist_dir_path) >^ > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 60, in create_training_function > from torchvision import transforms, datasets > ModuleNotFoundError: No module named 'torchvision' > -- > Ran 23 tests in 50.860s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46131) Install torchvision for Python 3.12 build
[ https://issues.apache.org/jira/browse/SPARK-46131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46131: - Description: {code} == ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' -- Ran 23 tests in 50.860s {code} was: {code} == ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in 
test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path)
[jira] [Created] (SPARK-46131) Install torchvision for Python 3.12 build
Hyukjin Kwon created SPARK-46131: Summary: Install torchvision for Python 3.12 build Key: SPARK-46131 URL: https://issues.apache.org/jira/browse/SPARK-46131 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} == ERROR [0.001s]: test_end_to_end_run_distributedly (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorDistributedUnitTestsOnConnect.test_end_to_end_run_distributedly) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 495, in test_end_to_end_run_distributedly train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsIIOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' == ERROR [0.001s]: test_end_to_end_run_locally (pyspark.ml.tests.connect.test_parity_torch_distributor.TorchDistributorLocalUnitTestsOnConnect.test_end_to_end_run_locally) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 402, in test_end_to_end_run_locally train_fn = create_training_function(self.mnist_dir_path) ^ File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", line 60, in create_training_function from torchvision import transforms, datasets ModuleNotFoundError: No module named 'torchvision' -- Ran 23 tests in 50.860s [code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46130) Reenable `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46130: - Parent: SPARK-45981 Issue Type: Sub-task (was: Test) > Reenable > `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` > with Python 3.12 > -- > > Key: SPARK-46130 > URL: https://issues.apache.org/jira/browse/SPARK-46130 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Reenable https://issues.apache.org/jira/browse/SPARK-46127. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46130) Reenable `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12
Hyukjin Kwon created SPARK-46130: Summary: Reenable `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12 Key: SPARK-46130 URL: https://issues.apache.org/jira/browse/SPARK-46130 Project: Spark Issue Type: Test Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Reenable https://issues.apache.org/jira/browse/SPARK-46127. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46126: Assignee: Hyukjin Kwon > Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12) > --- > > Key: SPARK-46126 > URL: https://issues.apache.org/jira/browse/SPARK-46126 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in > pyspark.pandas.frame.DataFrame.to_dict > Failed example: > df.to_dict(into=OrderedDict) > Expected: > OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', > OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) > Got: > OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': > OrderedDict({'row1': 0.5, 'row2': 0.75})}) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)
[ https://issues.apache.org/jira/browse/SPARK-46126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46126. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44042 [https://github.com/apache/spark/pull/44042] > Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12) > --- > > Key: SPARK-46126 > URL: https://issues.apache.org/jira/browse/SPARK-46126 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in > pyspark.pandas.frame.DataFrame.to_dict > Failed example: > df.to_dict(into=OrderedDict) > Expected: > OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', > OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) > Got: > OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': > OrderedDict({'row1': 0.5, 'row2': 0.75})}) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46111) Add copyright to the PySpark official documentation.
[ https://issues.apache.org/jira/browse/SPARK-46111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46111. -- Fix Version/s: 4.0.0 Assignee: Haejoon Lee Resolution: Fixed Fixed in https://github.com/apache/spark/pull/44026 > Add copyright to the PySpark official documentation. > > > Key: SPARK-46111 > URL: https://issues.apache.org/jira/browse/SPARK-46111 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add copyright to the PySpark official documentation by using Sphinx extension. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
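For reference, the copyright line in Sphinx-built documentation is normally driven by a few `conf.py` fields; the snippet below is a generic illustration of that mechanism, not necessarily the extension-based approach taken in https://github.com/apache/spark/pull/44026.

{code}
# conf.py -- generic Sphinx settings that control the copyright notice.
project = "PySpark"
copyright = "2023, The Apache Software Foundation"
author = "Apache Spark developers"

# Rendered in the HTML footer by themes that support it (default: True).
html_show_copyright = True
{code}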
[jira] [Created] (SPARK-46127) Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12
Hyukjin Kwon created SPARK-46127: Summary: Flaky `pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault` with Python 3.12 Key: SPARK-46127 URL: https://issues.apache.org/jira/browse/SPARK-46127 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/tests/test_worker.py", line 241, in test_python_segfault self.sc.parallelize([1]).map(lambda x: f()).count() File "/__w/spark/spark/python/pyspark/rdd.py", line 2315, in count return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() ^^^ File "/__w/spark/spark/python/pyspark/rdd.py", line 2290, in sum return self.mapPartitions(lambda x: [sum(x)]).fold( # type: ignore[return-value] ^^ File "/__w/spark/spark/python/pyspark/rdd.py", line 2043, in fold vals = self.mapPartitions(func).collect() ^^ File "/__w/spark/spark/python/pyspark/rdd.py", line 1832, in collect sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) ^ File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ return_value = get_return_value( ^ File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (localhost executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.mutable.Growable.addAll(Growable.scala:61) at scala.collection.mutable.Growable.addAll$(Growable.scala:57) at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:560) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:535) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:863) at 
org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:843) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:473) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.mutable.Growable.addAll(Growable.scala:61) at scala.collection.mutable.Growable.addAll$(Growable.scala:57) at scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:67) at scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1346) at scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1339) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at
[jira] [Created] (SPARK-46126) Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12)
Hyukjin Kwon created SPARK-46126: Summary: Fix the doctest in pyspark.pandas.frame.DataFrame.to_dict (Python 3.12) Key: SPARK-46126 URL: https://issues.apache.org/jira/browse/SPARK-46126 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 2515, in pyspark.pandas.frame.DataFrame.to_dict Failed example: df.to_dict(into=OrderedDict) Expected: OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))]) Got: OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})}) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
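For context: the doctest failure above comes from Python 3.12 changing the repr of OrderedDict to the dict-style form shown under "Got:"; to_dict itself returns the same data. A minimal sketch, runnable without Spark, that shows the difference:

{code:python}
from collections import OrderedDict

d = OrderedDict({"col1": OrderedDict({"row1": 1, "row2": 2}),
                 "col2": OrderedDict({"row1": 0.5, "row2": 0.75})})

# Python 3.11 and earlier render the list-of-pairs form:
#   OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])), ...])
# Python 3.12 renders the dict-style form seen in "Got:" above:
#   OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), ...})
print(repr(d))
{code}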
[jira] [Resolved] (SPARK-46121) Refine docstring of `concat/array_position/element_at/try_element_at`
[ https://issues.apache.org/jira/browse/SPARK-46121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46121. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44039 [https://github.com/apache/spark/pull/44039] > Refine docstring of `concat/array_position/element_at/try_element_at` > - > > Key: SPARK-46121 > URL: https://issues.apache.org/jira/browse/SPARK-46121 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46121) Refine docstring of `concat/array_position/element_at/try_element_at`
[ https://issues.apache.org/jira/browse/SPARK-46121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46121: Assignee: Yang Jie > Refine docstring of `concat/array_position/element_at/try_element_at` > - > > Key: SPARK-46121 > URL: https://issues.apache.org/jira/browse/SPARK-46121 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46115) Restrict charsets in encode()
[ https://issues.apache.org/jira/browse/SPARK-46115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46115. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44020 [https://github.com/apache/spark/pull/44020] > Restrict charsets in encode() > - > > Key: SPARK-46115 > URL: https://issues.apache.org/jira/browse/SPARK-46115 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently the list of supported charsets in encode() is not stable and fully > depends on the used JDK version. So, sometimes user code might not work > because a devop changed Java version in Spark cluster. The ticket aims to > restrict the list of supported charsets by: > {code} > 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
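A minimal PySpark sketch of the intended effect, assuming the restriction applies to the charset argument of the built-in encode function; the DataFrame and column names are illustrative:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Spark",)], ["s"])

# 'UTF-8' is on the allowed list, so this keeps working regardless of the JDK.
df.select(F.encode("s", "UTF-8").alias("b")).show()

# A charset outside the list, e.g. 'UTF-32', would be rejected with an error
# once the restriction is in place, instead of depending on the JDK (assumption).
# df.select(F.encode("s", "UTF-32")).show()
{code}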
[jira] [Assigned] (SPARK-45957) SQL on streaming Temp view fails
[ https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45957: Assignee: Jean-Francois Desjeans Gauthier > SQL on streaming Temp view fails > > > Key: SPARK-45957 > URL: https://issues.apache.org/jira/browse/SPARK-45957 > Project: Spark > Issue Type: Bug > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Raghu Angadi >Assignee: Jean-Francois Desjeans Gauthier >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following code fails in the last step with Spark Connect. > The root cause is that Connect server triggers physical plan on a streaming > Dataframe [in > SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. > Better to avoid that entirely, but at least for streaming it should be > avoided since it cannot be done with a batch execution engine. > {code:java} > df = spark.readStream.format("rate").option("numPartitions", "1").load() > df.createOrReplaceTempView("temp_view") > view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45957) SQL on streaming Temp view fails
[ https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45957. -- Resolution: Fixed Issue resolved by pull request 43851 [https://github.com/apache/spark/pull/43851] > SQL on streaming Temp view fails > > > Key: SPARK-45957 > URL: https://issues.apache.org/jira/browse/SPARK-45957 > Project: Spark > Issue Type: Bug > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Raghu Angadi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following code fails in the last step with Spark Connect. > The root cause is that Connect server triggers physical plan on a streaming > Dataframe [in > SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. > Better to avoid that entirely, but at least for streaming it should be > avoided since it cannot be done with a batch execution engine. > {code:java} > df = spark.readStream.format("rate").option("numPartitions", "1").load() > df.createOrReplaceTempView("temp_view") > view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
[ https://issues.apache.org/jira/browse/SPARK-46110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46110: Assignee: Hyukjin Kwon > Use error classes in catalog, conf, connect, observation, pandas modules > > > Key: SPARK-46110 > URL: https://issues.apache.org/jira/browse/SPARK-46110 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
[ https://issues.apache.org/jira/browse/SPARK-46110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46110. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44024 [https://github.com/apache/spark/pull/44024] > Use error classes in catalog, conf, connect, observation, pandas modules > > > Key: SPARK-46110 > URL: https://issues.apache.org/jira/browse/SPARK-46110 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46120) Remove helper function DataFrame.withPlan
[ https://issues.apache.org/jira/browse/SPARK-46120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46120: Assignee: Ruifeng Zheng > Remove helper function DataFrame.withPlan > - > > Key: SPARK-46120 > URL: https://issues.apache.org/jira/browse/SPARK-46120 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46120) Remove helper function DataFrame.withPlan
[ https://issues.apache.org/jira/browse/SPARK-46120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46120. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44037 [https://github.com/apache/spark/pull/44037] > Remove helper function DataFrame.withPlan > - > > Key: SPARK-46120 > URL: https://issues.apache.org/jira/browse/SPARK-46120 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46123) Using brighter color for document title for better visibility
[ https://issues.apache.org/jira/browse/SPARK-46123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46123: Assignee: Haejoon Lee > Using brighter color for document title for better visibility > - > > Key: SPARK-46123 > URL: https://issues.apache.org/jira/browse/SPARK-46123 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > With the increasing popularity of dark mode for its eye comfort and > energy-saving benefits, it's important to ensure that our documentation is > easily readable in both light and dark settings. The current title font color > in dark mode is not optimal for readability, which can hinder user > experience. By adjusting the color, we aim to enhance the overall > accessibility and readability of the PySpark documentation in dark mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46123) Using brighter color for document title for better visibility
[ https://issues.apache.org/jira/browse/SPARK-46123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46123. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44040 [https://github.com/apache/spark/pull/44040] > Using brighter color for document title for better visibility > - > > Key: SPARK-46123 > URL: https://issues.apache.org/jira/browse/SPARK-46123 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > With the increasing popularity of dark mode for its eye comfort and > energy-saving benefits, it's important to ensure that our documentation is > easily readable in both light and dark settings. The current title font color > in dark mode is not optimal for readability, which can hinder user > experience. By adjusting the color, we aim to enhance the overall > accessibility and readability of the PySpark documentation in dark mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46114: Assignee: Hyukjin Kwon > Define IndexError for PySpark error framework > - > > Key: SPARK-46114 > URL: https://issues.apache.org/jira/browse/SPARK-46114 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46114. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44028 [https://github.com/apache/spark/pull/44028] > Define IndexError for PySpark error framework > - > > Key: SPARK-46114 > URL: https://issues.apache.org/jira/browse/SPARK-46114 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46114) Define IndexError for PySpark error framework
Hyukjin Kwon created SPARK-46114: Summary: Define IndexError for PySpark error framework Key: SPARK-46114 URL: https://issues.apache.org/jira/browse/SPARK-46114 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
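A minimal sketch of how such an error type would slot into the PySpark error framework; the class name PySparkIndexError and the error class string below are assumptions, modeled on the existing PySparkTypeError/PySparkValueError pattern:

{code:python}
# Assumed names, following the existing PySpark error-framework pattern.
from pyspark.errors import PySparkIndexError  # assumption: exported like PySparkTypeError


def pick(values, index):
    if index < 0 or index >= len(values):
        raise PySparkIndexError(
            error_class="INDEX_OUT_OF_RANGE",  # assumed error class name
            message_parameters={"arg_name": "index", "index": str(index)},
        )
    return values[index]
{code}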
[jira] [Created] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
Hyukjin Kwon created SPARK-46110: Summary: Use error classes in catalog, conf, connect, observation, pandas modules Key: SPARK-46110 URL: https://issues.apache.org/jira/browse/SPARK-46110 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46109) Migrate to error classes in PySpark
Hyukjin Kwon created SPARK-46109: Summary: Migrate to error classes in PySpark Key: SPARK-46109 URL: https://issues.apache.org/jira/browse/SPARK-46109 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-41597 continues here to use error classes in PySpark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods
[ https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789874#comment-17789874 ] Hyukjin Kwon commented on SPARK-32933: -- Here the PR and JIRA: https://github.com/apache/spark/pull/44023 https://issues.apache.org/jira/browse/SPARK-46107 > Use keyword-only syntax for keyword_only methods > > > Key: SPARK-32933 > URL: https://issues.apache.org/jira/browse/SPARK-32933 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.1.0 > > > Since 3.0, provides syntax for indicating keyword-only arguments ([PEP > 3102|https://www.python.org/dev/peps/pep-3102/]). > It is not a full replacement for our current usage of {{keyword_only}}, but > it would allow us to make our expectations explicit: > {code:python} > @keyword_only > def __init__(self, degree=2, inputCol=None, outputCol=None): > {code} > {code:python} > @keyword_only > def __init__(self, *, degree=2, inputCol=None, outputCol=None): > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
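A minimal, Spark-independent sketch of what the `*` marker buys compared to the keyword_only decorator: positional calls fail fast with a TypeError. The function and argument names below are illustrative:

{code:python}
def init_transformer(*, degree=2, inputCol=None, outputCol=None):
    # Everything after `*` must be passed by keyword (PEP 3102).
    return {"degree": degree, "inputCol": inputCol, "outputCol": outputCol}


init_transformer(degree=3, inputCol="features")  # OK
# init_transformer(3, "features")  # TypeError: takes 0 positional arguments but 2 were given
{code}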
[jira] [Created] (SPARK-46107) Deprecate pyspark.keyword_only API
Hyukjin Kwon created SPARK-46107: Summary: Deprecate pyspark.keyword_only API Key: SPARK-46107 URL: https://issues.apache.org/jira/browse/SPARK-46107 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://issues.apache.org/jira/browse/SPARK-32933. We don't need this anymore now that Python's keyword-only argument syntax covers it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46074. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43983 [https://github.com/apache/spark/pull/43983] > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46074: Assignee: Niranjan Jayakar > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45922) Multiple policies follow-up (Python)
[ https://issues.apache.org/jira/browse/SPARK-45922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45922. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43800 [https://github.com/apache/spark/pull/43800] > Multiple policies follow-up (Python) > > > Key: SPARK-45922 > URL: https://issues.apache.org/jira/browse/SPARK-45922 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Minor further improvements for multiple policies work -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46016) Fix pandas API support list properly
[ https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46016. -- Fix Version/s: 3.4.2 4.0.0 3.5.1 Assignee: Haejoon Lee Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43996 > Fix pandas API support list properly > > > Key: SPARK-46016 > URL: https://issues.apache.org/jira/browse/SPARK-46016 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > Currently Supported pandas API is not generated properly, so we should fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46082: Assignee: Hyukjin Kwon > Fix protobuf string representation for Pandas Functions API with Spark Connect > -- > > Key: SPARK-46082 > URL: https://issues.apache.org/jira/browse/SPARK-46082 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > df = spark.range(1) > df.mapInPandas(lambda x: x, df.schema)._plan.print() > {code} > prints as below. It should include functions. > {code} > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46082. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43991 [https://github.com/apache/spark/pull/43991] > Fix protobuf string representation for Pandas Functions API with Spark Connect > -- > > Key: SPARK-46082 > URL: https://issues.apache.org/jira/browse/SPARK-46082 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > df = spark.range(1) > df.mapInPandas(lambda x: x, df.schema)._plan.print() > {code} > prints as below. It should include functions. > {code} > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client
Hyukjin Kwon created SPARK-46085: Summary: Dataset.groupingSets in Scala Spark Connect client Key: SPARK-46085 URL: https://issues.apache.org/jira/browse/SPARK-46085 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Scala Spark Connect client for SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API
Hyukjin Kwon created SPARK-46083: Summary: Make SparkNoSuchElementException as a canonical error API Key: SPARK-46083 URL: https://issues.apache.org/jira/browse/SPARK-46083 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. It should be a canonical error API, documented properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
Hyukjin Kwon created SPARK-46082: Summary: Fix protobuf string representation for Pandas Functions API with Spark Connect Key: SPARK-46082 URL: https://issues.apache.org/jira/browse/SPARK-46082 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} df = spark.range(1) df.mapInPandas(lambda x: x, df.schema)._plan.print() {code} prints as below. It should include functions. {code} {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46080) Upgrade Cloudpickle to 3.0.0
Hyukjin Kwon created SPARK-46080: Summary: Upgrade Cloudpickle to 3.0.0 Key: SPARK-46080 URL: https://issues.apache.org/jira/browse/SPARK-46080 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon It includes official support of Python 3.12 (https://github.com/cloudpipe/cloudpickle/pull/517) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46076. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43986 [https://github.com/apache/spark/pull/43986] > Remove `unittest` deprecated alias usage for Python 3.12 > > > Key: SPARK-46076 > URL: https://issues.apache.org/jira/browse/SPARK-46076 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
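A minimal sketch of the kind of rename this covers: the long-deprecated unittest aliases were removed in Python 3.12, so tests have to call the canonical assertion names, which work on all supported versions:

{code:python}
import unittest


class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # Deprecated aliases such as assertEquals and assertRegexpMatches
        # are gone in Python 3.12; use the canonical names instead.
        self.assertEqual(1 + 1, 2)        # instead of self.assertEquals(...)
        self.assertRegex("spark", "spa")  # instead of self.assertRegexpMatches(...)


if __name__ == "__main__":
    unittest.main()
{code}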
[jira] [Resolved] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.
[ https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46065. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43970 [https://github.com/apache/spark/pull/43970] > Refactor `(DataFrame|Series).factorize()` to use `create_map`. > -- > > Key: SPARK-46065 > URL: https://issues.apache.org/jira/browse/SPARK-46065 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We can accept Column object for Column.__getitem__ on remote Session, so we > can optimize the existing factorize implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
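A minimal sketch of the refactoring idea: express the value-to-code mapping as a create_map literal and look each value up through Column.__getitem__ with a Column key. The data and names are illustrative, not the actual factorize implementation:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",), ("a",)], ["value"])

# Build a value -> code map literal ...
codes = {"a": 0, "b": 1}
mapping = F.create_map(*[F.lit(x) for pair in codes.items() for x in pair])

# ... and index it with a Column (Column.__getitem__ accepting a Column object).
df.select(F.col("value"), mapping[F.col("value")].alias("code")).show()
{code}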
[jira] [Assigned] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.
[ https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46065: Assignee: Haejoon Lee > Refactor `(DataFrame|Series).factorize()` to use `create_map`. > -- > > Key: SPARK-46065 > URL: https://issues.apache.org/jira/browse/SPARK-46065 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We can accept Column object for Column.__getitem__ on remote Session, so we > can optimize the existing factorize implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)
[ https://issues.apache.org/jira/browse/SPARK-46049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46049: - > Support groupingSets operation in PySpark (Spark Connect) > - > > Key: SPARK-46049 > URL: https://issues.apache.org/jira/browse/SPARK-46049 > Project: Spark > Issue Type: New Feature >Reporter: Hyukjin Kwon >Priority: Major > > Connect version of SPARK-46048 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cute, rollup, groupby, and pivot
[ https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46063: - Description: {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} {code} >>> help(spark.range(1).cube) Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 {code} it has to be {cols} was: {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} ``` Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 ``` it has to be {cols} > Improve error messages related to argument types in cute, rollup, groupby, > and pivot > > > Key: SPARK-46063 > URL: https://issues.apache.org/jira/browse/SPARK-46063 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > >>> spark.range(1).cube(cols=1.2) > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube > raise PySparkTypeError( > pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument > `cube` should be a Column or str, got float. > {code} > {code} > >>> help(spark.range(1).cube) > Help on method cube in module pyspark.sql.connect.dataframe: > cube(*cols: 'ColumnOrName') -> 'GroupedData' method of > pyspark.sql.connect.dataframe.DataFrame instance > Create a multi-dimensional cube for the current :class:`DataFrame` using > the specified columns, allowing aggregations to be performed on them. > .. versionadded:: 1.4.0 > .. versionchanged:: 3.4.0 > {code} > it has to be {cols} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cute, rollup, groupby, and pivot
[ https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46063: - Summary: Improve error messages related to argument types in cute, rollup, groupby, and pivot (was: Improve error messages related to argument types in cute, rollup, and pivot) > Improve error messages related to argument types in cute, rollup, groupby, > and pivot > > > Key: SPARK-46063 > URL: https://issues.apache.org/jira/browse/SPARK-46063 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > >>> spark.range(1).cube(cols=1.2) > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube > raise PySparkTypeError( > pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument > `cube` should be a Column or str, got float. > {code} > ``` > Help on method cube in module pyspark.sql.connect.dataframe: > cube(*cols: 'ColumnOrName') -> 'GroupedData' method of > pyspark.sql.connect.dataframe.DataFrame instance > Create a multi-dimensional cube for the current :class:`DataFrame` using > the specified columns, allowing aggregations to be performed on them. > .. versionadded:: 1.4.0 > .. versionchanged:: 3.4.0 > ``` > it has to be {cols} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46063) Improve error messages related to argument types in cute, rollup, and pivot
Hyukjin Kwon created SPARK-46063: Summary: Improve error messages related to argument types in cute, rollup, and pivot Key: SPARK-46063 URL: https://issues.apache.org/jira/browse/SPARK-46063 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} ``` Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 ``` it has to be {cols} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46061) Add the test party for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46061: Assignee: Hyukjin Kwon > Add the test party for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46061) Add the test party for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46061. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43965 [https://github.com/apache/spark/pull/43965] > Add the test party for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45600) Make Python data source registration session level
[ https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45600. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43742 [https://github.com/apache/spark/pull/43742] > Make Python data source registration session level > -- > > Key: SPARK-45600 > URL: https://issues.apache.org/jira/browse/SPARK-45600 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, registered data sources are stored in `sharedState` and can be > accessed across multiple sessions. This, however, will not work with Spark > Connect. We should make this registration session level, and support static > registration (e.g. using pip install) in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
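A minimal sketch of what session-level registration looks like with the Python data source API; the module and method names below (pyspark.sql.datasource, spark.dataSource.register) follow the Spark 4.0 API as I understand it, and the data source itself is a toy example:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader


class FixedRowsReader(DataSourceReader):
    def read(self, partition):
        # Yield plain tuples matching the declared schema.
        yield (0, "a")
        yield (1, "b")


class FixedRowsDataSource(DataSource):
    @classmethod
    def name(cls):
        return "fixedrows"

    def schema(self):
        return "id int, value string"

    def reader(self, schema):
        return FixedRowsReader()


spark = SparkSession.builder.getOrCreate()
# Registration is tied to this session (the point of this ticket),
# rather than being stored in shared state across sessions.
spark.dataSource.register(FixedRowsDataSource)
spark.read.format("fixedrows").load().show()
{code}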
[jira] [Assigned] (SPARK-45600) Make Python data source registration session level
[ https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45600: Assignee: Allison Wang > Make Python data source registration session level > -- > > Key: SPARK-45600 > URL: https://issues.apache.org/jira/browse/SPARK-45600 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Currently, registered data sources are stored in `sharedState` and can be > accessed across multiple sessions. This, however, will not work with Spark > Connect. We should make this registration session level, and support static > registration (e.g. using pip install) in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46061) Add the test party for reattach test case
Hyukjin Kwon created SPARK-46061: Summary: Add the test party for reattach test case Key: SPARK-46061 URL: https://issues.apache.org/jira/browse/SPARK-46061 Project: Spark Issue Type: New Feature Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We need the same test "ReleaseSession releases all queries and does not allow more requests in the session" added in SPARK-45798 to identify an issue like SPARK-46042. This is caused by SPARK-46039 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46061) Add the test party for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46061: - Issue Type: Test (was: New Feature) > Add the test party for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46048. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43951 [https://github.com/apache/spark/pull/43951] > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
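A minimal usage sketch, assuming the Python API mirrors Dataset.groupingSets from SPARK-45929 (a list of grouping sets followed by the grouping columns); the data and column names are illustrative:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("NY", "sedan", 10), ("NY", "suv", 5), ("SF", "sedan", 7)],
    ["city", "car_model", "quantity"],
)

# Aggregate over the grouping sets (city, car_model) and (city).
result = (
    df.groupingSets([["city", "car_model"], ["city"]], "city", "car_model")
      .agg(F.sum("quantity").alias("sum_quantity"))
)
result.show()
{code}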
[jira] [Assigned] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46048: Assignee: Hyukjin Kwon > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46048: - Issue Type: New Feature (was: Bug) > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46048) Support groupingSets operation in PySpark
Hyukjin Kwon created SPARK-46048: Summary: Support groupingSets operation in PySpark Key: SPARK-46048 URL: https://issues.apache.org/jira/browse/SPARK-46048 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)
Hyukjin Kwon created SPARK-46049: Summary: Support groupingSets operation in PySpark (Spark Connect) Key: SPARK-46049 URL: https://issues.apache.org/jira/browse/SPARK-46049 Project: Spark Issue Type: New Feature Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Connect version of SPARK-46048 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46017) PySpark doc build doesn't work properly on Mac
[ https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46017: Assignee: Haejoon Lee > PySpark doc build doesn't work properly on Mac > -- > > Key: SPARK-46017 > URL: https://issues.apache.org/jira/browse/SPARK-46017 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > PySpark doc build is working properly on GitHub CI, but doesn't work properly > on local Mac env for some reason. We should investigate and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46022) Remove deprecated functions APIs from documents
[ https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46022. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43932 [https://github.com/apache/spark/pull/43932] > Remove deprecated functions APIs from documents > --- > > Key: SPARK-46022 > URL: https://issues.apache.org/jira/browse/SPARK-46022 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should not expose the deprecated APIs on official documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46022) Remove deprecated functions APIs from documents
[ https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46022: Assignee: Haejoon Lee > Remove deprecated functions APIs from documents > --- > > Key: SPARK-46022 > URL: https://issues.apache.org/jira/browse/SPARK-46022 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should not expose the deprecated APIs on official documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46013) Improve basic datasource examples
[ https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46013: Assignee: Allison Wang > Improve basic datasource examples > - > > Key: SPARK-46013 > URL: https://issues.apache.org/jira/browse/SPARK-46013 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > We should improve the Python examples on this page: > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > (basic_datasource_examples.py) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46017) PySpark doc build doesn't work properly on Mac
[ https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46017. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43932 [https://github.com/apache/spark/pull/43932] > PySpark doc build doesn't work properly on Mac > -- > > Key: SPARK-46017 > URL: https://issues.apache.org/jira/browse/SPARK-46017 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > PySpark doc build is working properly on GitHub CI, but doesn't work properly > on local Mac env for some reason. We should investigate and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46013) Improve basic datasource examples
[ https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46013. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43917 [https://github.com/apache/spark/pull/43917] > Improve basic datasource examples > - > > Key: SPARK-46013 > URL: https://issues.apache.org/jira/browse/SPARK-46013 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should improve the Python examples on this page: > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > (basic_datasource_examples.py) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite
[ https://issues.apache.org/jira/browse/SPARK-46042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46042: - Description: https://github.com/apache/spark/pull/43942#issuecomment-1821896165 > Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite > > > Key: SPARK-46042 > URL: https://issues.apache.org/jira/browse/SPARK-46042 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://github.com/apache/spark/pull/43942#issuecomment-1821896165 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788430#comment-17788430 ] Hyukjin Kwon commented on SPARK-46032: -- Are executors using the same versions too? The error is most likely from a different version of JDK and Scala version. I can't reproduce them locally so sharing fulll specification of the server and the client would be very helpful. > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > > I downloaded spark 3.5 from the spark official website, and then I started a > Spark Standalone cluster in which both master and the only worker are in the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can 100% ensure the spark standalone cluster, the connect server and spark > driver are started observed from the webui. > > Finally, I tried to run a very simple spark job > (spark.range(100).filter("id>2").collect()) from spark-connect-client using > pyspark, but I got the below error. > > _pyspark --remote sc://localhost_ > _Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux_ > _Type "help", "copyright", "credits" or "license" for more information._ > _Welcome to_ > _ ___ > _/ __/_ {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_ > {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/ '{_}/{_} > {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\ version 3.5.0{_} > {_}/{_}/_ > > _Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)_ > _Client connected to the Spark Connect server at localhost_ > _SparkSession available as 'spark'._ > _>>> spark.range(100).filter("id > 3").collect()_ > _Traceback (most recent call last):_ > _File "", line 1, in _ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect_ > _table, schema = self._session.client.to_table(query)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table_ > _table, schema, _, _, _ = self._execute_and_fetch(req)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch_ > _for response in self._execute_and_fetch_as_iterator(req):_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator_ > _self._handle_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error_ > _self._handle_rpc_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error_ > _raise convert_exception(info, status.message) from None_ > _pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job 
aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot > assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD_ > _at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_ > _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_ > _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at
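For anyone hitting the same ClassCastException, a quick way to gather the versions being asked about here is sketched below. This is not part of the ticket: it assumes a classic (non-Connect) PySpark session named `spark` started against the same standalone master (for example `pyspark --master spark://10.19.183.93:7077`), and the `scala.util.Properties` lookup through the py4j `_jvm` gateway assumes the usual static forwarder is available. The error pattern is consistent with the `--packages org.apache.spark:spark-connect_2.12:3.5.0` coordinate not matching the Scala build of the Spark distribution the cluster actually runs; a Scala 2.13 build would need the `_2.13` artifact instead.

{code:python}
# Sketch only: print the versions of the local Spark distribution and of the driver JVM.
import platform
import subprocess

# The bundled launcher prints the Spark, Scala and JVM versions it was built for.
subprocess.run(["spark-submit", "--version"], check=False)

print("Python:", platform.python_version())
print("Spark :", spark.version)
print("Java  :", spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))
# Driver-side Scala version via the py4j gateway (assumes the static forwarder exists).
print("Scala :", spark.sparkContext._jvm.scala.util.Properties.versionString())
{code}

If the client distribution, the connect server and the executors report different JDK or Scala versions, aligning them (or switching the `--packages` Scala suffix to match) is the usual first step.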
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788381#comment-17788381 ] Hyukjin Kwon commented on SPARK-46032: -- and can you run without Spark Connect? Seems like just regular Spark shell would fail given the error messages.
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380 ] Hyukjin Kwon commented on SPARK-46032: -- What's your Scala version [~wbo4958]?
[jira] [Comment Edited] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380 ] Hyukjin Kwon edited comment on SPARK-46032 at 11/21/23 11:34 AM: - What's your Scala and JDK versions [~wbo4958]? was (Author: gurwls223): What's your Scala version [~wbo4958]?
[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46032: - Priority: Major (was: Blocker)
[jira] [Assigned] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module
[ https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46023: Assignee: Hyukjin Kwon > Annotate parameters at docstrings in pyspark.sql module > --- > > Key: SPARK-46023 > URL: https://issues.apache.org/jira/browse/SPARK-46023 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See PR -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module
[ https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46023. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43925 [https://github.com/apache/spark/pull/43925] > Annotate parameters at docstrings in pyspark.sql module > --- > > Key: SPARK-46023 > URL: https://issues.apache.org/jira/browse/SPARK-46023 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See PR -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
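For context, the kind of change tracked here annotates parameter and return types in the numpydoc-style docstrings used across pyspark.sql. The sketch below is illustrative only: the function, type annotations and wording are hypothetical and are not taken from pull request 43925.

{code:python}
# Hypothetical example of a pyspark.sql-style docstring with annotated parameters.
def repeat_string(col, n):
    """Repeats each value of a string column `n` times.

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or str
        Target column (or column name) containing the strings to repeat.
    n : int
        Number of repetitions; must be non-negative.

    Returns
    -------
    :class:`~pyspark.sql.Column`
        A string column with the repeated values.
    """
    ...
{code}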
[jira] [Assigned] (SPARK-46026) Refine docstring of UDTF
[ https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46026: Assignee: Hyukjin Kwon > Refine docstring of UDTF > > > Key: SPARK-46026 > URL: https://issues.apache.org/jira/browse/SPARK-46026 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46026) Refine docstring of UDTF
[ https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46026. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43928 [https://github.com/apache/spark/pull/43928] > Refine docstring of UDTF > > > Key: SPARK-46026 > URL: https://issues.apache.org/jira/browse/SPARK-46026 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
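To show the kind of object whose documentation is being refined, here is a small Python user-defined table function in the style of the PySpark docs; the class name, schema and docstring text below are illustrative and are not the wording added by pull request 43928.

{code:python}
# Illustrative Python UDTF with a docstring; assumes an active SparkSession `spark`.
from pyspark.sql.functions import lit, udtf


@udtf(returnType="num: int, squared: int")
class SquareNumbers:
    """Yields each integer in the closed range [start, end] together with its square."""

    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)


# Example call: SquareNumbers(lit(1), lit(3)).show()
{code}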
[jira] [Assigned] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset
[ https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46024: Assignee: Hyukjin Kwon > Document parameters and examples for RuntimeConf get, set and unset > --- > > Key: SPARK-46024 > URL: https://issues.apache.org/jira/browse/SPARK-46024 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset
[ https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46024. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43927 [https://github.com/apache/spark/pull/43927] > Document parameters and examples for RuntimeConf get, set and unset > --- > > Key: SPARK-46024 > URL: https://issues.apache.org/jira/browse/SPARK-46024 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
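As a usage sketch of the three RuntimeConf methods being documented (not the examples from pull request 43927), assuming an active SparkSession named `spark`:

{code:python}
# Set, read and clear a runtime-configurable option via spark.conf.
spark.conf.set("spark.sql.shuffle.partitions", "50")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # -> '50'
spark.conf.unset("spark.sql.shuffle.partitions")       # revert to the session default
print(spark.conf.get("spark.sql.shuffle.partitions"))  # default value again (e.g. '200')
{code}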
[jira] [Resolved] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job
[ https://issues.apache.org/jira/browse/SPARK-46027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46027. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43929 [https://github.com/apache/spark/pull/43929] > Add `Python 3.12` to the Daily Python Github Action job > --- > > Key: SPARK-46027 > URL: https://issues.apache.org/jira/browse/SPARK-46027 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job
Hyukjin Kwon created SPARK-46027: Summary: Add `Python 3.12` to the Daily Python Github Action job Key: SPARK-46027 URL: https://issues.apache.org/jira/browse/SPARK-46027 Project: Spark Issue Type: Sub-task Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`
[ https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46004: Assignee: BingKun Pan > Refine docstring of `DataFrame.dropna/fillna/replace` > - > > Key: SPARK-46004 > URL: https://issues.apache.org/jira/browse/SPARK-46004 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`
[ https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46004. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43907 [https://github.com/apache/spark/pull/43907] > Refine docstring of `DataFrame.dropna/fillna/replace` > - > > Key: SPARK-46004 > URL: https://issues.apache.org/jira/browse/SPARK-46004 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
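A short usage sketch of the three DataFrame methods whose docstrings were refined; the sample data and calls are illustrative, not the examples added in pull request 43907, and assume an active SparkSession named `spark`.

{code:python}
# Illustrative calls to DataFrame.dropna, fillna and replace on a tiny DataFrame.
df = spark.createDataFrame(
    [(1, None, "a"), (2, 3.0, None), (None, 4.0, "b")],
    ["id", "score", "label"],
)

df.dropna(subset=["id"]).show()                       # drop rows where `id` is null
df.fillna({"score": 0.0, "label": "unknown"}).show()  # fill nulls per column
df.replace("a", "A", subset=["label"]).show()         # replace a value in one column
{code}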
[jira] [Created] (SPARK-46026) Refine docstring of UDTF
Hyukjin Kwon created SPARK-46026: Summary: Refine docstring of UDTF Key: SPARK-46026 URL: https://issues.apache.org/jira/browse/SPARK-46026 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46025) Support Python 3.12 in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46025: - > Support Python 3.12 in PySpark > -- > > Key: SPARK-46025 > URL: https://issues.apache.org/jira/browse/SPARK-46025 > Project: Spark > Issue Type: Improvement >Reporter: Hyukjin Kwon >Priority: Major > > Python 3.12 has been released. We should make sure the tests pass, and mark it as > supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46025) Support Python 3.12 in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46025: - Issue Type: Improvement (was: Bug) > Support Python 3.12 in PySpark > -- > > Key: SPARK-46025 > URL: https://issues.apache.org/jira/browse/SPARK-46025 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Python 3.12 has been released. We should make sure the tests pass, and mark it as > supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46025) Support Python 3.12 in PySpark
Hyukjin Kwon created SPARK-46025: Summary: Support Python 3.12 in PySpark Key: SPARK-46025 URL: https://issues.apache.org/jira/browse/SPARK-46025 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Python 3.12 has been released. We should make sure the tests pass, and mark it as supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
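One conventional way to keep a suite green while individual tests still fail on Python 3.12 is a version-gated skip. The sketch below uses only the standard library; the class and test names are hypothetical, not actual Spark test code.

{code:python}
# Hypothetical version-gated skip for a test that does not pass on Python 3.12 yet.
import sys
import unittest


class ExampleSuite(unittest.TestCase):
    @unittest.skipIf(
        sys.version_info >= (3, 12),
        "Temporarily skipped on Python 3.12 until the underlying issue is fixed.",
    )
    def test_feature_not_ready_on_312(self):
        self.assertEqual(2 + 2, 4)  # placeholder assertion for the sketch


if __name__ == "__main__":
    unittest.main()
{code}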